[RFC]: Design a shared warmup infrastructure for JITs #47456

Motivation.

PR #47451 contains an initial implementation of the proposal described in this RFC.

Summary

This RFC proposes a standard, extensible warmup infrastructure for JIT kernels in vLLM.

The goal is to let kernels from different JIT backends, including Triton, CuTeDSL, TileLang, and potential future DSLs, expose the set of specializations that should be compiled during engine startup.

This is not intended to be a one-off warmup path for a specific kernel. Instead, it defines a kernel-owned contract where each warmable kernel describes its own compile keys, dispatch logic, warmup search space, and compile-only entry point.

Motivation

vLLM increasingly relies on JIT-generated kernels from multiple DSLs. Today, warming these kernels is difficult to standardize because each backend exposes different runtime and compilation APIs. This RFC proposes a common infrastructure with several goals:

Create a shared warmup infrastructure for the JIT backends used in vLLM.
Keep warmup definitions close to the kernels that own the specialization logic, making the system easier to review and maintain.
Warm up the actual compile keys rather than running representative non-key inputs, such as token counts, and hoping they cover every required specialization. The mapping from inputs to specializations is not always obvious or guaranteed, so warmup should target the compile-key space directly.
Use Python AST dispatch tracing to derive compile-key search spaces from normal Python dispatch(...) methods, avoiding duplicated hand-written warmup logic.
Support compile-only warmup APIs, avoiding dummy runtime launches and dummy tensor allocation. Dummy runs can be expensive and may have side effects; each DSL should expose fake tensor/spec descriptors suitable for compilation only.
Define a standard interface for new contributors and potentially third-party libraries to expose warmup metadata: compile keys, representative warmup keys, and a compile-only API.
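
To make the AST-tracing goal above more concrete, here is a minimal sketch of one way a compile-key search space could be read off a dispatch(...) method. It is not the implementation in #47451: the dispatch source, its thresholds, and the helper names literal_compile_keys and _callee_name are hypothetical, and it only handles keys whose fields are literal keyword arguments.

```python
import ast
from typing import Optional

# Hypothetical dispatch source; the thresholds and block sizes are made up.
DISPATCH_SRC = '''
def dispatch(self, *, num_tokens):
    if num_tokens <= 16:
        return self.CompileKey(BLOCK_SIZE=16)
    if num_tokens <= 256:
        return self.CompileKey(BLOCK_SIZE=64)
    return self.CompileKey(BLOCK_SIZE=128)
'''


def _callee_name(func: ast.expr) -> Optional[str]:
    """Return the called name for `CompileKey(...)` or `self.CompileKey(...)`."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def literal_compile_keys(source: str, key_name: str = "CompileKey") -> list:
    """Collect the literal keyword arguments of every `return CompileKey(...)`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Return) and isinstance(node.value, ast.Call)):
            continue
        call = node.value
        if _callee_name(call.func) != key_name:
            continue
        if any(kw.arg is None for kw in call.keywords):
            continue  # `**kwargs` expansion cannot be resolved statically here
        try:
            key = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        except ValueError:
            continue  # a field is computed at runtime; this sketch skips it
        found.append((node.lineno, key))

    # Report keys in source order and drop duplicates.
    keys = []
    for _, key in sorted(found, key=lambda item: item[0]):
        if key not in keys:
            keys.append(key)
    return keys


print(literal_compile_keys(DISPATCH_SRC))
```

A real tracer would also need to handle fields computed from expressions or configuration values, which this sketch deliberately skips.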

Proposed Change.

Each warmable kernel should expose a small wrapper object near the kernel's normal runtime entry point.

The wrapper owns:

A frozen CompileKey dataclass with the fields that identify one compiled specialization.
A dispatch(...) method that maps normal dispatch arguments to CompileKey.
A get_warmup_keys(...) method that returns representative keys to compile.
A compile(compile_key) method that compiles one key.

Proposed shape:

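from dataclasses import dataclass

# VllmJitKernel is the base class proposed in this RFC; VllmConfig is vLLM's
# engine configuration object. Their imports are omitted in this sketch.


class MyKernel(VllmJitKernel["MyKernel.CompileKey"]):

    @dataclass(frozen=True)
    class CompileKey:
        # Fields that identify one compiled specialization.
        BLOCK_SIZE: int

    def dispatch(self, *, num_tokens: int) -> CompileKey:
        # Map normal dispatch arguments to a compile key.
        return self.CompileKey(BLOCK_SIZE=...)

    def get_warmup_keys(self, vllm_config: VllmConfig) -> list[CompileKey]:
        # Return the representative, deduplicated keys to compile at startup.
        ...

    def compile(self, compile_key: CompileKey) -> None:
        # Compile exactly one specialization without launching the kernel.
        ...


MY_KERNEL = MyKernel()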

CompileKey must be hashable. get_warmup_keys(...) should deduplicate keys before returning them if multiple representative inputs map to the same compiled specialization.
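
The hashability and deduplication requirements can be illustrated with a self-contained toy. This is only a sketch under assumptions: ToyKernel does not subclass VllmJitKernel, its dispatch rule is invented, and get_warmup_keys takes a plain list of token counts in place of a VllmConfig so that the example runs without vLLM.

```python
from dataclasses import dataclass


class ToyKernel:
    """Stand-in for a warmable kernel wrapper; not the real base class."""

    @dataclass(frozen=True)  # frozen dataclasses are hashable by default
    class CompileKey:
        BLOCK_SIZE: int

    def __init__(self) -> None:
        self.compiled = set()

    def dispatch(self, *, num_tokens: int) -> CompileKey:
        # Hypothetical rule: smallest power of two >= num_tokens, within [16, 128].
        block = 16
        while block < num_tokens and block < 128:
            block *= 2
        return self.CompileKey(BLOCK_SIZE=block)

    def get_warmup_keys(self, token_counts: list) -> list:
        # dict.fromkeys drops duplicates while preserving first-seen order;
        # it relies on CompileKey being hashable.
        keys = (self.dispatch(num_tokens=n) for n in token_counts)
        return list(dict.fromkeys(keys))

    def compile(self, compile_key: CompileKey) -> None:
        # A real wrapper would call the backend's compile-only API here.
        self.compiled.add(compile_key)


kernel = ToyKernel()
warmup_keys = kernel.get_warmup_keys([1, 8, 17, 100, 4096])
for key in warmup_keys:
    kernel.compile(key)
print([key.BLOCK_SIZE for key in warmup_keys])
```

Five representative inputs collapse to three compile keys here, which is why deduplicating before compilation keeps startup work proportional to the number of specializations rather than the number of inputs.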

Scope

This RFC covers the warmup contract and the minimal shared infrastructure needed to make JIT warmup kernel-owned and backend-extensible.

The initial scope is limited to:

A generic wrapper interface for warmable kernels.
AST-assisted expansion of representative dispatch inputs into compile keys.
Backend-specific compile-only adapters where needed.
Initial Triton and CuTeDSL examples that exercise the contract.

It does not attempt to migrate every existing warmup path or define a complete backend-neutral fake tensor API for all DSLs in this first step.
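
As a rough illustration of what a backend-specific compile-only adapter might look like, the sketch below describes tensors by shape and dtype alone, so nothing is allocated or launched. Every name in it (TensorSpec, CompileOnlyAdapter, RecordingAdapter, warm_one, and the kernel name "toy_kernel") is hypothetical, and it calls no real Triton or CuTeDSL API; a real adapter would translate the spec into whatever compile entry point its DSL provides.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TensorSpec:
    """Describes a tensor for compilation only; no memory is allocated."""
    shape: tuple
    dtype: str


class CompileOnlyAdapter(Protocol):
    """Hypothetical per-backend interface for compiling without launching."""

    def compile_only(self, kernel_name: str, specs: tuple, constexprs: dict) -> None:
        ...


class RecordingAdapter:
    """Test double that records compile requests instead of compiling."""

    def __init__(self) -> None:
        self.requests = []

    def compile_only(self, kernel_name: str, specs: tuple, constexprs: dict) -> None:
        self.requests.append((kernel_name, specs, tuple(sorted(constexprs.items()))))


def warm_one(adapter: CompileOnlyAdapter, block_size: int) -> None:
    # Build descriptors from the compile key rather than from real inputs.
    specs = (TensorSpec(shape=(block_size, 128), dtype="float16"),)
    adapter.compile_only("toy_kernel", specs, {"BLOCK_SIZE": block_size})


adapter = RecordingAdapter()
warm_one(adapter, 64)
print(adapter.requests)
```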

Current And Prior Work

To my knowledge, PR #47451 is the first implementation of this proposal. It adds the shared wrapper contract and demonstrates it with one Triton path and one CuTeDSL path.

Before this proposal, warmup logic was mostly backend-specific and ad hoc. In some cases it used representative runtime-like inputs rather than directly expressing the compile-key space.

Risks

Some specialization fields (parts of the compile key) may be hidden in backend internals and therefore hard to model.
Over-warming can increase startup time.
Third-party libraries may need small API additions to expose compile-only entry points, and potentially methods to get warmup compile keys.
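
One way to keep the over-warming risk visible is for the startup driver to report how many keys each kernel compiled and how long that took. The sketch below assumes only the get_warmup_keys(...) and compile(...) methods from the proposed contract; run_warmup, WarmupReport, and _DemoKernel are hypothetical names, and the demo kernel uses plain integers as keys.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class WarmupReport:
    kernel_name: str
    num_keys: int
    seconds: float


def run_warmup(kernels, config) -> list:
    """Compile every warmup key of every kernel and time each kernel."""
    reports = []
    for kernel in kernels:
        keys = kernel.get_warmup_keys(config)
        start = time.perf_counter()
        for key in keys:
            kernel.compile(key)
        elapsed = time.perf_counter() - start
        reports.append(WarmupReport(type(kernel).__name__, len(keys), elapsed))
    return reports


class _DemoKernel:
    """Minimal stand-in that satisfies the two methods the driver needs."""

    def __init__(self) -> None:
        self.compiled = []

    def get_warmup_keys(self, config):
        return [16, 64]

    def compile(self, key) -> None:
        self.compiled.append(key)


demo = _DemoKernel()
reports = run_warmup([demo], config=None)
for report in reports:
    print(f"{report.kernel_name}: {report.num_keys} keys in {report.seconds:.6f}s")
```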

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response
