[ZenCPU] Add zencpu Platform Runtime Logging and Docs by amd-lalithnc · Pull Request #42726 · vllm-project/vllm

amd-lalithnc · 2026-05-15T09:01:34Z

Change-Id: I6049a7950a6c312c02892a740576411e886271a7

Purpose

Add AMD Zen CPU runtime visibility to the public fork so it is easy to confirm, from live logs, that:

ZenCpuPlatform was activated
CPU unquantized GEMM dispatch went through the zentorch path

This PR also updates the unit tests to verify the ZenCPU platform logs, and updates the installation/verification docs for the Zen CPU flow.

Test Plan

Run the focused unit tests for the new logging coverage:

python -m pytest tests/model_executor/test_cpu_unquantized_gemm_dispatch.py -q

Install the public fork in the validation env and run a real CPU throughput benchmark:
python -m pip install . --no-build-isolation --no-deps --force-reinstall

env -i HOME="$HOME" PATH="$PATH" TERM="${TERM:-xterm}" USER="${USER}" bash --noprofile --norc -lc '
source /proj/zendnn/lalithnc/anaconda3/etc/profile.d/conda.sh &&
conda activate vllm-zentorch-0.20.2 &&
cd /tmp &&
export VLLM_CPU_KVCACHE_SPACE=40 &&
export VLLM_CPU_OMP_THREADS_BIND="0-55" &&
vllm bench throughput \
  --model /proj/ai_models/vllm/hub/models--meta-llama--Llama-3.1-8B-Instruct/snapshots/0e9e39f249a16976918f6564b8830bc894c89659 \
  --random-input-len 128 \
  --random-output-len 128 \
  --num-prompts 8 \
  --max-num-seqs 8
'

Verify the runtime output contains:

ZenCpuPlatform activated
CPU unquantized GEMM dispatch: using zentorch_linear_unary
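
The check above can be scripted. The following is a minimal sketch, assuming the benchmark output has already been captured into a string; the helper name `missing_markers` is hypothetical, and the two marker strings are the ones listed above.

```python
# Marker strings copied from the verification step above.
EXPECTED_MARKERS = (
    "ZenCpuPlatform activated",
    "CPU unquantized GEMM dispatch: using zentorch_linear_unary",
)


def missing_markers(log_text: str, markers=EXPECTED_MARKERS) -> list[str]:
    """Return the expected markers that do not appear anywhere in log_text."""
    return [marker for marker in markers if marker not in log_text]
```

An empty return value means both log lines were found; any returned entries name the lines that are missing from the captured output.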

Test Result

Focused unit tests passed:

4 passed, 17 warnings in 1.91s

Real benchmark on the public fork emitted the expected runtime logs:

ZenCpuPlatform activated | zentorch=5.2.1 | VLLM_ZENTORCH_WEIGHT_PREPACK=1 | AVX-512=True | AVX-512_BF16=True
CPU unquantized GEMM dispatch: using zentorch_linear_unary (prepacked=True)

Benchmark completed successfully with:

Throughput: 0.65 requests/s, 167.36 total tokens/s, 83.68 output tokens/s
Total num prompt tokens:  1024
Total num output tokens:  1024
exit_code: 0

Docs were updated to reflect Zen CPU install/verification flow in the public fork.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

cc: @tlrmchlsmth @ProExpertProg

Make it easy to confirm platform activation and GEMM dispatch during real runs. Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I6049a7950a6c312c02892a740576411e886271a7

Make the CPU hardware note explicit that AMD Zen keeps the same vLLM model support. Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I5b4fcaecc344bab7daf3174d1c90b3d674819fcd

mergify · 2026-05-15T09:02:56Z

Documentation preview: https://vllm--42726.org.readthedocs.build/en/42726/

gemini-code-assist

Code Review

This pull request introduces documentation and logging for AMD Zen CPU optimizations. It adds detailed installation instructions for the zentorch plugin, explains the auto-activation of ZenCpuPlatform on AMD Zen 4/5 hardware, and implements informative logging to identify which GEMM dispatch kernel is being utilized. Unit tests were also included to verify the new logging and activation logic. I have no feedback to provide.

mergify · 2026-05-15T09:03:27Z

Hi @amd-lalithnc, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Apply the formatter changes and fix the CPU include-file anchor so the branch-local pre-commit checks pass. Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I877af675bb19cd8e606e227490ffd0600f46e6f5

amd-lalithnc · 2026-05-27T06:38:46Z

hello @tlrmchlsmth @ProExpertProg - can we enable CI checks on this PR? please do give this a review when you get a chance, these logs help us identify if the relevant platform and kernels are being selected. Thanks!

amukho · 2026-05-28T04:00:36Z

hi @AndreasKaratzas can you please review and help merge this PR? This is adding the logging and documentation support for the ZenCPU platform plugin already upstreamed in vLLM

AndreasKaratzas · 2026-05-28T04:06:11Z

+### How do I enable AMD Zen optimizations?
+
+On an AMD Zen 4 / Zen 5 CPU, install the CPU wheel with the `zen` extra so
+vLLM pulls the tested `zentorch` version for that release:
+
+```bash
+export VLLM_VERSION=0.20.2
+pip install "vllm[zen] @ https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}%2Bcpu-cp38-abi3-manylinux_2_35_x86_64.whl" \
+    --extra-index-url https://download.pytorch.org/whl/cpu
+```
+
+This is safer than installing `zentorch` separately, because the `zen` extra
+keeps the vLLM wheel and ZenDNN plugin on the tested version combination.
+
+vLLM auto-detects the platform and routes linear layers through
+ZenDNN-optimized kernels - no flag needed. To verify it is engaged, run with
+INFO-level logs and look for the activation banner:
+
+```bash
+VLLM_LOGGING_LEVEL=INFO vllm serve facebook/opt-125m --dtype bfloat16 \
+    2>&1 | grep -E "ZenCpuPlatform activated|CPU unquantized GEMM dispatch"
+```
+
+See [AMD Zen optimizations](#amd-zen-optimizations) for details on detection
+rules, supported dtypes, and the `VLLM_ZENTORCH_WEIGHT_PREPACK` knob.
+


I would put this above the "How to decide VLLM_CPU_OMP_THREADS_BIND" section.

AndreasKaratzas · 2026-05-28T04:06:48Z

+vLLM supports basic model inferencing and serving on x86 CPU platform, with
+data types FP32 and BF16.


No reason for line diff here.

AndreasKaratzas · 2026-05-28T04:07:06Z

+On AMD Zen 4 / Zen 5 (Genoa, Bergamo, Turin) CPUs, vLLM auto-activates a
+`ZenCpuPlatform` when a compatible
+[`zentorch`](https://github.com/amd/ZenDNN-pytorch-plugin) package is
+available. For release wheels, install the CPU wheel with the `zen` extra so
+vLLM pulls the tested `zentorch` version for that release. See
+[AMD Zen optimizations](cpu.md#amd-zen-optimizations) below.


No reason for new lines here.

AndreasKaratzas · 2026-05-28T04:08:54Z

 --8<-- [end:build-image-from-source]
+--8<-- [start:amd-zen-optimizations]
+
+On AMD Zen CPUs, vLLM auto-selects `ZenCpuPlatform` (a subclass of the default


Again check this file, no need for some of the new lines.

AndreasKaratzas · 2026-05-28T04:09:26Z

+@pytest.mark.usefixtures("_mock_zentorch_linear_unary")
+def test_dispatch_cpu_unquantized_gemm_logs_zentorch_dispatch(monkeypatch):


Should this be skipped if platform is not zen_cpu?

No, this is intentional. The test is a unit test for the dispatcher logic.

It uses the _mock_zentorch_linear_unary fixture to register a fake zentorch_linear_unary op via torch.library and monkeypatches current_platform.is_zen_cpu to True. This makes the test hardware-independent by design so the dispatch path is exercised on any CI runner.

Skipping on non-Zen hardware would mean the new logging code is never validated in CI. The pattern matches the two existing tests above it (test_dispatch_cpu_unquantized_gemm_uses_zentorch_on_zen, test_dispatch_cpu_unquantized_gemm_zen_remove_weight) that were accepted in PR #35970, and the four test_zen_cpu_platform_detection.py tests (also from #35970) that mock os.path.exists + open to make /proc/cpuinfo reads hardware-independent. Verified passing locally: 4/4 in this file + 4/4 in test_zen_cpu_platform_detection.py = 8/8 PASSED.

although, would love to get your thoughts on adding UT for logging
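
For readers unfamiliar with the pattern described in this reply, here is a minimal, self-contained sketch of a hardware-independent dispatch test. It does not use vLLM: `ToyPlatform`, `toy_dispatch`, and `run_dispatch_as_zen` are hypothetical stand-ins, and the real tests rely on pytest's `monkeypatch` and a `torch.library` fixture rather than `unittest.mock`.

```python
import logging
from unittest import mock

logger = logging.getLogger("toy_dispatch")  # hypothetical logger name


class ToyPlatform:
    """Stand-in for a platform object; NOT vLLM's current_platform."""

    def is_zen_cpu(self) -> bool:
        return False  # pretend the CI runner is not a Zen machine


current_platform = ToyPlatform()


def toy_dispatch() -> str:
    """Pick a backend name and log the choice (toy version of a dispatcher)."""
    backend = "zentorch" if current_platform.is_zen_cpu() else "fallback"
    logger.info("GEMM dispatch: using %s", backend)
    return backend


class _ListHandler(logging.Handler):
    """Collect log messages in a list so a test can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def run_dispatch_as_zen() -> tuple[str, list[str]]:
    """Force the Zen branch on any hardware; return (backend, log messages)."""
    handler = _ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        # Patch the platform check so the Zen path runs on any CI runner.
        with mock.patch.object(current_platform, "is_zen_cpu", return_value=True):
            backend = toy_dispatch()
    finally:
        logger.removeHandler(handler)
    return backend, handler.messages
```

Because the platform check is patched rather than skipped, both the branch selection and the log message are exercised on every runner, which is the point made above about not skipping on non-Zen hardware.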

AndreasKaratzas · 2026-05-28T04:09:39Z

+    ]
+
+
+def test_zen_cpu_platform_logs_activation(monkeypatch):


same answer as the previous one

AndreasKaratzas · 2026-05-28T04:12:11Z

@@ -282,6 +289,7 @@ def dispatch_cpu_unquantized_gemm(
            layer.cpu_linear = lambda x, weight, bias: ops.onednn_mm(handler, x, bias)
            if remove_weight:
                layer.weight = torch.nn.Parameter(torch.empty(0), requires_grad=False)
+            logger.info_once("CPU unquantized GEMM dispatch: using oneDNN onednn_mm")
            return
        except RuntimeError as e:
            logger.warning_once(
@@ -293,6 +301,9 @@ def dispatch_cpu_unquantized_gemm(
    layer.cpu_linear = lambda x, weight, bias: torch.nn.functional.linear(
        x, weight, bias
    )
+    logger.info_once(
+        "CPU unquantized GEMM dispatch: using torch.nn.functional.linear (fallback)"
+    )


The first log is borderline OK since it only fires on the zentorch path, but I am not so sure about the rest. Can you explore placing these logs in your dispatcher (platforms/zen_cpu.py) instead?

Or are these important logs?

Dropped the three non-Zen logger.info_once calls in dispatch_cpu_unquantized_gemm (sgl-kernel, oneDNN, fallback)

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I3b246f6f5ac1d658a8962a686ae569467f803b6a

tlrmchlsmth · 2026-05-28T13:19:17Z

+    @classmethod
+    def check_and_update_config(cls, vllm_config: VllmConfig) -> None:
+        super().check_and_update_config(vllm_config)
+
+        import zentorch
+
+        zentorch_version = getattr(zentorch, "__version__", "unknown")
+        avx512 = torch.cpu._is_avx512_supported()
+        avx512_bf16 = torch.cpu._is_avx512_bf16_supported()
+
+        logger.info_once(
+            "ZenCpuPlatform activated | zentorch=%s | "
+            "VLLM_ZENTORCH_WEIGHT_PREPACK=%d | "
+            "AVX-512=%s | AVX-512_BF16=%s",
+            zentorch_version,
+            int(envs.VLLM_ZENTORCH_WEIGHT_PREPACK),
+            avx512,
+            avx512_bf16,
+        )
+
+        zentorch_config = getattr(zentorch, "__config__", None)
+        if zentorch_config:
+            logger.info_once("zentorch build config: %s", zentorch_config)


check_and_update_config is not the right place to print info banners. We already have logging that the zen platform is being used here - perhaps upgrade that logging level to info instead?

vllm/vllm/platforms/__init__.py

Lines 190 to 192 in 02606b0

logger.debug(
    "AMD Zen CPU detected with zentorch installed, using ZenCpuPlatform."
)
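
To illustrate why the level matters, here is a small sketch using only the standard `logging` module. It assumes the effective verbosity is INFO (stated elsewhere in this thread to be the default); the logger name and the helper `emitted_messages` are hypothetical and not part of vLLM.

```python
import io
import logging

DETECTION_MESSAGE = (
    "AMD Zen CPU detected with zentorch installed, using ZenCpuPlatform."
)


def emitted_messages(level_name: str) -> str:
    """Log the detection message at level_name through an INFO-level logger."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    log = logging.getLogger(f"demo.platforms.{level_name}")  # hypothetical name
    log.setLevel(logging.INFO)  # assumed default verbosity
    log.propagate = False
    log.addHandler(handler)
    try:
        log.log(getattr(logging, level_name), DETECTION_MESSAGE)
    finally:
        log.removeHandler(handler)
    return stream.getvalue()
```

Calling `emitted_messages("DEBUG")` yields an empty string, while `emitted_messages("INFO")` yields the message, so raising the existing call to info level is enough to make it visible in default runs.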

tlrmchlsmth · 2026-05-28T13:25:25Z

+        logger.info_once(
+            "CPU unquantized GEMM dispatch: using zentorch_linear_unary (prepacked=%s)",
+            is_prepacked,
+        )


Could you change this to debug_once and then add a log for all of the backends, not just zentorch? Thanks
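
A rough sketch of what this request amounts to: a single debug-level, log-once statement that covers every backend. The code below is a toy built on the standard library, not vLLM's `debug_once` implementation; `select_backend` and its two boolean parameters are hypothetical, the selection order is simplified (sgl-kernel is omitted), and the backend labels are taken from the log strings quoted in this thread.

```python
import logging
from functools import lru_cache

logger = logging.getLogger("toy_gemm_dispatch")  # hypothetical logger name


@lru_cache(maxsize=None)
def debug_once(message: str) -> None:
    """Emit message at DEBUG level on first use; repeat calls are no-ops."""
    logger.debug(message)


def select_backend(has_zentorch: bool, has_onednn: bool) -> str:
    """Toy selection order: zentorch, then oneDNN, then the torch fallback."""
    if has_zentorch:
        backend = "zentorch_linear_unary"
    elif has_onednn:
        backend = "oneDNN onednn_mm"
    else:
        backend = "torch.nn.functional.linear (fallback)"
    # One statement covers every backend, not only the zentorch path.
    debug_once(f"CPU unquantized GEMM dispatch: using {backend}")
    return backend
```

Logging after the branch, rather than inside each branch, keeps the message format identical across backends and means a newly added backend is logged without extra code.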

AndreasKaratzas · 2026-05-29T05:42:15Z

Gonna add the ready label, comments have been addressed
cc @tlrmchlsmth for final stamping

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I7d16e0f80048b0893e157b3c13d27bf2b38cdc78

mergify · 2026-05-29T05:47:47Z

Hi @amd-lalithnc, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

mergify · 2026-05-29T05:53:52Z

Hi @amd-lalithnc, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: Ie92660278bda66af71e575da506c09c01f40322f

amd-lalithnc · 2026-05-29T06:53:37Z

hi @tlrmchlsmth @AndreasKaratzas - CI failing due to a 503 error, can we retrigger?

tlrmchlsmth

A couple comments in the readme that I didn't catch before

tlrmchlsmth · 2026-06-01T20:39:00Z

+VLLM_LOGGING_LEVEL=INFO vllm serve facebook/opt-125m --dtype bfloat16 \
+    2>&1 | grep "AMD Zen CPU detected with zentorch installed"


Let's update the model to something more modern. (Qwen3 0.6b? In that case you won't have to set --dtype bf16) We also shouldn't need to set VLLM_LOGGING_LEVEL=INFO since this is the default

changed the model

- Pin VLLM_VERSION to the latest release tag via the GitHub API instead of hard-coding a stale version.
- Drop the %2B URL-encoding now that the literal '+' works.
- Trim the verification command: vllm serve already logs at INFO and picks dtype automatically, and switch the example model to a more current one (Qwen/Qwen3-0.6B).

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Change-Id: I7013670bbe7ef2a02f764a0ef5ede5dfc98b9f1a
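
As an illustration of pinning to the latest release tag, the sketch below queries the GitHub REST API's "latest release" endpoint and strips the leading `v` from the tag. This is an assumption about how such a lookup could be done; the docs change in this commit may use a different mechanism (for example a shell one-liner), and the function names here are hypothetical.

```python
import json
import urllib.request

LATEST_RELEASE_URL = (
    "https://api.github.com/repos/vllm-project/vllm/releases/latest"
)


def version_from_release(payload: dict) -> str:
    """Turn a release payload's tag (e.g. "v0.20.2") into a bare version."""
    return payload["tag_name"].removeprefix("v")


def fetch_latest_version(url: str = LATEST_RELEASE_URL) -> str:
    """Query the GitHub REST API for the latest release (needs network access)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return version_from_release(json.load(response))
```

The parsing step is kept separate from the network call so it can be checked offline; `str.removeprefix` requires Python 3.9 or newer.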

amd-lalithnc · 2026-06-08T12:28:57Z

hi @tlrmchlsmth - can we go ahead and merge this PR - all review comments have been addressed, thanks!

tlrmchlsmth

accident

Use `uv pip install "vllm[zen]"` with the wheels.vllm.ai index instead of constructing a direct wheel URL, consistent with the nightly/commit install patterns in the x86 docs. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Tyler Michael Smith <tyler@tylersmith.us> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

amd-lalithnc · 2026-06-13T14:02:09Z

hi @tlrmchlsmth - looks like the CI failures are unrelated to our PR - please check and let me know if there are any concerns

…2726) Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: vivek sharma <vivsharm@redhat.com>


<zyy1102000@gmail.com> Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai> * [KV Offloading] Implement `reset_cache` for `TieringOffloadingManager` (#44541) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * [Bugfix] Chat Completions Harmony Refactor Clean up (#45464) Signed-off-by: Yifan Zong <yzong@redhat.com> Co-authored-by: Ben Browning <bbrownin@redhat.com> * [Perf] Optimize DSv4 prefill chunk planning, 4.0% E2E Throughput Improvement (#45061) Signed-off-by: yewentao256 <zhyanwentao@126.com> * [Frontend] Skip structural tags for auto tool_choice without strict mode (#45600) Signed-off-by: sfeng33 <4florafeng@gmail.com> * [Model Runner V2][Bugfix] Fix MRV2 LoRA warmup (#35536) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Woosuk Kwon <woosuk@inferact.ai> * Fix parallel_tool_calls: null treated as false instead of default true (#44955) Signed-off-by: factnn <166481866+factnn@users.noreply.github.com> * [Frontend] Replace legacy Gemma4 parsers with engine-based implementation (#45588) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Flora Feng <4florafeng@gmail.com> * [Bugfix] Defer block freeing until in-flight steps finish under async scheduling + PD KV consumer (#45357) Signed-off-by: llx-08 <2596671364@qq.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com> * nixl_ep: Skip post-receive quantization for NVFP4 (#45606) Signed-off-by: Itay Alroy <ialroy@nvidia.com> * [EP] Query NIXL EP top-k index dtype (#45298) Signed-off-by: Itay Alroy <ialroy@nvidia.com> * [EP] Enable DBO with NIXL EP (#45275) Signed-off-by: Itay Alroy <ialroy@nvidia.com> * [DSV4][Minor] Fix supported KV cache dtypes (#44892) 
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> * [Misc][Model] add io processor for query/document embeddings from ColBERT (jinaai/jina-colbert-v2) (#45210) Signed-off-by: thomas <thomas.varghese@columbia.edu> * [Rust Frontend] Support `max_logprobs` validation (#45674) Signed-off-by: Bugen Zhao <i@bugenzhao.com> * [Rust Frontend] Lower out-of-vocab validation to `text` layer (#45685) Signed-off-by: Bugen Zhao <i@bugenzhao.com> * [Multimodal] Add Qwen3-VL video loader (#44412) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> * [BugFix] Support async scheduling with prompt embeds for multimodal models (#45673) Signed-off-by: Ruinan Ma <r7ma3088@gmail.com> * [XPU] Fix Triton attn fp8/bf16 check failing (#45758) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> * [Bugfix][Gemma4] Fix offline parser truncation, adjust_request token leak, and chat template sync (#45553) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> * [Rust Frontend] Require `ModelConfig.vocab_size` to be present (#45696) Signed-off-by: Bugen Zhao <i@bugenzhao.com> * [Frontend] [Parser] Migrate Nemotron V3 to streaming parser engine (#45755) Signed-off-by: Ben Browning <bbrownin@redhat.com> * [Core] Use fastsafetensors ParallelLoader for weight loading (#40183) Signed-off-by: Git Bisector <gitbisector@gmail.com> Signed-off-by: gitbisector <gitbisector@gmail.com> Signed-off-by: git bisector <gitbisector@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> * Register parsed config classes before tokenizer init (#40299) Signed-off-by: Bortlesboat <bortstheboat@gmail.com> Co-authored-by: OpenAI Codex <codex@openai.com> * [Misc] Added validation for Cohere /v2/embed input field exclusivity (#45640) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> * [Cleanup] Remove dead env (#45777) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> 
* [Bug Fix] Allow pinned memory for WSL2 (#41496) Signed-off-by: Jimmy Lee <hirejimmylee@gmail.com> * [CPU] Support Gemma Diffusion (#45690) Signed-off-by: jiang1.li <jiang1.li@intel.com> * [Bugfix] Prevent cuMemcpyBatchAsync segfault with MTP and KV offloading (#44784) Signed-off-by: joshua <joshua.abraham@multicorewareinc.com> Co-authored-by: joshua <joshua.abraham@multicorewareinc.com> Co-authored-by: Or Ozeri <oro@il.ibm.com> * [Frontend] Remove AsyncMicrobatchTokenizer. (#45759) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> * [Bugfix] Fix trtllm fused allreduce+rms_norm for transformers backend (#45307) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> * [XPU][CI] add intel xpu cases for nightly CI (#44372) Signed-off-by: wenjun.liu <wenjun.liu@intel.com> Signed-off-by: zengxian <xiangdong.zeng@intel.com> Co-authored-by: zengxian <xiangdong.zeng@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> * [Misc]Clean up useless test (#45792) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> * Add Triton recompile detection (#45631) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg> * [MM][Perf][CG] Support dual-path ViT full CUDA graph for DeepSeek-OCR (#43586) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> * [KV Connector][Mooncake] Pipeline-parallel support for PD-disaggregated serving with Mooncake connector (#44528) Signed-off-by: hanhan.hank <hanhan.hank@bytedance.com> Signed-off-by: Hank Han <hanhan7630@outlook.com> * [Refactor] Remove `Fp8OnlineLinearMethod` as scheduled (#45463) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * [ZenCPU] Add zencpu Platform Runtime Logging and Docs (#42726) Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: 
Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> * [ROCm][CI] Gate incompatible HF references on Transformers v5 (#41532) Signed-off-by: Andreas Karatzas <akaratza@amd.com> * [Quant] Support modelopt_mixed on Ampere (SM80/SM86) (#45306) Signed-off-by: Mike G <180722391+mikekg@users.noreply.github.com> * [Bugfix][MoE] Restore routed output unpadding before shared expert add (#45707) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * [Perf] Add VLLM_TRITON_FORCE_FIRST_CONFIG to skip Triton autotuning (#42425) Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com> Co-authored-by: Claude <noreply@anthropic.com> * [CI] Fix attention benchmark smoke test (#45728) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Claude <noreply@anthropic.com> * [Rust Frontend] Add CORS support (#45753) Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com> * [Bugfix] Fix FlashMLA sparse accuracy with topk_length and zero-init padding (#36616) Signed-off-by: AjAnubolu <anuboluajay@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> * [Kernel][Helion][1/N] Add Helion kernel for rms_norm_per_block_quant (#36895) Signed-off-by: Sean Chen <seachen@redhat.com> Co-authored-by: Yanan Cao <gmagogsfm@gmail.com> * feat: MLA prefill enable FA4 fp8 output (#43050) Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com> * [ROCm][Cleanup] Remove stale AITER FA hybrid KV-cache TODO (#44178) Signed-off-by: Tuukka Sarvi <tuukka.sarvi@amd.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> * [Model] Add HrmTextForCausalLM (Hierarchical Reasoning Model — Text) (#43098) Signed-off-by: Wuyifei <wuyifei@me.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry 
Mellor <19981378+hmellor@users.noreply.github.com> * Upgrade tpu-inference to v0.22.1 (#45793) * [ROCm][CI] Patch conftest to resolve occasional OOMs (#45722) Signed-off-by: Micah Williamson <micah.williamson@amd.com> * [Model Runner V2] Enable GraniteMOE for MRv2 by default (#45461) Signed-off-by: yewentao256 <zhyanwentao@126.com> * [Model] Remove Dots1ForCausalLM (#45637) Signed-off-by: Xianbao QIAN <xianbao.qian@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> * [Bugfix][Core] Fall back when numactl --membind is blocked in constrained containers (#45438) Signed-off-by: Ting Sun <suntcrick@gmail.com> Co-authored-by: Shengqi Chen <harry-chen@outlook.com> * [KVConnector][MoRIIO] Allow overriding the advertised host IP (#45488) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * [KV Connector][Mooncake] Add cache_prefix to namespace store keys (#45767) Signed-off-by: Dao Le <Dao007forever@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * [Frontend] Add Streaming Parser Engine and new MinimaxM2 Parser (#45701) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> * [Bugfix] Fix Qwen3 prompt tool-call reasoning false positive (#45763) Signed-off-by: Alex Bilichenko <alexbi29@users.noreply.github.com> Co-authored-by: Alex Bilichenko <alexbi29@users.noreply.github.com> * [PERF] Fuse multi-group block table staged writes (#44944) Signed-off-by: jesse <szxfml@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> * [ROCm][Quant] mxfp8 moe/linear gfx950 tuning for MiniMax-M3 (#45725) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> * [Misc] Update Mergify tool-calling label (#45853) Signed-off-by: sfeng33 <4florafeng@gmail.com> * [Core] Add prefill step cadence for better non-PD DP balancing (#44558) Signed-off-by: Nick Hill <nickhill123@gmail.com> * [ROCm][CI] fix multimodel run cmds (#45858) 
Signed-off-by: Divakar Verma <divakar.verma@amd.com> * [Bugfix] Gemma4: skip forced JSON for required/named tool choice (#45795) Signed-off-by: Federico Iezzi <fiezzi@google.com> * [Kernel] Support GLM-5 dimensions for TRT-LLM ragged MLA prefill (#43525) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> * Apply LRU policy only to proper cache entries (#42656) Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> * [Kernel] Support DS Mamba tail copy for MTP align mode (#45473) Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com> Co-authored-by: Thomas Parnell <tom.parnell@gmail.com> * [XPU][CI] fix server test file path (#45870) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> * [Bugfix] Fix MoE model load OOM in FlashInfer_TRTLLM backend with sleep mode (#45589) Signed-off-by: Dakai An <dakaian108@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * [Bugfix][Gemma4] Fix parsing when thinking is disabled (#45832) Signed-off-by: Federico Iezzi <fiezzi@google.com> * [CI] Run pre-commit on self-hosted vllm-runners (#45865) Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * [XPU] Fix test_spec_decode_logprobs: use FLASH_ATTN for XPU in GPU_DETERMINISM_KWARGS (#44468) Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> * [Bugfix][ROCm] Fix MiniMax-M3 FP8 KV cache dtype (#45720) Signed-off-by: Cam Quilici <cjquilici@gmail.com> Signed-off-by: Cameron Quilici <cjquilici@gmail.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> * [Bugfix][ROCm] Fix FP8 per-tensor scale rank mismatch causing Inductor assertion failure (#44912) Signed-off-by: nehmathe2 <nehmathe2@gmail.com> Signed-off-by: Divakar Verma 
<divakar.verma@amd.com> Signed-off-by: nehmathe <nehmathe@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Divakar Verma <divakar.verma@amd.com> Co-authored-by: Andreas Karatzas <akaratza@amd.com> * [ModelRunnerV2] Various model/config compatibility fixes (#45868) Signed-off-by: Nick Hill <nickhill123@gmail.com> * [Bugfix][V1] Clean up compiled-model bytecode hooks on VllmRunner exit (#45195) Signed-off-by: Ting Sun <suntcrick@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * [FlexAttention] make custom mask mods fully cudagraphable (#45232) Signed-off-by: Angel Li <liangel@meta.com> * [M3] Tune Triton indexer score decode for spec-decode (#45743) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * [CI][NIXL] Pin NIXL to 1.2.0 (#45843) Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Itay Alroy <75032521+itayalroy@users.noreply.github.com> Co-authored-by: ovidiusm <ovidium@nvidia.com> * [M3] Enable FP8 sparse GQA (#45744) Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg> Co-authored-by: Yongye Zhu <zyy1102000@gmail.com> * [Bugfix][Quantization] Reject unsupported compressed tensors KV cache schemes (#45312) Signed-off-by: Ting Sun <suntcrick@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * [BugFix][CI] Fix scheduler plugin test (#45897) Signed-off-by: Nick Hill <nickhill123@gmail.com> * [Rust Frontend] Support prompt-only completions (#44938) Signed-off-by: reidliu41 <reid201711@gmail.com> * [Rust Frontend] Add /abort_requests endpoint (#44382) Signed-off-by: Sahil Singh <sahiilsiingh37@gmail.com> * [Rust Frontend] Add serde defaults for omit_defaults fields in `EngineCoreSamplingParams` (#45848) Signed-off-by: Will Eaton <weaton@redhat.com> * [Kernel] Add weightless RMSNorm CUDA kernels 
for has_weight=False (#41430) (#44109) Signed-off-by: hello-args <args.sarkar@gmail.com> * [Misc] Validate Cohere Embed Mixed Content Payloads (#45873) Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> * [Rust Frontend] Support hybrid/external DP LB in Python supervised bootstrap (#45805) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: Bugen Zhao <i@bugenzhao.com> * [KV Connector][Offloading] Avoid blocking the engine to flush offloads on idle (#45595) Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Signed-off-by: Or Ozeri <oro@il.ibm.com> Signed-off-by: Itay Etelis <Itay.etelis@gmail.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Itay Etelis <Itay.etelis@gmail.com> * [Bugfix] Fixes MiniCPM-O resampler device placement to avoid tensor device mismatch (#42332) Signed-off-by: j9smith <j.smith9103@outlook.com> * [Bugfix][Gemma4] Pre-initialise streaming reasoning state when prompt ends inside an open `<|channel>` (fixes #45834) (#45852) Signed-off-by: nikhilesh-csa <nchhetri@csa1.com> * [Bugfix][test] Use Salesforce/wikitext for ppl tests (#45913) Co-authored-by: wentian-byte <192079369+wentian-byte@users.noreply.github.com> * fix(security): enforce audio decode duration limit in chat completions path (#45908) Signed-off-by: jperezde <jperezde@redhat.com> * [ROCm][Bugfix]: Fallback GFX942 sparse MLA ops to Triton (#45782) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> * docs, kv_offloading: add docs for selective offload (#45279) Signed-off-by: Angelo Ruocco <ang@zurich.ibm.com> * [ROCm][Quant] Minimax-M3: Enable fp8_per_channel for bf16 weights on mi300x (#45854) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> * [MM][Perf][CG] Support ViT full CUDA graph for Kimi-VL (#41992) Signed-off-by: oguz <oguzhankir17@gmail.com> * [CI/Build] Avoid 
duplicate ViT CG test introduced by accident (#45654) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> * [XPU] Fix test_logprobs_e2e import error: pin lm-eval[api]>=0.4.12 (#44469) Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com> * [quant][autoround]Refactor INC quantization into package with INCScheme orchestrator (#40601) Signed-off-by: yiliu30 <yi4.liu@intel.com> Signed-off-by: Zhenzhong1 <zhenzhong.xu@intel.com> Signed-off-by: Zhenzhong Xu <zhenzhong.xu@intel.com> Co-authored-by: n1ck-guo <heng.guo@intel.com> Co-authored-by: Zhenzhong1 <zhenzhong.xu@intel.com> * [ROCm][AITER][Quark] Tag per-channel FP8 weights as PER_CHANNEL so AITER pre-shuffled GEMM is selected (#44626) Signed-off-by: Xavier Aguilar <xavier.aguilarfruto@amd.com> * Feature: Enable Flashinfer non-gated MoE bf16 (#43853) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> * [DSv4 Perf] DSv4 flashinfer sparse index cache for metadata, 2%~4% TTFT improvement (#45863) Signed-off-by: yewentao256 <zhyanwentao@126.com> * [Kernel][Helion][1/N] Add Helion kernel for rms_norm_dynamic_per_token_quant (#34432) Signed-off-by: Sean Chen <seachen@redhat.com> Co-authored-by: Yanan Cao <gmagogsfm@gmail.com> * [Bugfix][PD] Fix DSV4 disaggregated serving (#45831) Signed-off-by: ZhanqiuHu <zhu@redhat.com> * [Bugfix] Pass TP group to FlashInfer all-reduce fusion (#45917) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com> * [Log] Update deepgemm log (#45857) Signed-off-by: yewentao256 <zhyanwentao@126.com> * [DSV4 Perf] Optimize dsv4 cudagraph by reducing `eager_break_during_capture`, 26.8% ~ 27.9% E2E TTFT improvement (#45309) Signed-off-by: yewentao256 <zhyanwentao@126.com> * [feature] MiniMax-M3-MXFP4 support added (#45896) Signed-off-by: Qiang Li <qiang.li2@amd.com> * [Bugfix] MiniMax-M3 (AMD): add packed_modules_mapping and pass swiglu… (#45794) Signed-off-by: wangjiaxin99 <jiaxwang@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Douglas Lehr 
<91553416+dllehr-amd@users.noreply.github.com> * [Refactor] Remove dead quantization code and tests (#45454) Signed-off-by: yewentao256 <zhyanwentao@126.com> * [Bugfix][Gemma4] Render reasoning on assistant turns without tool_calls (#45867) Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> * [Bugfix][Model] Validate DefaultModelLoader / LoadConfig and fail with clear errors (#45196) Signed-off-by: Ting Sun <suntcrick@gmail.com> * [BUG] fix hidden states nan for hybrid attention models (#45849) Signed-off-by: shanjiaz <hezhao@redhat.com> Co-authored-by: shanjiaz <hezhao@redhat.com> * [Bugfix] Fix NixlConnector handshake block_len validation for GQA-replicated KV heads (#45879) Signed-off-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: waynehacking8 <waynehacking8@gmail.com> * Revert "[DSV4 Perf] Optimize dsv4 cudagraph by reducing `eager_break_during_capture`" (#45309) (#45972) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * [XPU][CI] add model runner v2 into CI (#44650) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> * [CI/Build][Bugfix] Fix SD LoRA (#45941) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Bugfix] Complete one-shot fused all-reduce PDL at end to avoid NaN (#45448) * [Rust Frontend][Perf] O(n) argument scan in tool parser (#45826) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: Bugen Zhao <i@bugenzhao.com> * [XPU] Fix FP8 block-scaled scheme selection on non-CUDA platforms (#43958) Signed-off-by: Lai, Yejing <yejing.lai@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> * [Rust Frontend] Validate tokenized bad_words vocabulary range (#45876) Signed-off-by: reidliu41 <reid201711@gmail.com> * [CPUOffloading] Guard CPU eviction check (#45757) Signed-off-by: Varun Sundar Rabindranath 
<varun-sundar-rabindranath@h100-01.nemg-001.lab.rdu2.dc.redhat.com> Co-authored-by: Varun Sundar Rabindranath <varun-sundar-rabindranath@h100-01.nemg-001.lab.rdu2.dc.redhat.com>
* [SimpleCPUOffloadConnector]: Add support for reset_cache() (#39726) Signed-off-by: Jonathan Chen <chenleejonathan@gmail.com> Signed-off-by: Jonathan <chenleejonathan@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* [Kernel] Add PDL support for DeepGEMM kernel (#42996) Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
* [Fix][KV offload] Defer `on_request_finished` until in-flight transfers drain (#45823) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
* [Refactor] Remove dead cutlass mxfp8 code (#44681) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
* [KV Offloading] Remove dummy worker-side stats from OffloadingConnector (#45905) Signed-off-by: Alex <alex.tech.lab@outlook.com> Signed-off-by: AlexHuang <jihuihuang@alexai.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>
* [Test][KV Connector] Add request_finished fence population tests for offloading scheduler (#45679) Signed-off-by: Alex <alex.tech.lab@outlook.com> Signed-off-by: AlexHuang <jihuihuang@future.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>
* Revert "[Kernel] Add PDL support for DeepGEMM kernel" (#45999)
* [XPU] Update nixl to v0.10.1 in Dockerfile (#40287) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* fix(layernorm): route weightless RMSNorm to native impl

  The vllm_c rms_norm/fused_add_rms_norm guards claimed support for weight=None, but torch.ops._C.rms_norm cannot take a None/undefined weight (fails with 'Not yet supported ScalarType'). Weightless norms (e.g. Gemma4 v_norm, has_weight=False) now correctly fall back to the native impl.

* test(steering): retarget key-coercion test at coerce_steering_spec

  The SetSteeringRequest.vectors field is intentionally dict[str, Any] (to admit the packed wire form), so the model does not coerce inner layer keys; coerce_steering_spec does. Test the actual coercion seam (which had no direct coverage) instead of obsolete model-level behavior.

* fix(capture): skip broken consumer entry points instead of crashing

  A single third-party capture-consumer plugin that fails to import (e.g. one referencing a module not present in this build) previously crashed _load_entry_points and took down all capture admission. Skip it with a warning so other consumers keep working.

---------

Signed-off-by: Sean Chen <seachen@redhat.com> Signed-off-by: Wentian Byte <3400259131@qq.com> Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com> Signed-off-by: Bugen Zhao <i@bugenzhao.com> Signed-off-by: Ben Browning <bbrownin@redhat.com> Signed-off-by: Xianbao QIAN <xianbao.qian@gmail.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Yifan Zong <yzong@redhat.com> Signed-off-by: Dao Le <Dao007forever@gmail.com> Signed-off-by: Neil Schemenauer <nas@arctrix.com> Signed-off-by: jpwang <jpwang@smail.nju.edu.cn> Signed-off-by: Hsiao-Yuan Chen <hy.c@Hsiao-YuandeMacBook-Pro.local> Signed-off-by: littlecircle0730 <littlecircle0730@gmail.com> Signed-off-by: littlecircle0730 <43994952+littlecircle0730@users.noreply.github.com> Signed-off-by: Ting Sun <suntcrick@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Sasindharan Sankar <sasindharansankar@email.com> Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com> Signed-off-by: Chris Leonard <chleonar@redhat.com> Signed-off-by: Divakar Verma <divakar.verma@amd.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: allgather <all2allops@gmail.com>
Signed-off-by: Rohan Potdar <rohan.potdar@amd.com> Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Signed-off-by: Will Eaton <weaton@redhat.com> Signed-off-by: Ma Jian <jian1.ma@intel.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: xiaguan <751080330@qq.com> Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com> Signed-off-by: jperezde <jperezde@redhat.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Signed-off-by: Sunita Nadampalli <nadampal@amazon.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Ethan Feng <ethan.fengch@gmail.com> Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com> Signed-off-by: Guan-Ming (Wesley) Chiu <guanmingchiu@gmail.com> Signed-off-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com> Signed-off-by: srinivas_oo7 <sklinkedin0120@gmail.com> Signed-off-by: Sai Sridhar <tarrasridhar1154@gmail.com> Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com> Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com> Signed-off-by: Jonas I. 
Liechti <j-i-l@t4d.ch> Signed-off-by: zixi-qi <zixi@inferact.ai> Signed-off-by: Ryan Rock <ryan.rock@amd.com> Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Micah Williamson <micah.williamson@amd.com> Signed-off-by: Fangzhou Ai <fangzhouai@gmail.com> Signed-off-by: khluu <khluu000@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Wayne Chiu <waynehacking8@gmail.com> Signed-off-by: abinggo <107740309+abinggo@users.noreply.github.com> Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Signed-off-by: midas <the.anon.github@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: achyuthan.s <113010327+Achyuthan-S@users.noreply.github.com> Signed-off-by: Achyuthan S <achyuthan.sivasankar@gmail.com> Signed-off-by: Achyuthan Sivasankar <achyuthan.sivasankar@gmail.com> Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com> Signed-off-by: Marceli Fylcek <marceli.fylcek@intel.com> Signed-off-by: amd-asalykov <asalykov@amd.com> Signed-off-by: Amanzhol Salykov <asalykov@amd.com> Signed-off-by: ruinan ma <r7ma3088@gmail.com> Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> Signed-off-by: Noa Neria <nneria@nvidia.com> Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com> Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com> Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com> Signed-off-by: baoloongmao <baoloongmao@tencent.com> Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Sahil Singh <sahiilsiingh37@gmail.com> Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai> Signed-off-by: zhoujinyu <2319109590@qq.com> Signed-off-by: reidliu41 <reid201711@gmail.com> Signed-off-by: Lai, Yejing <yejing.lai@intel.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: Martin Kukla <martin.kukla@cantab.net> Signed-off-by: 
Xin He <xin3.he@intel.com> Signed-off-by: Mike G <180722391+mikekg@users.noreply.github.com> Signed-off-by: Saddss <2872669061@qq.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: functionstackx <47992694+functionstackx@users.noreply.github.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai> Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: factnn <166481866+factnn@users.noreply.github.com> Signed-off-by: llx-08 <2596671364@qq.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: thomas <thomas.varghese@columbia.edu> Signed-off-by: Ruinan Ma <r7ma3088@gmail.com> Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com> Signed-off-by: Git Bisector <gitbisector@gmail.com> Signed-off-by: gitbisector <gitbisector@gmail.com> Signed-off-by: git bisector <gitbisector@gmail.com> Signed-off-by: Bortlesboat <bortstheboat@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Jimmy Lee <hirejimmylee@gmail.com> Signed-off-by: joshua <joshua.abraham@multicorewareinc.com> Signed-off-by: wenjun.liu <wenjun.liu@intel.com> Signed-off-by: zengxian <xiangdong.zeng@intel.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg> Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: hanhan.hank <hanhan.hank@bytedance.com> Signed-off-by: Hank Han <hanhan7630@outlook.com> Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com> Signed-off-by: AjAnubolu <anuboluajay@gmail.com> Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com> Signed-off-by: Tuukka Sarvi <tuukka.sarvi@amd.com> 
Signed-off-by: Wuyifei <wuyifei@me.com> Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Signed-off-by: Alex Bilichenko <alexbi29@users.noreply.github.com> Signed-off-by: jesse <szxfml@gmail.com> Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Signed-off-by: Federico Iezzi <fiezzi@google.com> Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com> Signed-off-by: Dakai An <dakaian108@gmail.com> Signed-off-by: Cam Quilici <cjquilici@gmail.com> Signed-off-by: Cameron Quilici <cjquilici@gmail.com> Signed-off-by: nehmathe2 <nehmathe2@gmail.com> Signed-off-by: nehmathe <nehmathe@amd.com> Signed-off-by: Angel Li <liangel@meta.com> Signed-off-by: Itay Alroy <75032521+itayalroy@users.noreply.github.com> Signed-off-by: hello-args <args.sarkar@gmail.com> Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Signed-off-by: Or Ozeri <oro@il.ibm.com> Signed-off-by: Itay Etelis <Itay.etelis@gmail.com> Signed-off-by: j9smith <j.smith9103@outlook.com> Signed-off-by: nikhilesh-csa <nchhetri@csa1.com> Signed-off-by: Angelo Ruocco <ang@zurich.ibm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: oguz <oguzhankir17@gmail.com> Signed-off-by: yiliu30 <yi4.liu@intel.com> Signed-off-by: Zhenzhong1 <zhenzhong.xu@intel.com> Signed-off-by: Zhenzhong Xu <zhenzhong.xu@intel.com> Signed-off-by: Xavier Aguilar <xavier.aguilarfruto@amd.com> Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Signed-off-by: ZhanqiuHu <zhu@redhat.com> Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com> Signed-off-by: Qiang Li <qiang.li2@amd.com> Signed-off-by: wangjiaxin99 <jiaxwang@amd.com> Signed-off-by: shanjiaz <hezhao@redhat.com> Signed-off-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com> Signed-off-by: Varun Sundar Rabindranath <varun-sundar-rabindranath@h100-01.nemg-001.lab.rdu2.dc.redhat.com> Signed-off-by: 
Jonathan Chen <chenleejonathan@gmail.com> Signed-off-by: Jonathan <chenleejonathan@gmail.com> Signed-off-by: Alex <alex.tech.lab@outlook.com> Signed-off-by: AlexHuang <jihuihuang@alexai.com> Signed-off-by: AlexHuang <jihuihuang@future.com> Co-authored-by: Xiaohong (Sean) Chen <seachen@redhat.com> Co-authored-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: wentian-byte <3400259131@qq.com> Co-authored-by: Chao-Ju Chen <ricky.chen@infinirc.com> Co-authored-by: Bugen Zhao <i@bugenzhao.com> Co-authored-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Tiezhen WANG <38108242+xianbaoqian@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: yzong-rh <yzong@redhat.com> Co-authored-by: Dao007forever <dao007forever@gmail.com> Co-authored-by: Neil Schemenauer <nas-github@arctrix.com> Co-authored-by: jpwang <jpwang@smail.nju.edu.cn> Co-authored-by: littlecircle0730 <43994952+littlecircle0730@users.noreply.github.com> Co-authored-by: Hsiao-Yuan Chen <hy.c@Hsiao-YuandeMacBook-Pro.local> Co-authored-by: Or Ozeri <or@ozery.com> Co-authored-by: Fangzhou Ai <31551580+Fangzhou-Ai@users.noreply.github.com> Co-authored-by: vLLM Contributor <contributor@vllm.ai> Co-authored-by: Ting SUN <suntcrick@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: sasindharan <117493393+sasindharan@users.noreply.github.com> Co-authored-by: Sasindharan Sankar <sasindharansankar@email.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com> Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com> Co-authored-by: Chris Leonard <chleonar@redhat.com> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> 
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Martin Kukla <martin.kukla@cantab.net> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Dipika Sikka <dsikka@redhat.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Alec Kohlhoff <134344302+aleckohlhoff@users.noreply.github.com> Co-authored-by: Porras Huang <20535584+porrashuang@users.noreply.github.com> Co-authored-by: scoootscooob <167050519+scoootscooob@users.noreply.github.com> Co-authored-by: allgather <all2allops@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: Yuwen Zhou <yuwen.zhou@intel.com> Co-authored-by: Will Eaton <wseaton@users.noreply.github.com> Co-authored-by: Ma Jian <jian1.ma@intel.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: cjackal <44624812+cjackal@users.noreply.github.com> Co-authored-by: JinYan Su <jinyansu792@gmail.com> Co-authored-by: Fynn Schmitt-Ulms <fschmitt@redhat.com> Co-authored-by: Juan Pérez de Algaba <124347725+jperezdealgaba@users.noreply.github.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Shengqi Chen <harry-chen@outlook.com> Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com> Co-authored-by: liuzhenwei <zhenweiliu@habana.ai> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Ethan Feng <ethan.fengch@gmail.com> Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com> Co-authored-by: Srinivas Krovvidi <194645829+Srinivasoo7@users.noreply.github.com> Co-authored-by: srinivas_oo7 <sklinkedin0120@gmail.com> Co-authored-by: Sai Sridhar Tarra 
<117087864+sridhar-3009@users.noreply.github.com> Co-authored-by: Tahsin Tunan <tahsintunan@gmail.com> Co-authored-by: Yi Zhong <207368749+vincentzed@users.noreply.github.com> Co-authored-by: Jonas I. Liechti <j-i-l@t4d.ch> Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com> Co-authored-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Ryan Rock <ryan.rock@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Flora Feng <4florafeng@gmail.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com> Co-authored-by: WEI CHENG CHIU <waynehacking8@gmail.com> Co-authored-by: longguo <107740309+abinggo@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Andreas Karatzas <akaratza@amd.com> Co-authored-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: midas <the.anon.github@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: achyuthan.s <113010327+Achyuthan-S@users.noreply.github.com> Co-authored-by: Xin Li <xinli-sw@users.noreply.github.com> Co-authored-by: ShawRong <ShawRong@users.noreply.github.com> Co-authored-by: Change72 <Change72@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com> Co-authored-by: Jeff (Junze) Ma <93145857+majunze2001@users.noreply.github.com> Co-authored-by: Marceli Fylcek <marceli.fylcek@intel.com> Co-authored-by: Amanzhol Salykov <asalykov@amd.com> Co-authored-by: Michael Ma <97484148+mrn3088@users.noreply.github.com> Co-authored-by: Taneem Ibrahim <taneem.ibrahim@gmail.com> Co-authored-by: Noa Neria <nneria@nvidia.com> Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com> Co-authored-by: maobaolong <baoloongmao@tencent.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> Co-authored-by: Peter Pan <peter.pan@daocloud.io> Co-authored-by: Sahil Singh <sahiilsiingh37@gmail.com> 
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com> Co-authored-by: FAUST <2319109590@qq.com> Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com> Co-authored-by: Yejing Lai <yejing.lai@intel.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: Xin He <xin3.he@intel.com> Co-authored-by: Mike G <180722391+mikekg@users.noreply.github.com> Co-authored-by: Michael Gschwind <mgschwind@nvidia.com> Co-authored-by: Saddss <108515797+Saddss@users.noreply.github.com> Co-authored-by: RoyWang <Roy.Wang@amd.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: OpenAI Codex <codex@openai.com> Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com> Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai> Co-authored-by: Ronen Schaffer <ronen.schaffer@ibm.com> Co-authored-by: Woosuk Kwon <woosuk@inferact.ai> Co-authored-by: Zang Peiyu <166481866+factnn@users.noreply.github.com> Co-authored-by: llx <54896441+llx-08@users.noreply.github.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com> Co-authored-by: Itay Alroy <75032521+itayalroy@users.noreply.github.com> Co-authored-by: xx-thomas <113865951+xx-thomas@users.noreply.github.com> Co-authored-by: Luciano Martins <22145370+lucianommartins@users.noreply.github.com> Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com> Co-authored-by: gitbisector <gitbisector@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Andrew Barnes <bortstheboat@gmail.com> Co-authored-by: Jimmy Lee <58957694+thisisjimmyfb@users.noreply.github.com> Co-authored-by: joshua abraham <132982099+JOSH1024@users.noreply.github.com> Co-authored-by: joshua <joshua.abraham@multicorewareinc.com> Co-authored-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: wenjun 
liu <wenjun.liu@intel.com> Co-authored-by: zengxian <xiangdong.zeng@intel.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: Hank Han <hanhan7630@outlook.com> Co-authored-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Francesco Fusco <ffu@zurich.ibm.com> Co-authored-by: Ajay Anubolu <124525760+AjAnubolu@users.noreply.github.com> Co-authored-by: Carl Y <4531192+carlyou@users.noreply.github.com> Co-authored-by: Tuukka Sarvi <tuukka.sarvi@amd.com> Co-authored-by: yifei wu <50608184+abcd1927@users.noreply.github.com> Co-authored-by: Sting Lin <sting.lin@cienet.com> Co-authored-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com> Co-authored-by: alexbi29 <32223381+alexbi29@users.noreply.github.com> Co-authored-by: Alex Bilichenko <alexbi29@users.noreply.github.com> Co-authored-by: Song Zhixin <szxfml@gmail.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com> Co-authored-by: Federico <federico.iezzi@gmail.com> Co-authored-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> Co-authored-by: Stan Wozniak <77159600+s3woz@users.noreply.github.com> Co-authored-by: sungsoo ha <sungsooh@nvidia.com> Co-authored-by: Thomas Parnell <tom.parnell@gmail.com> Co-authored-by: Dakai An <77474977+andakai@users.noreply.github.com> Co-authored-by: Federico <fiezzi@google.com> Co-authored-by: Cameron Quilici <cjquilici@gmail.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: nehmathe2 <nehmathe@amd.com> Co-authored-by: Divakar Verma <divakar.verma@amd.com> Co-authored-by: liangel-02 <liangel@meta.com> Co-authored-by: ovidiusm <ovidium@nvidia.com> Co-authored-by: arghyadeep sarkar <args.sarkar@gmail.com> Co-authored-by: 
Itay Etelis <92247226+Etelis@users.noreply.github.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Itay Etelis <Itay.etelis@gmail.com> Co-authored-by: Joel Smith <j.smith9103@outlook.com> Co-authored-by: Nikhilesh Chhetri <106703537+nikhilesh-csa@users.noreply.github.com> Co-authored-by: wentian-byte <192079369+wentian-byte@users.noreply.github.com> Co-authored-by: Angelo Ruocco <ang@zurich.ibm.com> Co-authored-by: Oğuzhan KIR <86883236+oguzhankir@users.noreply.github.com> Co-authored-by: Yi Liu <yi4.liu@intel.com> Co-authored-by: n1ck-guo <heng.guo@intel.com> Co-authored-by: Zhenzhong1 <zhenzhong.xu@intel.com> Co-authored-by: xaguilar-amd <xavier.aguilarfruto@amd.com> Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com> Co-authored-by: zhanqiuhu <49648934+ZhanqiuHu@users.noreply.github.com> Co-authored-by: danisereb <daserebrenik@nvidia.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: wangjiaxin99 <jiaxwang@amd.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com> Co-authored-by: shanjiaz <zsjwpianpian@gmail.com> Co-authored-by: shanjiaz <hezhao@redhat.com> Co-authored-by: Bryan Shan <58582368+Oseltamivir@users.noreply.github.com> Co-authored-by: Ace Eldeib <aeldeib@coreweave.com> Co-authored-by: Varun Sundar Rabindranath <varun-sundar-rabindranath@h100-01.nemg-001.lab.rdu2.dc.redhat.com> Co-authored-by: Jonathan Chen <chenleejonathan@gmail.com> Co-authored-by: AlexHuang <alex.tech.lab@outlook.com>

…2726) Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

amd-lalithnc added 2 commits May 15, 2026 02:37

add zen cpu runtime logging

934be93

Make it easy to confirm platform activation and GEMM dispatch during real runs.
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: I6049a7950a6c312c02892a740576411e886271a7

clarify amd zen cpu support

7fa2107

Make the CPU hardware note explicit that AMD Zen keeps the same vLLM model support.
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: I5b4fcaecc344bab7daf3174d1c90b3d674819fcd

mergify Bot added the documentation and cpu labels May 15, 2026

gemini-code-assist Bot reviewed May 15, 2026

amd-lalithnc added 2 commits May 15, 2026 03:17

fix pre-commit nits

6ac6b41

Apply the formatter changes and fix the CPU include-file anchor so the branch-local pre-commit checks pass.
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: I877af675bb19cd8e606e227490ffd0600f46e6f5

Merge branch 'main' into zen-runtime-logs

f35e221

AndreasKaratzas reviewed May 28, 2026

address review comments on docs and dispatch logs

7555ddc

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: I3b246f6f5ac1d658a8962a686ae569467f803b6a

tlrmchlsmth reviewed May 28, 2026

AndreasKaratzas added the ready label May 29, 2026

address review comments

d24d0dc

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: I7d16e0f80048b0893e157b3c13d27bf2b38cdc78

amd-lalithnc force-pushed the zen-runtime-logs branch from 7967219 to d24d0dc Compare May 29, 2026 05:48

address ci nits

8074d81

Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Change-Id: Ie92660278bda66af71e575da506c09c01f40322f

Merge branch 'main' into zen-runtime-logs

bc8c57d

tlrmchlsmth reviewed Jun 1, 2026

amd-lalithnc added 2 commits June 4, 2026 00:17

Merge branch 'main' into zen-runtime-logs

0ca058a

tlrmchlsmth reviewed Jun 12, 2026

tlrmchlsmth and others added 2 commits June 12, 2026 11:12

Merge branch 'main' into zen-runtime-logs

cffd91a

tlrmchlsmth enabled auto-merge (squash) June 12, 2026 15:12

Merge branch 'main' into zen-runtime-logs

eea4fb6

Merge branch 'main' into zen-runtime-logs

aaf22b0

tlrmchlsmth approved these changes Jun 16, 2026

tlrmchlsmth merged commit 405c7cf into vllm-project:main Jun 16, 2026
73 checks passed

		vLLM supports basic model inferencing and serving on x86 CPU platform, with
		data types FP32 and BF16.

		@pytest.mark.usefixtures("_mock_zentorch_linear_unary")
		def test_dispatch_cpu_unquantized_gemm_logs_zentorch_dispatch(monkeypatch):
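The excerpt above shows only the test's signature. As a rough illustration of what a log-assertion test of this kind can look like, here is a minimal standard-library sketch. The function `dispatch_cpu_unquantized_gemm_stub`, the logger name, and the `ListHandler` helper are hypothetical stand-ins, not the PR's code; only the message text is taken from the PR description.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("zencpu_dispatch_sketch")


def dispatch_cpu_unquantized_gemm_stub(prepacked: bool) -> str:
    """Hypothetical stand-in for the real dispatch function; it only logs."""
    logger.info(
        "CPU unquantized GEMM dispatch: using zentorch_linear_unary (prepacked=%s)",
        prepacked,
    )
    return "zentorch_linear_unary"


class ListHandler(logging.Handler):
    """Collects rendered log messages in memory so a test can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def run_and_capture(prepacked: bool) -> list[str]:
    """Run the stub once and return the messages it logged."""
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        dispatch_cpu_unquantized_gemm_stub(prepacked)
    finally:
        logger.removeHandler(handler)
    return handler.messages
```

In a pytest suite the built-in `caplog` fixture would usually replace the hand-written handler; the explicit handler is used here only to keep the sketch free of test-framework dependencies.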

	logger.debug(
	"AMD Zen CPU detected with zentorch installed, using ZenCpuPlatform."
	)
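The excerpt above emits the detection message at debug level. With Python's standard `logging` module, a logger set to INFO drops debug records, so the level matters when looking for this line in real output. The sketch below demonstrates that behaviour with the standard library only. The logger names and the "platform selected" message are hypothetical, and how vLLM configures its own logger is not assumed here.

```python
import io
import logging


def emitted_at(level: int) -> str:
    """Return what a logger set to `level` writes for one debug and one info call."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    # A dedicated, non-propagating logger keeps the sketch isolated.
    log = logging.getLogger(f"zencpu_level_sketch_{level}")
    log.propagate = False
    log.setLevel(level)
    log.addHandler(handler)
    try:
        log.debug("AMD Zen CPU detected with zentorch installed, using ZenCpuPlatform.")
        log.info("platform selected")  # hypothetical info-level message
    finally:
        log.removeHandler(handler)
    return stream.getvalue()


info_output = emitted_at(logging.INFO)    # debug record is filtered out
debug_output = emitted_at(logging.DEBUG)  # both records are written
```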

		VLLM_LOGGING_LEVEL=INFO vllm serve facebook/opt-125m --dtype bfloat16 \
		2>&1 | grep "AMD Zen CPU detected with zentorch installed"
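When more than one marker line needs checking, the same verification can be scripted instead of running `grep` once per string. The following is a minimal sketch under that assumption: `missing_markers` is a hypothetical helper, and the marker strings are the two lines the PR description asks to verify.

```python
# Marker substrings taken from the PR description's "Verify the runtime output" step.
EXPECTED_MARKERS = [
    "ZenCpuPlatform activated",
    "CPU unquantized GEMM dispatch: using zentorch_linear_unary",
]


def missing_markers(log_text: str, markers: list[str]) -> list[str]:
    """Return the markers that appear on no line of the captured log text."""
    lines = log_text.splitlines()
    return [m for m in markers if not any(m in line for line in lines)]
```

An empty result means every marker was found; any returned string names a log line that never appeared in the captured output.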

Conversation

amd-lalithnc commented May 15, 2026 • edited

mergify Bot commented May 15, 2026

gemini-code-assist Bot left a comment

Code Review

mergify Bot commented May 15, 2026

amd-lalithnc commented May 27, 2026 • edited

amukho commented May 28, 2026

AndreasKaratzas commented May 29, 2026

mergify Bot commented May 29, 2026

mergify Bot commented May 29, 2026

amd-lalithnc commented May 29, 2026

tlrmchlsmth left a comment

amd-lalithnc commented Jun 8, 2026

tlrmchlsmth left a comment • edited

amd-lalithnc commented Jun 13, 2026

4 participants
