[Quantization] add online fp8 ptpc by walterbm · Pull Request #44132 · vllm-project/vllm

walterbm · 2026-05-31T21:51:40Z

Purpose

Add online fp8 quantization with per-token activation scales and per-channel weight scales (ptpc).

vllm serve ... --quantization fp8_per_channel
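A minimal pure-Python sketch of what per-token activation plus per-channel weight scaling computes. It assumes symmetric scales of the form amax / 448 mapped onto float8_e4m3fn, whose largest finite value is 448. All helper names are hypothetical: this illustrates the arithmetic, not the kernel in vllm/model_executor/layers/quantization/online/fp8.py.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest float8_e4m3fn value (round half to even), saturating at +/-448."""
    v = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v))
    if v == 0.0:
        return 0.0
    _, e = math.frexp(abs(v))             # abs(v) == m * 2**e with 0.5 <= m < 1
    unbiased = max(e - 1, -6)             # below 2**-6 the subnormal spacing applies
    step = math.ldexp(1.0, unbiased - 3)  # 3 mantissa bits: 8 steps per binade
    q = round(v / step) * step            # Python's round() is round-half-to-even
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, q))


def quantize_rows(mat):
    """Symmetric fp8 quantization with one scale per row: scale = amax(row) / 448."""
    q_rows, scales = [], []
    for row in mat:
        amax = max(abs(x) for x in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        q_rows.append([round_to_e4m3(x / scale) for x in row])
        scales.append(scale)
    return q_rows, scales


def ptpc_linear(x, w):
    """y = x @ w.T with x as [tokens, in] and w as [out, in].

    Rows of x are tokens (per-token scales sx); rows of w are output channels
    (per-channel scales sw). Both scales are constant along k, so they factor out:
        y[t][o] ~= sx[t] * sw[o] * sum_k qx[t][k] * qw[o][k]
    """
    qx, sx = quantize_rows(x)
    qw, sw = quantize_rows(w)
    return [
        [sx[t] * sw[o] * sum(a * b for a, b in zip(qx[t], qw[o])) for o in range(len(w))]
        for t in range(len(x))
    ]
```

Because each scale is constant along the reduced dimension, the dot product can run entirely on fp8 values, with the two scale vectors applied once per output element. Per-token scales also let a small-magnitude token use the full fp8 range instead of sharing a single scale with the largest token in the batch.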

Test Plan

Added a unit test, tests/quantization/test_fp8_ptpc.py, and a quality test, tests/models/quantization/test_fp8_ptpc.py.
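The PR text does not show the quality test's body. As a hypothetical sketch only, one common shape for such a check compares a quantized model's greedy tokens against a baseline's and tolerates a divergence only where each model's chosen token appears in the other's top-k logprobs. This is similar in spirit to the top-k logprob comparison used in vLLM's model tests, but it is not that helper; the function name and input layout are assumptions.

```python
def outputs_close(ref_tokens, ref_topk, test_tokens, test_topk):
    """Hypothetical greedy-output check for a quantized model against a baseline.

    ref_tokens / test_tokens: token ids each model chose at every position.
    ref_topk / test_topk: per position, the set of token ids in that model's top-k logprobs.
    Sequences may diverge once, but only where both choices were plausible to both models;
    after the first divergence the contexts differ, so later positions are not compared.
    """
    for pos, (ref_tok, test_tok) in enumerate(zip(ref_tokens, test_tokens)):
        if ref_tok == test_tok:
            continue
        return test_tok in ref_topk[pos] and ref_tok in test_topk[pos]
    return True
```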

Test Result

✅


AndreasKaratzas

The PR looks principled and clean. I have one small note, on the test only (the other one needs to be checked by an engineer from our team).

AndreasKaratzas

I might be wrong, so let me know. Also, I would probably prefer to have the accuracy test in the existing quantization test file (tests/quantization/test_fp8_ptpc.py); otherwise, let me know the rationale for a new test file just for the accuracy test.

walterbm · 2026-06-01T04:14:08Z


Sorry, I thought the quality tests were supposed to be under tests/models/quantization/ and the unit tests under tests/quantization/. Happy to move the quality test to tests/quantization/ if that is better.

AndreasKaratzas · 2026-06-01T04:21:30Z


Actually, you are right; I missed the model you instantiated at the top of the file. My bad, no need to move it.

divakar-amd

These tests fail on ROCm because "fp8_per_channel" is not currently in the supported quantization list in rocm.py LINK
While rocm.py could be updated to add "fp8_per_channel", I believe that would require a more in-depth review to verify the downstream effects. So, for now, I am requesting the changes below to correctly skip these tests on ROCm.
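A hypothetical sketch of the skip-on-unsupported-platform pattern requested above. vLLM's tests are written with pytest, where the equivalent would typically be a `pytest.mark.skipif` marker; this sketch uses stdlib `unittest.skipIf` so it runs on its own. The allow-list contents and the `on_rocm()` helper are stand-ins, not vLLM code.

```python
import unittest

# Hypothetical, abbreviated stand-in for the supported-quantization list in rocm.py;
# "fp8_per_channel" is absent, which is why the new tests fail there.
ROCM_SUPPORTED_QUANTIZATION = ["fp8"]


def on_rocm() -> bool:
    # Hypothetical stand-in for a real platform check (in vLLM, something like
    # current_platform.is_rocm()); hard-coded to True for this sketch.
    return True


def unsupported_here(method: str) -> bool:
    """True when running on ROCm and the method is not in its allow-list."""
    return on_rocm() and method not in ROCM_SUPPORTED_QUANTIZATION


@unittest.skipIf(unsupported_here("fp8_per_channel"),
                 "fp8_per_channel is not in the ROCm supported quantization list")
class TestFp8PerChannel(unittest.TestCase):
    def test_placeholder(self):
        # Placeholder body standing in for the PR's fp8_per_channel tests.
        self.assertTrue(True)
```

Gating on the platform's own allow-list rather than on the platform name alone means the tests would start running automatically once "fp8_per_channel" is added to that list.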

walterbm · 2026-06-05T22:13:33Z


@divakar-amd, thanks for reviewing! I merged your suggestions to skip the tests on ROCm.

Signed-off-by: walterbm <walter.beller.morales@gmail.com>

Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>

Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: divineearthly <divineearthly@gmail.com>


walterbm requested review from AndreasKaratzas, ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, pavanimajety, robertgshaw2-redhat, tlrmchlsmth, yewentao256, youkaichao and zyongye as code owners May 31, 2026 21:51

walterbm commented May 31, 2026


Comment thread vllm/model_executor/layers/quantization/online/fp8.py

AndreasKaratzas reviewed May 31, 2026


Comment thread tests/quantization/test_fp8_per_channel.py

Comment thread vllm/model_executor/layers/quantization/online/fp8.py

walterbm requested review from DarkLight1337 and ywang96 as code owners June 1, 2026 03:47

AndreasKaratzas reviewed Jun 1, 2026


Comment thread tests/models/quantization/test_fp8_ptpc.py Outdated

robertgshaw2-redhat approved these changes Jun 5, 2026


robertgshaw2-redhat enabled auto-merge (squash) June 5, 2026 18:15

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 5, 2026

divakar-amd suggested changes Jun 5, 2026


Comment thread tests/models/quantization/test_fp8_per_channel.py

Comment thread tests/quantization/test_fp8_per_channel.py

auto-merge was automatically disabled June 5, 2026 22:12
Head branch was pushed to by a user without write access

walterbm added 3 commits June 7, 2026 20:51

[Quantization] add online fp8 ptpc

a32658d

Signed-off-by: walterbm <walter.beller.morales@gmail.com>

add accuracy test

a88d424

Signed-off-by: walterbm <walter.beller.morales@gmail.com>

rename to fp8_per_channel

23f5b37

Signed-off-by: walterbm <walter.beller.morales@gmail.com>

walterbm force-pushed the cohere/add-online-fp8-ptpc branch from 34608f3 to 23f5b37 June 8, 2026 00:51

Merge branch 'main' into cohere/add-online-fp8-ptpc

d3a3855

DarkLight1337 merged commit 753e9d5 into vllm-project:main Jun 8, 2026
72 checks passed

Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026

[Quantization] add online fp8 ptpc (vllm-project#44132)

244e383

Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[Quantization] add online fp8 ptpc (vllm-project#44132)

cddf239

Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

ohsono pushed a commit to ohsono/vllm that referenced this pull request Jul 3, 2026

[Quantization] add online fp8 ptpc (vllm-project#44132)

106d50e

Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

DarkLight1337 merged 4 commits into vllm-project:main from walterbm:cohere/add-online-fp8-ptpc

5 participants
