
[Quantization] add online fp8 ptpc #44132

Merged: DarkLight1337 merged 4 commits into vllm-project:main from walterbm:cohere/add-online-fp8-ptpc on Jun 8, 2026


walterbm (Contributor) commented May 31, 2026

Purpose

Add online fp8 quantization with per-token activations and per-channel weights (ptpc), enabled with:

vllm serve ... --quantization fp8_per_channel
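To make "per-token activation + per-channel weight" concrete, here is a minimal pure-Python sketch of the scaling scheme. It is not vLLM's implementation: the helper names (`per_row_scales`, `quantize_rows`, `ptpc_linear`) are hypothetical, it assumes the FP8 format is float8_e4m3fn (largest finite value 448) with each scale taken as the row's absolute maximum divided by 448, and it keeps values as Python floats, so it models the scaling but not the FP8 rounding error. "Online" is read here as quantizing an unquantized model at runtime rather than loading a pre-quantized checkpoint, which the PR text does not spell out.

```python
FP8_E4M3_MAX = 448.0  # largest finite value of float8_e4m3fn (assumed FP8 format)


def per_row_scales(rows):
    """One scale per row: amax(|row|) / FP8_E4M3_MAX; all-zero rows get scale 1.0."""
    scales = []
    for row in rows:
        amax = max(abs(v) for v in row)
        scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
    return scales


def quantize_rows(rows, scales):
    """Divide each row by its own scale and clamp to the FP8 range.

    Assumption: a real kernel would then cast to an FP8 dtype; this sketch keeps
    Python floats, so it models the scaling but not the FP8 rounding error.
    """
    return [
        [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / s)) for v in row]
        for row, s in zip(rows, scales)
    ]


def ptpc_linear(x, w):
    """y = x @ w.T using per-token scales for x and per-output-channel scales for w.

    x: [num_tokens][in_features] activations, scaled dynamically per token.
    w: [out_features][in_features] weights, one scale per output channel (row).
    """
    sx = per_row_scales(x)  # one scale per token
    sw = per_row_scales(w)  # one scale per output channel
    xq = quantize_rows(x, sx)
    wq = quantize_rows(w, sw)
    out = []
    for t, xrow in enumerate(xq):
        # Accumulate the quantized dot product, then rescale once per output
        # element by the token's scale times the channel's scale.
        out.append(
            [sum(a * b for a, b in zip(xrow, wrow)) * sx[t] * sw[o]
             for o, wrow in enumerate(wq)]
        )
    return out
```

For example, `ptpc_linear([[1.0, -2.0, 0.5]], [[0.5, 1.0, -1.0], [2.0, 0.0, 4.0]])` returns approximately `[[-2.0, 4.0]]`, the same as the unquantized product. Because every token and every output channel gets its own scale, an outlier only coarsens its own row instead of the whole tensor, and the two scales factor out of the dot product so they can be applied once per output element after the low-precision matmul.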

Test Plan

Added a unit test, tests/quantization/test_fp8_ptpc.py, and a quality test, tests/models/quantization/test_fp8_ptpc.py.

Test Result


Comment thread vllm/model_executor/layers/quantization/online/fp8.py

AndreasKaratzas (Member) left a comment

The PR looks principled and clean. I have one small note, on the test only (the other one should be checked by an engineer from our team).

Comment thread tests/quantization/test_fp8_per_channel.py
Comment thread vllm/model_executor/layers/quantization/online/fp8.py

AndreasKaratzas (Member) left a comment

I might be wrong, so let me know. Also, I would probably prefer to have the accuracy test in the already instantiated quantization test (tests/quantization/test_fp8_ptpc.py); otherwise, let me know the rationale behind a new test file just for the accuracy test.

Comment thread tests/models/quantization/test_fp8_ptpc.py Outdated
walterbm (Contributor, Author) commented Jun 1, 2026


Sorry, I thought the quality tests were supposed to go under tests/models/quantization/ and unit tests under tests/quantization/. Happy to move the quality test to tests/quantization/ if that is better.

AndreasKaratzas (Member) replied


Actually, you are right: I missed the model you instantiated at the top of the file. My bad, no need to move it.

robertgshaw2-redhat enabled auto-merge (squash) June 5, 2026 18:15
github-actions (bot) added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 5, 2026

divakar-amd (Contributor) left a comment

These tests fail on ROCm because "fp8_per_channel" is currently not in the supported quantisation list in rocm.py (LINK).
While rocm.py can be updated to add "fp8_per_channel", I believe that will require a more in-depth review to verify the downstream effects. Hence, for now, I am requesting the changes below to correctly skip these tests on ROCm.

Comment thread tests/models/quantization/test_fp8_per_channel.py
Comment thread tests/quantization/test_fp8_per_channel.py
auto-merge was automatically disabled June 5, 2026 22:12

Head branch was pushed to by a user without write access

walterbm (Contributor, Author) commented Jun 5, 2026


@divakar-amd Thanks for reviewing! I merged your suggestions to skip the tests on ROCm.

walterbm added 3 commits June 7, 2026 20:51
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
walterbm force-pushed the cohere/add-online-fp8-ptpc branch from 34608f3 to 23f5b37 June 8, 2026 00:51
DarkLight1337 merged commit 753e9d5 into vllm-project:main Jun 8, 2026
72 checks passed
ekagra-ranjan pushed a commit to ekagra-ranjan/vllm that referenced this pull request Jun 9, 2026
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026
Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026
vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026
divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026
Signed-off-by: divineearthly <divineearthly@gmail.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
ohsono pushed a commit to ohsono/vllm that referenced this pull request Jul 3, 2026

Labels

ready ONLY add when PR is ready to merge/full CI is needed

5 participants