[Quantization] add online fp8 ptpc #44132
AndreasKaratzas
left a comment
PR looks principled and clean. One small note, for the test only (the other one is to be checked by an engineer from our team).
AndreasKaratzas
left a comment
I might be wrong, let me know. Also, I would probably prefer to have the accuracy test in the existing quantization test (tests/quantization/test_fp8_ptpc.py); otherwise, let me know the rationale behind adding a new test file just for the accuracy test.
sorry I thought the quality tests were supposed to be under |
Um, actually you are right: I missed the model that you instantiated at the top of the file. My bad, no need to move it.
divakar-amd
left a comment
These tests fail on ROCm because "fp8_per_channel" is currently not in the supported quantization list in rocm.py.
While rocm.py can be updated to add "fp8_per_channel", I believe that will require a more in-depth review to verify the downstream effects. Hence, for now, I am requesting the changes below so that these tests are correctly skipped on ROCm.
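For readers following along, the skip pattern requested here can be sketched roughly as below. This is only an illustration using the standard library's `unittest`; the set `PLATFORM_SUPPORTED_QUANTIZATION`, its placeholder entry, and the helper `supports` are hypothetical stand-ins and are not the contents of rocm.py or the actual change suggested in this review.

```python
import unittest

# Hypothetical stand-in for a platform's supported-quantization list.
# The entry is a placeholder; the real list in rocm.py is not reproduced here.
PLATFORM_SUPPORTED_QUANTIZATION = {"some_other_method"}


def supports(method: str, supported: set[str]) -> bool:
    """Return True if the quantization method is in the platform's list."""
    return method in supported


class Fp8PtpcTest(unittest.TestCase):
    # Skip instead of failing when the platform does not list the method.
    @unittest.skipUnless(
        supports("fp8_per_channel", PLATFORM_SUPPORTED_QUANTIZATION),
        "fp8_per_channel is not supported on this platform",
    )
    def test_runs_only_when_supported(self):
        self.assertTrue(True)


def run_suite() -> unittest.TestResult:
    """Run the test case and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Fp8PtpcTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With the placeholder list above, the test is reported as skipped rather than failed, which is the behavior asked for on ROCm until the supported list is reviewed and extended.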
@divakar-amd thanks for reviewing! I merged your suggestions to skip the tests on ROCm.
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Signed-off-by: walterbm <walter.beller.morales@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: divineearthly <divineearthly@gmail.com>
Purpose
Add online fp8 per-token activation + per-channel weight (ptpc) quantization
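As a rough illustration of what per-token activation plus per-channel weight scaling means, here is a minimal pure-Python sketch. It is not the code added by this PR: the helper names (`fake_fp8`, `row_scales`, `quantize_rows`, `ptpc_linear`) are hypothetical, and `fake_fp8` only approximates an E4M3 cast by keeping 4 significant bits and clamping to ±448, ignoring subnormals.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value of the float8 e4m3fn format


def fake_fp8(v: float) -> float:
    """Crude E4M3 stand-in: keep 4 significant bits, clamp to +/-448."""
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)  # v = m * 2**e with 0.5 <= |m| < 1
    q = math.ldexp(round(m * 16) / 16, e)
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, q))


def row_scales(mat):
    """One scale per row, mapping the row's max |value| onto the FP8 range."""
    return [max(max(abs(v) for v in row) / FP8_E4M3_MAX, 1e-12) for row in mat]


def quantize_rows(mat):
    """Quantize each row with its own scale; returns (values, scales)."""
    scales = row_scales(mat)
    q = [[fake_fp8(v / s) for v in row] for row, s in zip(mat, scales)]
    return q, scales


def ptpc_linear(x, w):
    """x: [tokens][in], w: [out][in]. Approximates x @ w.T from quantized values."""
    xq, sx = quantize_rows(x)  # per-token activation scales (computed at runtime)
    wq, sw = quantize_rows(w)  # per-output-channel weight scales
    return [
        [sum(a * b for a, b in zip(xr, wr)) * sx_i * sw_j
         for wr, sw_j in zip(wq, sw)]
        for xr, sx_i in zip(xq, sx)
    ]
```

Because each output element depends on exactly one token scale and one channel scale, both factors can be applied after the low-precision dot product, which is why this granularity is convenient for a scaled matmul.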
Test Plan
Added a unit test tests/quantization/test_fp8_ptpc.py and a quality test tests/models/quantization/test_fp8_ptpc.py
Test Result
✅