[Bugfix] Add deepseek_v32 to Quark dynamic MXFP4 model type check#39498
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the _DEEPSEEK_V3_FAMILY_MODEL_TYPES constant in the Quark quantization configuration to include deepseek_v32, extending support for dynamic MXFP4 re-quantization to this model variant. I have no feedback to provide.
BowenBao
left a comment
There was a problem hiding this comment.
LGTM, thanks
Do you have the before and after throughput comparison?
tjtanaa
left a comment
There was a problem hiding this comment.
@shantipriya-amd please rebase. It looks good to me.
|
This pull request has merge conflicts that must be resolved before it can be |
f6a7eba to
f25db19
Compare
|
Rebased onto current main — conflict resolved (new HEAD: f25db19). The merge conflict in quark.py was caused by maybe_update_config() being restructured upstream; the rebased branch restores it with _DEEPSEEK_V3_FAMILY_MODEL_TYPES = frozenset({"deepseek_v3", "deepseek_v32"}). Verification: py_compile passed, 5 behavioral test cases all pass — deepseek_v3/deepseek_v32 fp4 → enabled ✅, non-fp4 / other model types → no-op ✅. |
|
Rebased onto current The merge conflict in _DEEPSEEK_V3_FAMILY_MODEL_TYPES = frozenset({"deepseek_v3", "deepseek_v32"})Verification:
|
|
@mergify requeue |
☑️ Command disallowed due to command restrictions in the Mergify configuration.Details
|
Validation ResultsValidated on 8× AMD MI355X (gfx950) with ConfigurationFunctional Tests
GSM8K (lm-eval, 5-shot)
Throughput Benchmark512 random prompts (~4503 input / ~140 output tokens), concurrency 128:
|
2da0029 to
59e5331
Compare
|
Hi @shantipriya-amd, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
Head branch was pushed to by a user without write access
fb661f7 to
79fa7c7
Compare
| input_config = cast(dict[str, Any], config.get("input_tensors")) | ||
|
|
||
| if self._is_nvfp4(weight_config, input_config): | ||
| return QuarkNVFP4() |
There was a problem hiding this comment.
@shantipriya-amd was there a time where quark store fp4 weights using nvfp4 format? and does this mean right now quark models store mxfp4 weight directly?
@BowenBao could you take a look as well?
There was a problem hiding this comment.
@tjtanaa — yes to both questions:
Was NVFP4 ever used?
Yes — Quark has supported NVFP4 since#35859 and it remains a live code path today.
Do current Quark MXFP4 models store weights directly in MXFP4?
Yes — amd/DeepSeek-V3.2-mxfp4 stores MoE and FFN weights on disk in MXFP4 (OCP-MX format: group_size=32, scale_format=e8m0, dtype=fp4). This hits the _is_w_ocp_mx_a_x() path → QuarkOCP_MX.
|
Could you please rebase? |
Add null check for hf_config and type check for quant_config to ensure robustness when handling optional configurations. Fixes pre-commit checks and addresses code review feedback for safer attribute access. Signed-off-by: Shantipriya Parida <shantipriya.parida@amd.com>
d46cb9e to
ec037ad
Compare
|
@AndreasKaratzas : Thanks for your suggestion, tried to rebase and update the branch but merging still pending. |
|
@shantipriya-amd looks like #42754 is identical now and no longer needed, can you close the duplicate PR? |
|
Hi @shantipriya-amd, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, |
|
@tjtanaa — thank you for enabling auto-merge! Two CI checks are currently failing but both appear to be infrastructure issues unrelated to this one-line frozenset change: amd-docker-build-test-image-and-artifacts (exit 1) — docker build failure, likely an upstream issue affecting multiple PRs Could you confirm if these are pre-existing failures on main and re-trigger CI when the infrastructure is stable? Happy to rebase again if needed. |
…lm-project#39498) Signed-off-by: Shantipriya Parida <shantipriya.parida@amd.com>
…lm-project#39498) Signed-off-by: Shantipriya Parida <shantipriya.parida@amd.com>
…lm-project#39498) Signed-off-by: Shantipriya Parida <shantipriya.parida@amd.com> Signed-off-by: divineearthly <divineearthly@gmail.com>
…lm-project#39498) Signed-off-by: Shantipriya Parida <shantipriya.parida@amd.com>
…lm-project#39498) Signed-off-by: Shantipriya Parida <shantipriya.parida@amd.com>
Summary
Add
deepseek_v32to_DEEPSEEK_V3_FAMILY_MODEL_TYPESso that Quark correctly enablesdynamic_mxfp4_quantfor amd/DeepSeek-V3.2-mxfp4.Without this fix, excluded layers (e.g.
self_attn) are not dynamically re-quantized when serving MXFP4-quantized DeepSeek-V3.2 checkpoints, causing silent correctness degradation (all-zero or garbage output tokens).Root cause
_DEEPSEEK_V3_FAMILY_MODEL_TYPESis defined as:The
maybe_update_config()method uses this set to gatedynamic_mxfp4_quant. DeepSeek-V3.2 usesmodel_type = "deepseek_v32"in its HuggingFace config, which is not in the set. As a result,dynamic_mxfp4_quantstaysFalse, andget_quant_method()returnsUnquantizedLinearMethod()for attention projections instead of applying the correct quantization scheme.Fix
One-line change. Adding
"deepseek_v32"to the frozen set enables the dynamic MXFP4 path for the V3.2 model family while preserving existing V3 behavior.Validation
Validated on 8× AMD MI355X (gfx950) with
amd/DeepSeek-V3.2-mxfp4, TP=8,attention_backend=ROCM_AITER_MLA_SPARSE,--enforce-eager.Configuration
Functional Tests
/v1/modelsendpointWhat is 2+2?)2 + 2 = 4finish_reason=stopfinish_reason=stopGSM8K (lm-eval, 5-shot)
Throughput Benchmark
512 random prompts (~4503 input / ~140 output tokens), concurrency 128:
Notes
deepseek_v32model type is used by the official HuggingFace checkpoint atamd/DeepSeek-V3.2-mxfp4.