[Benchmark] Auto-detect and correct client/server tokenizer mismatch for random dataset #44708
@AndreasKaratzas @DarkLight1337 @tjtanaa would this be a more acceptable fix for the inflated token issue on DS32? (Not sure why, but there are no reviewers on the PR?) If not, I can close this immediately 😅. I was just trying to solve this quickly, as other folks are now getting impacted by this when comparing perf across images. But I still see this as a bit of future-proofing.
cc @frida-andersson since you are the author of the original PR
Looks good, though I'm not the best guy to review this. I don't completely understand why the tokenizers could disagree (hence this seems a bit like masking the issue), but it might actually be the way to go here. I checked the other PR very briefly too, and there doesn't seem to be a good explanation of why this can happen. I also saw @DarkLight1337 commenting the same thing there.
@AndreasKaratzas So the main answer to this:
For DeepSeek-V3.2, transformers >= 5.0 doesn't have native support yet (huggingface/transformers#41251), so it silently falls back to a wrong tokenizer. The server is loading the right one.
Thanks for picking this up, @akii96! This approach is clean and the "zero changes to dataset code" property is a real win. I only have one minor comment - if
…for random dataset Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com> Co-authored-by: Frida Andersson <fanderss@amd.com>
816171d to f9ac854
Thanks @frida-andersson 🙏 @DarkLight1337 addressed Frida's nit (warning when /tokenize is unavailable)
DarkLight1337
left a comment
This looks much simpler, thanks!
…for random dataset (vllm-project#44708) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
…for random dataset (vllm-project#44708) Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
…for random dataset (vllm-project#44708)
…for random dataset (vllm-project#44708) Signed-off-by: divineearthly <divineearthly@gmail.com>
Alternative to #42532. Addresses the same problem (input token inflation when bench-side and server-side tokenizers disagree) but takes a different approach based on reviewer feedback there:
After get_samples(), probes the server's /tokenize endpoint with the first prompt. If counts match, returns immediately. If not, re-aligns all prompts via /tokenize + /detokenize so server-side token counts are exact.

Verified on MI355X

Image: vllm/vllm-openai-rocm:nightly-3f0a91bb96f8d72e0498b95c166e817deae14d62

Server: VLLM_ROCM_USE_AITER=1 VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT4 vllm serve deepseek-ai/DeepSeek-V3.2 --tensor-parallel-size 8 --gpu-memory-utilization 0.85 --kv-cache-dtype fp8_e4m3 --block-size 64 --enable-expert-parallel --max_model_len 131072

Benchmark: vllm bench serve --model deepseek-ai/DeepSeek-V3.2 --dataset-name random --num-prompts 10 --max-concurrency 4 --input-len 1000 --output-len 100 --random-range-ratio 0

Before this version of the fix

After
Edit: Forgot to thank @frida-andersson for the initial digging into the issue and the pioneering work on this. Also, your review would be appreciated!