[Benchmark] Auto-detect and correct client/server tokenizer mismatch for random dataset by akii96 · Pull Request #44708 · vllm-project/vllm

akii96 · 2026-06-06T02:35:13Z

Alternative to #42532. Addresses the same problem (input token inflation when bench-side and server-side tokenizers disagree) but takes a different approach based on reviewer feedback there:

Zero changes to dataset code (@DarkLight1337's main concern)
No new CLI flags
Catches any model/tokenizer version mismatch, not just the current DeepSeek-V3.2 case

After get_samples(), the benchmark probes the server's /tokenize endpoint with the first prompt. If the token counts match, it returns immediately. If not, it re-aligns all prompts via /tokenize + /detokenize so that server-side token counts are exact.
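The probe-then-realign flow can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the PR's actual code: SampleRequest and align_prompts_to_server are illustrative names, and the tokenize/detokenize callables stand in for calls to the server's /tokenize and /detokenize endpoints so the logic runs offline. It assumes the endpoints succeed (no error handling) and that decoding a run of server token ids and re-encoding the text yields the same number of tokens, which is not guaranteed for every tokenizer.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class SampleRequest:
    """Hypothetical stand-in for the benchmark's per-request record."""
    prompt: str
    prompt_len: int  # token count the client intended the server to see


def align_prompts_to_server(
    requests: Sequence[SampleRequest],
    tokenize: Callable[[str], List[int]],    # stands in for POST /tokenize
    detokenize: Callable[[List[int]], str],  # stands in for POST /detokenize
) -> List[SampleRequest]:
    """Probe with the first prompt; rewrite every prompt only on a mismatch."""
    if not requests:
        return []
    first = requests[0]
    server_len = len(tokenize(first.prompt))
    if server_len == first.prompt_len:
        return list(requests)  # tokenizers agree: return immediately

    print(
        f"WARNING: tokenizer mismatch (server={server_len}, "
        f"expected={first.prompt_len}), re-aligning prompts."
    )
    aligned = []
    for req in requests:
        ids = tokenize(req.prompt)
        if not ids:
            aligned.append(req)  # nothing to cycle; leave the prompt as-is
            continue
        # Cycle the server's own token ids out to exactly prompt_len:
        # this truncates prompts that are too long and pads short ones.
        target = [ids[i % len(ids)] for i in range(req.prompt_len)]
        aligned.append(replace(req, prompt=detokenize(target)))
    return aligned


# Toy usage: a one-token-per-character "server tokenizer".
def char_tokenize(text: str) -> List[int]:
    return [ord(c) for c in text]


def char_detokenize(ids: List[int]) -> str:
    return "".join(chr(i) for i in ids)


reqs = [SampleRequest("abcdef", 3), SampleRequest("xy", 5)]
out = align_prompts_to_server(reqs, char_tokenize, char_detokenize)
print([r.prompt for r in out])  # ['abc', 'xyxyx']
```

Cycling the server's own token ids is just one way to hit the target length in both directions; the real implementation may pad or trim differently. Note that, as in the description, only the first prompt is probed, so if it happens to match, no prompt is rewritten.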

Verified on MI355X

Model: deepseek-ai/DeepSeek-V3.2
transformers: 5.9.0
Image: vllm/vllm-openai-rocm:nightly-3f0a91bb96f8d72e0498b95c166e817deae14d62
Serve: VLLM_ROCM_USE_AITER=1 VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT4 vllm serve deepseek-ai/DeepSeek-V3.2 --tensor-parallel-size 8 --gpu-memory-utilization 0.85 --kv-cache-dtype fp8_e4m3 --block-size 64 --enable-expert-parallel --max_model_len 131072
Command: vllm bench serve --model deepseek-ai/DeepSeek-V3.2 --dataset-name random --num-prompts 10 --max-concurrency 4 --input-len 1000 --output-len 100 --random-range-ratio 0

Before this version of the fix

============ Serving Benchmark Result ============
Successful requests:                     10        
Benchmark duration (s):                  7.22      
Total input tokens:                      46359     
Total generated tokens:                  1000      
Request throughput (req/s):              1.39      
Output token throughput (tok/s):         138.59    
Total token throughput (tok/s):          6563.52   
Mean TTFT (ms):                          918.93    
Mean TPOT (ms):                          15.99     
==================================================

After

WARNING: tokenizer mismatch (server=6082, expected=1000), re-aligning prompts.
============ Serving Benchmark Result ============
Successful requests:                     10        
Benchmark duration (s):                  5.20      
Total input tokens:                      10000     
Total generated tokens:                  1000      
Request throughput (req/s):              1.92      
Output token throughput (tok/s):         192.47    
Total token throughput (tok/s):          2117.16   
Mean TTFT (ms):                          406.75    
Mean TPOT (ms):                          13.88     
==================================================
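A quick back-of-the-envelope check on the two result blocks, using only the numbers they report: the "before" run sent about 4.6x the requested input tokens, and recomputing total token throughput as (input + output tokens) / duration reproduces the reported figures to within rounding of the duration. Because the two runs therefore measured different effective prompt lengths, their throughput and TTFT numbers are not a like-for-like performance comparison. The helper names below are illustrative only.

```python
# Figures copied verbatim from the "Before" and "After" result blocks.
NUM_PROMPTS = 10
REQUESTED_INPUT_LEN = 1000  # --input-len 1000

runs = {
    "before": {"duration_s": 7.22, "input_tokens": 46359,
               "output_tokens": 1000, "reported_total_tok_s": 6563.52},
    "after": {"duration_s": 5.20, "input_tokens": 10000,
              "output_tokens": 1000, "reported_total_tok_s": 2117.16},
}


def inflation_factor(input_tokens: int) -> float:
    """Server-side input tokens relative to what the benchmark asked for."""
    return input_tokens / (NUM_PROMPTS * REQUESTED_INPUT_LEN)


def recomputed_total_tok_s(run: dict) -> float:
    """Total token throughput = (input + output tokens) / duration."""
    return (run["input_tokens"] + run["output_tokens"]) / run["duration_s"]


for name, run in runs.items():
    print(
        f"{name}: inflation={inflation_factor(run['input_tokens']):.2f}x, "
        f"avg prompt={run['input_tokens'] / NUM_PROMPTS:.1f} tokens, "
        f"total tok/s recomputed={recomputed_total_tok_s(run):.1f} "
        f"vs reported={run['reported_total_tok_s']}"
    )
```

The recomputed throughputs land within about 0.1% of the reported ones, which is consistent with the durations being rounded to two decimals.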

Edit: Forgot to thank @frida-andersson for the initial digging into the issue and the pioneering work on this. Also, your review would be appreciated!

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

akii96 · 2026-06-06T03:25:45Z

@AndreasKaratzas @DarkLight1337 @tjtanaa would this be a more acceptable fix for the inflated-token issue on DS32? (Not sure why, but there are no reviewers on the PR.)

If not, I can close this immediately 😅. I was just trying to solve this quickly, since other folks are now impacted by this when comparing perf across images, but I also see it as a bit of future-proofing.

DarkLight1337 · 2026-06-06T03:42:05Z

cc @frida-andersson since you are the author of the original PR

AndreasKaratzas · 2026-06-06T03:43:44Z

> @AndreasKaratzas @DarkLight1337 @tjtanaa would this be a more acceptable fix for the inflated-token issue on DS32? (Not sure why, but there are no reviewers on the PR.)
>
> If not, I can close this immediately 😅. I was just trying to solve this quickly, since other folks are now impacted by this when comparing perf across images, but I also see it as a bit of future-proofing.

Looks good, though I'm not the best person to review this. I also don't completely understand why the tokenizers would disagree (so this seems a bit like masking the issue), but it might well be the way to go here. I briefly checked the other PR too, and it doesn't seem to explain why this can happen; I saw @DarkLight1337 raise the same point there.

akii96 · 2026-06-06T03:49:57Z

@AndreasKaratzas, the main answer to this:

> I don't completely understand why tokenizers could disagree

For DeepSeek-V3.2, transformers >= 5.0 doesn't have native support yet (huggingface/transformers#41251), so on the benchmark side it silently falls back to the wrong tokenizer, while the server loads the right one.

frida-andersson · 2026-06-08T09:04:30Z

Thanks for picking this up, @akii96! This approach is clean, and the "zero changes to dataset code" property is a real win. I only have one minor comment: if /tokenize returns a 503/404 or the endpoint isn't available (e.g. non-vLLM backends), the except Exception: return input_requests branch proceeds with wrong counts and no warning. Suggestion: add print("WARNING: /tokenize unavailable, skipping alignment.") in the except block. Closing my PR.
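The suggested fallback amounts to a probe that degrades loudly instead of silently. A minimal standard-library sketch, assuming the server exposes POST /tokenize accepting a JSON body with "model" and "prompt" and answering with a "count" field (as vLLM's OpenAI-compatible server does); probe_server_token_count is a hypothetical helper name, not the PR's function:

```python
import json
import urllib.request
from typing import Optional


def probe_server_token_count(
    base_url: str, model: str, prompt: str, timeout: float = 10.0
) -> Optional[int]:
    """Ask the server's /tokenize endpoint how many tokens `prompt` has.

    Returns None (after printing a warning) if the endpoint is missing,
    errors out, or answers in an unexpected shape, so the caller can skip
    alignment instead of silently proceeding with wrong counts.
    """
    try:
        request = urllib.request.Request(
            f"{base_url.rstrip('/')}/tokenize",
            data=json.dumps({"model": model, "prompt": prompt}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
        # Assumption: the response carries a "count" field, as vLLM's
        # OpenAI-compatible server does; other backends may not.
        return int(body["count"])
    except Exception:
        # Covers HTTP 404/503 (urllib.error.HTTPError), connection failures,
        # malformed URLs, and responses without a usable "count".
        print("WARNING: /tokenize unavailable, skipping alignment.")
        return None


# Usage: a malformed base URL fails fast without any network access.
print(probe_server_token_count("not-a-url", "deepseek-ai/DeepSeek-V3.2", "hello"))
```

Catching Exception broadly mirrors the except block discussed above; the point of the change is only that the failure path now prints a warning.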


akii96 · 2026-06-08T09:21:31Z

Thanks @frida-andersson 🙏

@DarkLight1337, I addressed Frida's nit (a warning when /tokenize is unavailable).
This should be ready for review when you get a chance! (Maybe a ready label could be added too.)

DarkLight1337

This looks much simpler, thanks!

…for random dataset (vllm-project#44708) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>


mergify Bot added the performance label (Performance-related issues) Jun 6, 2026

akii96 marked this pull request as ready for review June 6, 2026 03:11

claude Bot reviewed Jun 6, 2026


frida-andersson mentioned this pull request Jun 8, 2026

[Benchmark] Add --random-prompt-as-token-ids for the random dataset #42532

Closed

[Benchmark] Auto-detect and correct client/server tokenizer mismatch …

f9ac854

…for random dataset Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com> Co-authored-by: Frida Andersson <fanderss@amd.com>

akii96 force-pushed the bench-tokenizer-mismatch-guard branch from 816171d to f9ac854 Compare June 8, 2026 09:16

DarkLight1337 approved these changes Jun 8, 2026


DarkLight1337 enabled auto-merge (squash) June 8, 2026 10:22

github-actions Bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 8, 2026

Merge branch 'main' into bench-tokenizer-mismatch-guard

db1d32b

DarkLight1337 merged commit ac3409d into vllm-project:main Jun 8, 2026
34 checks passed

waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026

[Benchmark] Auto-detect and correct client/server tokenizer mismatch …

b25c999

…for random dataset (vllm-project#44708) Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026

[Benchmark] Auto-detect and correct client/server tokenizer mismatch …

8b1790e

…for random dataset (vllm-project#44708)

vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026

[Benchmark] Auto-detect and correct client/server tokenizer mismatch …

6eadfae

…for random dataset (vllm-project#44708)

divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026

[Benchmark] Auto-detect and correct client/server tokenizer mismatch …

673d7b9

…for random dataset (vllm-project#44708) Signed-off-by: divineearthly <divineearthly@gmail.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026

[Benchmark] Auto-detect and correct client/server tokenizer mismatch …

fc0cf15

…for random dataset (vllm-project#44708)

nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026

[Benchmark] Auto-detect and correct client/server tokenizer mismatch …

d8c2deb

…for random dataset (vllm-project#44708)

ohsono pushed a commit to ohsono/vllm that referenced this pull request Jul 3, 2026

[Benchmark] Auto-detect and correct client/server tokenizer mismatch …

84582ef

…for random dataset (vllm-project#44708)


DarkLight1337 merged 2 commits into vllm-project:main from akii96:bench-tokenizer-mismatch-guard


4 participants