[Bench] Add BFCL dataset for vllm bench serve tool-calling workloads #42457

Merged: vllm-bot merged 3 commits into vllm-project:main from laviier:bfcl_eval on Jun 10, 2026

@laviier laviier commented May 12, 2026

Contributor

Adds a BFCLDataset that lets vllm bench serve --backend openai-chat replay the Berkeley Function Calling Leaderboard, so users can measure serving latency/throughput on realistic tool-calling traffic. Complements the merged correctness harness in #36560; no code overlap. See the PR description for design details.

AI-assisted: drafted with Claude (Opus 4.7); author reviewed every line.

Purpose

Today there is no standardized way to measure serving latency/throughput on tool-calling workloads. Existing bench datasets (ShareGPT, sonnet, random, HF chat datasets) all produce plain-text turns, so they never exercise the tools/tool_choice path, the server-side tool parser, or structured decoding grammars. This PR adds a first-class BFCL dataset for vllm bench serve.
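
For readers less familiar with the tools/tool_choice path mentioned above, the sketch below builds the kind of request body that exercises it. It is only an illustration of the OpenAI chat-completions tool format: the get_weather tool, its parameters, and the build_chat_payload helper are hypothetical and do not come from this PR.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" format. The function
# name and its parameters are invented for illustration only.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}


def build_chat_payload(model: str, user_text: str, tools: list) -> dict:
    """Build a /v1/chat/completions request body that carries tools."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call a tool
        "temperature": 0,
        "stream": True,
    }


payload = build_chat_payload(
    "openai/gpt-oss-20b", "What is the weather in Paris?", [get_weather_tool]
)
print(json.dumps(payload, indent=2))
```

A plain-text dataset would send the same body without the tools and tool_choice keys, which is why it never reaches the tool parser on the server.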

Test Plan

Unit tests + End-to-End smoke tests

# Server
vllm serve openai/gpt-oss-20b --port 8000 \
  --enable-auto-tool-choice --tool-call-parser openai --reasoning-parser openai_gptoss

# Bench
vllm bench serve --model openai/gpt-oss-20b \
  --backend openai-chat --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path gorilla-llm/Berkeley-Function-Calling-Leaderboard \
  --bfcl-categories simple,live_simple,multiple \
  --num-warmups 5 --temperature 0 --percentile-metrics ttft,tpot,itl,e2el \
  --max-concurrency 8 --num-prompts 500

Test Result

Unit tests — 7/7 passing:
tests/benchmarks/test_bfcl_dataset.py::test_bfcl_dataset_translates_schema_and_attaches_tools PASSED
tests/benchmarks/test_bfcl_dataset.py::test_bfcl_dataset_requires_openai_chat_backend PASSED
tests/benchmarks/test_bfcl_dataset.py::test_bfcl_dataset_missing_category_raises_clear_error PASSED
tests/benchmarks/test_bfcl_dataset.py::test_chat_backend_uses_messages_field_when_set PASSED
tests/benchmarks/test_bfcl_dataset.py::test_bfcl_prompt_len_includes_tools PASSED
tests/benchmarks/test_bfcl_dataset.py::test_bfcl_prompt_len_falls_back_when_tokenizer_rejects_tools PASSED
tests/benchmarks/test_bfcl_dataset.py::test_bfcl_schema_translation_is_recursive PASSED

End-to-end smoke against openai/gpt-oss-20b:

============ Serving Benchmark Result ============
Successful requests:                     500       
Failed requests:                         0         
Maximum request concurrency:             8         
Benchmark duration (s):                  36.31     
Total input tokens:                      123779    
Total generated tokens:                  52072     
Request throughput (req/s):              13.77     
Output token throughput (tok/s):         1433.99   
Peak output token throughput (tok/s):    498.00    
Peak concurrent requests:                27.00     
Total token throughput (tok/s):          4842.68   
---------------Time to First Token----------------
Mean TTFT (ms):                          78.23     
Median TTFT (ms):                        71.23     
P99 TTFT (ms):                           171.09    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          4.93      
Median TPOT (ms):                        4.75      
P99 TPOT (ms):                           10.18     
---------------Inter-token Latency----------------
Mean ITL (ms):                           18.49     
Median ITL (ms):                         9.21      
P99 ITL (ms):                            146.46    
----------------End-to-end Latency----------------
Mean E2EL (ms):                          571.74    
Median E2EL (ms):                        425.29    
P99 E2EL (ms):                           2252.42   
---------------Speculative Decoding---------------
Acceptance rate (%):                     29.83     
Acceptance length:                       3.09      
Drafts:                                  16716     
Draft tokens:                            117012    
Accepted tokens:                         34905     
Per-position acceptance (%):
  Position 0:                            71.90     
  Position 1:                            47.72     
  Position 2:                            33.15     
  Position 3:                            22.52     
  Position 4:                            16.49     
  Position 5:                            9.95      
  Position 6:                            7.08      
==================================================
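
As a sanity check on the table above, the derived figures can be recomputed from the raw counters. This is a minimal sketch: the formulas are the usual definitions and are assumed, not taken from the benchmark code, and small differences are expected because the printed duration is rounded to two decimals.

```python
# Raw counters copied from the result table above.
successful_requests = 500
duration_s = 36.31
input_tokens = 123779
generated_tokens = 52072
drafts = 16716
draft_tokens = 117012
accepted_tokens = 34905
per_position_pct = [71.90, 47.72, 33.15, 22.52, 16.49, 9.95, 7.08]

# Throughput figures: counts divided by wall-clock duration.
request_throughput = successful_requests / duration_s
output_throughput = generated_tokens / duration_s
total_throughput = (input_tokens + generated_tokens) / duration_s

# Speculative decoding figures.
tokens_per_draft = draft_tokens / drafts
acceptance_rate_pct = 100 * accepted_tokens / draft_tokens
# Assumed definition: each draft step yields its accepted tokens plus one
# token produced by the target model itself.
acceptance_length = 1 + accepted_tokens / drafts
# The per-position rates should add up to the mean accepted tokens per draft.
mean_accepted_from_positions = sum(per_position_pct) / 100

print(f"request throughput: {request_throughput:.2f} req/s")
print(f"output throughput:  {output_throughput:.2f} tok/s")
print(f"total throughput:   {total_throughput:.2f} tok/s")
print(f"tokens per draft:   {tokens_per_draft:.1f}")
print(f"acceptance rate:    {acceptance_rate_pct:.2f} %")
print(f"acceptance length:  {acceptance_length:.2f}")
```

Under these assumptions the seven per-position rows line up with seven draft tokens per draft step, and the acceptance rate and length agree with the reported values to two decimals.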

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the performance Performance-related issues label May 12, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for the Berkeley Function Calling Leaderboard (BFCL) dataset to the vLLM benchmarking suite. Key changes include the implementation of the BFCLDataset class, which manages data loading from Hugging Face, recursive translation of function schemas to OpenAI tool format, and balanced round-robin sampling across dataset categories. The benchmarking infrastructure was also updated to support pre-built chat messages and per-request overrides in SampleRequest and RequestFuncInput, enabling more accurate simulation of tool-calling scenarios. Feedback was provided regarding the silent suppression of exceptions during chat template application, recommending that errors be logged to facilitate debugging and prevent the masking of underlying tokenizer issues.
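
The recursive schema translation mentioned in the summary can be pictured roughly as follows. This is a sketch under stated assumptions, not the PR's code: the TYPE_MAP table, translate_schema, and to_openai_tool are hypothetical names, and the mapping assumes BFCL functions use Python-style type names such as dict, float, and tuple.

```python
from typing import Any

# Assumed mapping from BFCL-style type names to JSON Schema types.
# The table used by the actual PR may differ.
TYPE_MAP = {
    "dict": "object",
    "float": "number",
    "tuple": "array",
    "list": "array",
    "bool": "boolean",
    "any": "string",
}


def translate_schema(node: Any) -> Any:
    """Return a copy of a parameter schema with type names rewritten."""
    if isinstance(node, list):
        return [translate_schema(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "type" and isinstance(value, str):
            # Rewrite known type names, keep unknown ones unchanged.
            out[key] = TYPE_MAP.get(value, value)
        else:
            # Recurse into nested properties, items, and so on.
            out[key] = translate_schema(value)
    return out


def to_openai_tool(func: dict) -> dict:
    """Wrap one BFCL-style function description as an OpenAI tool entry."""
    return {
        "type": "function",
        "function": {
            "name": func["name"],
            "description": func.get("description", ""),
            "parameters": translate_schema(func.get("parameters", {})),
        },
    }


# Illustrative input shaped like a BFCL function entry (made up here).
bfcl_func = {
    "name": "calculate_triangle_area",
    "description": "Calculate the area of a triangle.",
    "parameters": {
        "type": "dict",
        "properties": {
            "base": {"type": "integer"},
            "height": {"type": "float"},
            "sides": {"type": "tuple", "items": {"type": "float"}},
        },
        "required": ["base", "height"],
    },
}

tool = to_openai_tool(bfcl_func)
print(tool["function"]["parameters"])
```

The recursion matters because nested properties and array items carry their own type fields; translating only the top level would leave names like float in place deeper in the schema.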

Comment thread vllm/benchmarks/datasets/datasets.py Outdated
Comment on lines +3937 to +3938
    except Exception:
        rendered = None

high

Catching a generic Exception and silently setting rendered = None can hide important errors from tokenizer.apply_chat_template. If an unexpected error occurs, it will be suppressed, and the prompt length will be calculated using a fallback method. This can lead to inaccurate prompt length metrics and mask underlying issues in the tokenizer or chat template. It's better to log the exception to make debugging easier while maintaining robustness.

Suggested change
-    except Exception:
-        rendered = None
+    except Exception as e:
+        logger.warning("Failed to apply chat template with tools, falling back. Error: %s", e)
+        rendered = None
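
To show the suggested pattern in context, here is a minimal, self-contained sketch of a prompt-length helper that logs the failure before falling back. The count_prompt_tokens helper, the StubTokenizer class, and the fallback of joining message contents are assumptions made for illustration; the thread only says that a fallback method is used.

```python
import logging

logger = logging.getLogger(__name__)


def count_prompt_tokens(tokenizer, messages, tools):
    """Count prompt tokens with tools rendered in, else fall back to text."""
    try:
        rendered = tokenizer.apply_chat_template(
            messages, tools=tools, tokenize=False, add_generation_prompt=True
        )
    except Exception as e:
        # Log instead of swallowing, so template problems stay visible.
        logger.warning(
            "Failed to apply chat template with tools, falling back. Error: %s", e
        )
        rendered = None
    if rendered is None:
        # Assumed fallback: count only the message contents.
        rendered = "\n".join(str(m.get("content", "")) for m in messages)
    return len(tokenizer.encode(rendered))


class StubTokenizer:
    """Whitespace tokenizer standing in for a real chat tokenizer."""

    def __init__(self, supports_tools):
        self.supports_tools = supports_tools

    def apply_chat_template(
        self, messages, tools=None, tokenize=False, add_generation_prompt=True
    ):
        if tools and not self.supports_tools:
            raise ValueError("template does not accept tools")
        parts = [m["content"] for m in messages]
        parts += [t["function"]["name"] for t in tools or []]
        return " ".join(parts)

    def encode(self, text):
        return text.split()


messages = [{"role": "user", "content": "find the area"}]
tools = [{"type": "function", "function": {"name": "area"}}]

with_tools = count_prompt_tokens(StubTokenizer(True), messages, tools)
fallback = count_prompt_tokens(StubTokenizer(False), messages, tools)
print(with_tools, fallback)
```

The two counts differ because the fallback omits the tool definitions, which is the kind of silent undercount the review comment warns about.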
@laviier laviier force-pushed the bfcl_eval branch 2 times, most recently from 898136e to 8f3dd43 Compare May 12, 2026 20:53
@mergify

mergify Bot commented May 26, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @laviier.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify

mergify Bot commented May 29, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @laviier.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 29, 2026
@mergify mergify Bot removed the needs-rebase label May 31, 2026
@chaunceyjiang chaunceyjiang added the verified Run pre-commit for new contributors without triggering other tests label Jun 1, 2026

@chaunceyjiang chaunceyjiang left a comment

Collaborator

I ran into an issue. Could you help me take a look?
Error 8: Not Found

vllm serve /mnt/data4/models/Qwen/Qwen3.5-27B-FP8 --enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3

vllm bench serve --model /mnt/data4/models/Qwen/Qwen3.5-27B-FP8 \
  --backend openai-chat --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path gorilla-llm/Berkeley-Function-Calling-Leaderboard \
  --bfcl-categories simple,live_simple,multiple \
  --num-warmups 5   --temperature 0   --percentile-metrics ttft,tpot,itl,e2el   \
  --max-concurrency 8 --num-prompts 500
Namespace(subparser='bench', bench_type='serve', dispatch_function=<function BenchmarkServingSubcommand.cmd at 0x7fb9711a1760>, trust_remote_code=False, seed=0, num_prompts=500, dataset_name='hf', no_stream=False, dataset_path='gorilla-llm/Berkeley-Function-Calling-Leaderboard', no_oversample=False, skip_chat_template=False, enable_multimodal_chat=False, disable_shuffle=False, custom_output_len=256, spec_bench_output_len=256, spec_bench_category=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, timed_trace_chunk_hash_size=16, timed_trace_sec_multiplier=1, timed_trace_label_timestamp='timestamp', timed_trace_label_input_length='input_length', timed_trace_label_output_length='output_length', timed_trace_label_hash_ids='hash_ids', blazedit_min_distance=0.0, blazedit_max_distance=1.0, asr_max_audio_len_sec=inf, asr_min_audio_len_sec=0.0, random_input_len=1024, random_output_len=128, random_range_ratio='0.0', random_prefix_len=0, random_batch_size=1, no_reranker=False, random_mm_base_items_per_request=1, random_mm_num_mm_items_range_ratio=0.0, random_mm_limit_mm_per_prompt={'image': 255, 'video': 1}, random_mm_bucket_config={(256, 256, 1): 0.5, (720, 1280, 1): 0.5, (720, 1280, 16): 0.0}, hf_subset=None, hf_split=None, hf_name=None, hf_output_len=None, bfcl_categories=['simple', 'live_simple', 'multiple'], prefix_repetition_prefix_len=256, prefix_repetition_suffix_len=256, prefix_repetition_num_prefixes=10, prefix_repetition_output_len=128, speed_bench_dataset_subset='qualitative', speed_bench_output_len=4096, speed_bench_category=None, label=None, backend='openai-chat', base_url=None, host='127.0.0.1', port=8000, endpoint='/v1/chat/completions', header=None, max_concurrency=8, model='/mnt/data4/models/Qwen/Qwen3.5-27B-FP8', input_len=None, output_len=None, tokenizer=None, tokenizer_mode='auto', use_beam_search=False, logprobs=None, request_rate=inf, burstiness=1.0, disable_tqdm=False, num_warmups=5, profile=False, 
save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, self_timed=None, percentile_metrics='ttft,tpot,itl,e2el', metric_percentiles='99', goodput=None, request_id_prefix='bench-36142bd8-', top_p=None, top_k=None, min_p=None, temperature=0.0, frequency_penalty=None, presence_penalty=None, repetition_penalty=None, served_model_name=None, lora_modules=None, lora_assignment='random', ramp_up_strategy=None, ramp_up_start_rps=None, ramp_up_end_rps=None, ready_check_timeout_sec=0, extra_body=None, skip_tokenizer_init=False, insecure=False, plot_timeline=False, timeline_itl_thresholds='25,50', plot_dataset_stats=False)
Starting initial single prompt test run...
Skipping endpoint ready check.
Warming up with 5 requests...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  2.37it/s]
Warmup run completed.
Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:03<00:00, 158.35it/s]
Failed requests during benchmark run detected (capping to 10):
Error 0: Not Found
Error 1: Not Found
Error 2: Not Found
Error 3: Not Found
Error 4: Not Found
Error 5: Not Found
Error 6: Not Found
Error 7: Not Found
Error 8: Not Found
Error 9: Not Found
tip: install termplotlib and gnuplot to plot the metrics
============ Serving Benchmark Result ============
Successful requests:                     5         
Failed requests:                         495       
Maximum request concurrency:             8         
Benchmark duration (s):                  3.16      
Total input tokens:                      2841      
Total generated tokens:                  835       
Request throughput (req/s):              1.58      
Output token throughput (tok/s):         264.43    
Peak output token throughput (tok/s):    353.00    
Peak concurrent requests:                5.00      
Total token throughput (tok/s):          1164.14   
---------------Time to First Token----------------
Mean TTFT (ms):                          580.51    
Median TTFT (ms):                        635.43    
P99 TTFT (ms):                           635.93    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          10.81     
Median TPOT (ms):                        10.66     
P99 TPOT (ms):                           11.43     
---------------Inter-token Latency----------------
Mean ITL (ms):                           14.75     
Median ITL (ms):                         10.69     
P99 ITL (ms):                            133.66    
----------------End-to-end Latency----------------
Mean E2EL (ms):                          2386.45   
Median E2EL (ms):                        2376.93   
P99 E2EL (ms):                           3128.01   
==================================================

@laviier

laviier commented Jun 1, 2026

Contributor Author

Thanks for trying it. I can't reproduce this locally: running the same bench flags (--bfcl-categories simple,live_simple,multiple --num-warmups 5 --temperature 0 --max-concurrency 8 --num-prompts 500) against a tool-parser-enabled Qwen/Qwen3-0.6B server gives 500/500 successful requests.

"Not Found" here is an HTTP 404 from the streaming chat backend. The most likely place where vLLM's chat path returns 404 on a per-request basis is OpenAIServingEngine._check_model, which responds with "The model X does not exist." when the request's model field doesn't match what the engine has registered (source: vllm/entrypoints/openai/engine/serving.py:241). Only 5 of 500 requests succeed, and Successful requests: 5 matches --num-warmups 5, which suggests the warmup requests went through fine but the main run is hitting a model-id mismatch.

Could you grab three things so I can confirm?

  1. Server side, while it's running: curl -s http://127.0.0.1:8000/v1/models | jq '.data[].id'
  2. One bare-bones probe with the exact same model string the bench used:
     curl -i http://127.0.0.1:8000/v1/chat/completions \
       -H "Content-Type: application/json" \
       -d '{"model": "/mnt/data4/models/Qwen/Qwen3.5-27B-FP8",
            "messages": [{"role": "user", "content": "hi"}],
            "max_completion_tokens": 8}'
     If this returns 404, the problem is between server registration and bench --model (likely a --served-model-name mismatch on the server, a stale tag, or something similar). If it returns 200, the problem is BFCL-specific and we should dig further.
  3. First ~30 lines of the server log at the moment a "Not Found" request arrived (the engine logs every request route + error code).
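
The comparison behind the first two checks can be sketched offline as below. This is an illustrative helper, not part of vLLM: explain_model_mismatch is a hypothetical name, and the sample /v1/models body, including the served name qwen3.5-27b, is invented to show the mismatch case.

```python
from typing import Optional


def explain_model_mismatch(requested: str, models_response: dict) -> Optional[str]:
    """Return a diagnostic string if `requested` is not a served model id."""
    served_ids = [entry["id"] for entry in models_response.get("data", [])]
    if requested in served_ids:
        return None
    return f"{requested!r} is not served; /v1/models lists {served_ids}"


# Example /v1/models body. The served name "qwen3.5-27b" is hypothetical.
models_response = {
    "object": "list",
    "data": [{"id": "qwen3.5-27b", "object": "model"}],
}
bench_model = "/mnt/data4/models/Qwen/Qwen3.5-27B-FP8"

print(explain_model_mismatch(bench_model, models_response))
print(explain_model_mismatch("qwen3.5-27b", models_response))
```

If the first call reports a mismatch against the real server's model list, the 404s would be explained without involving the BFCL dataset at all.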

One quick hypothesis to rule out: was the server started with --served-model-name, or is that exact path what the /v1/models endpoint prints? If the server ended up registering a different name (the canonical Hugging Face id, for example), vllm bench serve --model would get a 404 on every request. The 5 warmup requests would not surface it, because the warmup loop completes silently regardless of failure (asyncio.gather(*warmup_tasks) doesn't check output.success). Worth checking: did warmup actually produce useful output, or are the 5 successes you see unrelated to warmup?
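
The point about asyncio.gather can be illustrated with a small stand-alone sketch. Nothing here is vLLM code: RequestOutput, fake_request, and run_warmup are hypothetical stand-ins, and the example only shows that gather returns result objects as-is, so a failure recorded in a success field goes unnoticed unless it is checked explicitly.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RequestOutput:
    """Hypothetical stand-in for the benchmark's per-request result."""

    success: bool
    error: str = ""


async def fake_request(index: int) -> RequestOutput:
    await asyncio.sleep(0)
    # Pretend the server rejects every request with a 404.
    return RequestOutput(success=False, error="Not Found")


async def run_warmup(num_warmups: int) -> list:
    tasks = [asyncio.create_task(fake_request(i)) for i in range(num_warmups)]
    # gather() raises only if a task raises. Failures stored in the returned
    # objects pass through silently.
    outputs = await asyncio.gather(*tasks)
    return list(outputs)


def summarize_failures(outputs: list) -> list:
    """Collect the error strings of all unsuccessful outputs."""
    return [o.error for o in outputs if not o.success]


outputs = asyncio.run(run_warmup(5))
failures = summarize_failures(outputs)
print(f"{len(failures)} of {len(outputs)} warmup requests failed")
```

A check like summarize_failures after the warmup would make a model-id mismatch visible before the main run starts.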

@chaunceyjiang

Collaborator

OK, I'll run some more tests today.

@chaunceyjiang chaunceyjiang left a comment

Collaborator

LGTM.

Nice work!!

@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 9, 2026
@chaunceyjiang chaunceyjiang enabled auto-merge (squash) June 9, 2026 03:01
auto-merge was automatically disabled June 9, 2026 14:44

Head branch was pushed to by a user without write access

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Li Zhang <lzhanga@amazon.com>
@mergify mergify Bot added documentation Improvements or additions to documentation tool-calling labels Jun 9, 2026
@vllm-bot vllm-bot merged commit 89c6a41 into vllm-project:main Jun 10, 2026
45 of 47 checks passed
@laviier laviier deleted the bfcl_eval branch June 11, 2026 15:59

Labels: documentation, performance, ready, tool-calling, verified

3 participants