
[Bugfix] Fix LOGITPROC_SOURCE_ENTRYPOINT test to use spawn-compatible dist-info registration for XPU/ROCm #42040

Merged
tjtanaa merged 5 commits into vllm-project:main from dzhengAP:bugfix/fix-entrypoint-spawn-compatible
May 9, 2026

Conversation

@dzhengAP

@dzhengAP dzhengAP commented May 8, 2026

Contributor

Follow-up to #41423, also discussed in #41895.

Problem

test_custom_logitsprocs[LOGITPROC_SOURCE_ENTRYPOINT] and
test_rejects_custom_logitsprocs[LOGITPROC_SOURCE_ENTRYPOINT] relied on
fork-based monkey-patching of importlib.metadata.entry_points to inject a
fake logitproc entrypoint.

That works with VLLM_WORKER_MULTIPROC_METHOD=fork, but it is not compatible
with XPU/ROCm platforms where the tests need to run with spawn-based
multiprocessing. With spawn, the monkey-patched entrypoint state is not
inherited by worker subprocesses, so the fake custom logits processor entrypoint
cannot be discovered.
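
A minimal sketch of this failure mode follows. It is an illustration under assumptions, not the test code: a plain `subprocess` call stands in for a spawned worker (both start a fresh interpreter that re-imports its modules), and `fake_entry_points` is a hypothetical placeholder for the patched function.

```python
import importlib.metadata
import subprocess
import sys


def fake_entry_points(*args, **kwargs):
    """Hypothetical stand-in for the monkey-patched lookup."""
    return []


# Patch only this interpreter's in-memory module object.
original = importlib.metadata.entry_points
importlib.metadata.entry_points = fake_entry_points
try:
    # The patching process sees the replacement.
    parent_view = importlib.metadata.entry_points.__name__

    # A fresh interpreter imports importlib.metadata from scratch, so the
    # patch is gone. Assumption: this mirrors what a spawned worker sees.
    child_view = subprocess.run(
        [
            sys.executable,
            "-c",
            "import importlib.metadata as m; print(m.entry_points.__name__)",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
finally:
    # Always restore the real function.
    importlib.metadata.entry_points = original

print(parent_view)  # fake_entry_points
print(child_view)   # entry_points
```

With fork, the child starts as a copy of the parent's memory, which is why the patched function was visible there.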

Fix

Replace the spawn path’s in-memory monkey-patch with a real temporary
.dist-info package written to disk and exposed through PYTHONPATH.

Since importlib.metadata discovers entrypoints from installed package metadata
on disk, spawned subprocesses can discover the fake logitproc entrypoint without
requiring fork.

This PR adds/updates the shared fake-entrypoint setup in
tests/v1/logits_processors/utils.py to:

  1. Create a temporary .dist-info directory with METADATA and
    entry_points.txt.
  2. Add the temporary package directory to PYTHONPATH so spawned subprocesses
    can discover the entrypoint.
  3. Prepend the same directory to sys.path so the current driver process can
    discover the entrypoint as well.
  4. Use spawn-compatible registration when spawn multiprocessing is required.
  5. Keep the existing monkey-patched importlib.metadata.entry_points behavior
    for fork-based test execution.
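
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in tests/v1/logits_processors/utils.py: the function name `register_fake_entrypoint_sketch`, the distribution name `fake_logitproc_pkg`, the group `demo.logits_processors`, and the `json:dumps` target are all hypothetical.

```python
import importlib.metadata
import os
import sys
import tempfile
from pathlib import Path


def register_fake_entrypoint_sketch(
    group: str,
    name: str,
    target: str,
    dist_name: str = "fake_logitproc_pkg",  # hypothetical distribution name
) -> str:
    """Write a minimal .dist-info directory and expose it on the import path."""
    # A real test would remove this directory during teardown.
    site_dir = Path(tempfile.mkdtemp(prefix="fake_entrypoint_"))
    dist_info = site_dir / f"{dist_name}-0.0.0.dist-info"
    dist_info.mkdir()

    # METADATA carries the distribution name and version.
    (dist_info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {dist_name}\nVersion: 0.0.0\n"
    )
    # entry_points.txt is INI-style: one section per entry point group.
    (dist_info / "entry_points.txt").write_text(f"[{group}]\n{name} = {target}\n")

    # Child interpreters, including spawned ones, build sys.path from PYTHONPATH.
    existing = os.environ.get("PYTHONPATH")
    os.environ["PYTHONPATH"] = (
        os.pathsep.join([str(site_dir), existing]) if existing else str(site_dir)
    )
    # The current process has already built sys.path, so update it directly.
    sys.path.insert(0, str(site_dir))
    importlib.invalidate_caches()
    return str(site_dir)


site_dir = register_fake_entrypoint_sketch(
    "demo.logits_processors", "dummy", "json:dumps"
)
dist = importlib.metadata.distribution("fake_logitproc_pkg")
found = [(ep.group, ep.name, ep.value) for ep in dist.entry_points]
print(found)
```

Because the metadata lives on disk and PYTHONPATH is inherited through the environment, a fresh interpreter can run the same lookup without any in-memory patching.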

The follow-up commits also apply this setup consistently across the custom
offline and online logits processor tests.

This makes the custom logits processor entrypoint tests compatible with
spawn-based multiprocessing and fixes the XPU/ROCm CI failures.

… dist-info registration

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added labels May 8, 2026: rocm (Related to AMD ROCm), intel-gpu (Related to Intel GPU), v1, bug (Something isn't working)
@github-project-automation github-project-automation Bot moved this to Todo in AMD May 8, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request refactors the custom logits processor tests to support spawned subprocesses by replacing manual monkey-patching of importlib.metadata.entry_points with a disk-based dist-info registration. A new utility function, register_fake_entrypoint, creates a temporary package and updates PYTHONPATH. Feedback indicates that sys.path should also be updated for the current process to ensure the driver process can successfully discover the entry point.

Comment thread tests/v1/logits_processors/utils.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: David Zheng <153074367+dzhengAP@users.noreply.github.com>
@zhenwei-intel

zhenwei-intel commented May 8, 2026

Contributor

tests/v1/logits_processors/test_custom_online.py
Could you please also handle this test?

@AndreasKaratzas

AndreasKaratzas commented May 8, 2026

Member

CI is blocked, so I could not wait for the author. Opened a second PR here with their commits as well to honor their contributions:

UPDATE: Author is back and people can officially call me impatient.

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
(cherry picked from commit a093d02)

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
(cherry picked from commit 82f2f93)
@dzhengAP

dzhengAP commented May 8, 2026

Contributor Author

Good news: the fix this PR focuses on passes Intel CI. The only failure is the LoRA test, which has already been discussed here: #41895 (comment). It can be waived because of the current limitations of LoRA support on XPU.

More detail: this Qwen3.5 dense model path uses a GDN/Mamba-style layer for which LoRA projections are not supported on XPU. The correct fix is to skip this test on XPU rather than try to make it pass. @zhenwei-intel @jikunshang

@jikunshang

Member

Good news: the fix this PR focuses on passes Intel CI. The only failure is the LoRA test, which has already been discussed here: #41895 (comment). It can be waived because of the current limitations of LoRA support on XPU.

More detail: this Qwen3.5 dense model path uses a GDN/Mamba-style layer for which LoRA projections are not supported on XPU. The correct fix is to skip this test on XPU rather than try to make it pass. @zhenwei-intel @jikunshang

We disabled some LoRA cases on main. Please rebase and check whether it passes.

@AndreasKaratzas

Member

@dzhengAP could you rebase? I think the AMD Docker build is having some temporary issues.

@jikunshang

Member

Rebased. Let's see what CI says.

@dzhengAP

dzhengAP commented May 9, 2026

Contributor Author

Intel CI all passed, but AMD CI is still running after 3 hours. Do we have any experience with, or an estimate of, the typical AMD CI running time? @AndreasKaratzas and @jikunshang

@AndreasKaratzas

Member

@dzhengAP Yep, AMD CI is like that (and it is not blocking). I was only interested in the blocking test group, and it is passing now. I am going to ping people in Slack.

@tjtanaa tjtanaa left a comment

Member


LGTM

@tjtanaa tjtanaa merged commit df2636a into vllm-project:main May 9, 2026
17 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD May 9, 2026
weifang231 pushed a commit to weifang231/eb-vllm that referenced this pull request May 13, 2026
… dist-info registration for XPU/ROCm (vllm-project#42040)

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Signed-off-by: David Zheng <153074367+dzhengAP@users.noreply.github.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
… dist-info registration for XPU/ROCm (vllm-project#42040)

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Signed-off-by: David Zheng <153074367+dzhengAP@users.noreply.github.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
… dist-info registration for XPU/ROCm (vllm-project#42040)

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Signed-off-by: David Zheng <153074367+dzhengAP@users.noreply.github.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
… dist-info registration for XPU/ROCm (vllm-project#42040)

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Signed-off-by: David Zheng <153074367+dzhengAP@users.noreply.github.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026
… dist-info registration for XPU/ROCm (vllm-project#42040)

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Signed-off-by: David Zheng <153074367+dzhengAP@users.noreply.github.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
… dist-info registration for XPU/ROCm (vllm-project#42040)

Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
Signed-off-by: David Zheng <153074367+dzhengAP@users.noreply.github.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
MingqiWang-coder added a commit to vLLM-HUST/vllm-hust that referenced this pull request Jun 30, 2026
Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main
(2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner,
worker, attention, KV cache, compilation, and structured output fixes.

Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252
Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726
vllm-project#40727 vllm-project#40737 vllm-project#40749 vllm-project#40961 vllm-project#41119 vllm-project#41133 vllm-project#41233 vllm-project#41237 vllm-project#41411 vllm-project#41496 vllm-project#41549
vllm-project#41674 vllm-project#41873 vllm-project#41895 vllm-project#42040 vllm-project#42112 vllm-project#42289 vllm-project#42479 vllm-project#42585 vllm-project#42692 vllm-project#42706 vllm-project#42709
vllm-project#42739 vllm-project#42967 vllm-project#43001 vllm-project#43079 vllm-project#43125 vllm-project#43160 vllm-project#43616 vllm-project#43669 vllm-project#43719 vllm-project#43768 vllm-project#43808
vllm-project#43961 vllm-project#43982 vllm-project#43988 vllm-project#43998 vllm-project#44057 vllm-project#44560 vllm-project#44574 vllm-project#44568 vllm-project#44603 vllm-project#44744 vllm-project#45195
vllm-project#45345 vllm-project#45383 vllm-project#45487 vllm-project#45564 vllm-project#45673
Runner fix (2): vllm-project#44568 vllm-project#44603

Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)

Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982

Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>
MingqiWang-coder added a commit to vLLM-HUST/vllm-hust that referenced this pull request Jul 2, 2026
Cherry-pick 62 bugfix/security PRs from upstream vllm-project/vllm main
(2026-05-03 to 2026-06-17), covering scheduler, engine core, model runner,
worker, attention, KV cache, compilation, and structured output fixes.

Security (4): vllm-project#43286 vllm-project#44744 vllm-project#45118 vllm-project#45252
Bugfix (56): vllm-project#35536 vllm-project#36616 vllm-project#38895 vllm-project#39155 vllm-project#39324 vllm-project#39562 vllm-project#39805 vllm-project#40398 vllm-project#40726
Runner fix (2): vllm-project#44568 vllm-project#44603

Skipped: vllm-project#43781 (ROCm-specific, not applicable to Ascend NPU)

Conflict resolutions:
- Manual merge: vllm-project#43286 vllm-project#45118 vllm-project#42112 vllm-project#43160 vllm-project#43719 vllm-project#44560
- Upstream-preferred (-X theirs): vllm-project#43808 vllm-project#43988 vllm-project#42967 vllm-project#35536 vllm-project#45195
- Test files (--theirs): vllm-project#44744 vllm-project#41895 vllm-project#42040 vllm-project#41233 vllm-project#45345 vllm-project#43982

Co-authored-by: GitHub Copilot
Signed-off-by: MingqiWang-coder <mingqiwang@hust.edu.cn>

Labels

bug (Something isn't working), intel-gpu (Related to Intel GPU), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm), v1

5 participants