[Frontend] Add Streaming Parser Engine and new GLM4.7/GLM5.1/GLM5.2 Parser#45915
Conversation
…arser Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
This looks directionally correct, although I haven't tested it against a live version of these models yet. One note - this isn't wired into
We can relax this if desired, but at least for all these initial ones I added that to ensure that we wire new parsers into the replay harnesses which have a lot of shared parsing tests at various token sizes per delta to fuzz out any bad behavior across token boundaries. Expand the details section immediately below here to see a diff of what I think that would be, but I didn't double-check that this is exactly following the GLM 4.7 format. It looks like the docstring in the parser file though. trace_builder.py changesdiff --git a/tests/parser/engine/trace_builder.py b/tests/parser/engine/trace_builder.py
index 128e511e6..c7a722bc4 100644
--- a/tests/parser/engine/trace_builder.py
+++ b/tests/parser/engine/trace_builder.py
@@ -30,6 +30,7 @@ from vllm.entrypoints.openai.chat_completion.protocol import (
)
from vllm.parser.engine.registered_adapters import (
Gemma4Parser,
+ Glm47MoeParser,
MinimaxM2Parser,
NemotronV3Parser,
Qwen3Parser,
@@ -571,6 +572,75 @@ def _build_nemotron_v3(scenario: Scenario, validate: bool = True) -> Sample:
)
+# ── GLM-4.7 (XML arg_key/arg_value format, starts in REASONING) ────
+
+_GLM47_MOE_VOCAB: dict[str, int] = {
+ "<think>": 50,
+ "</think>": 51,
+ "<tool_call>": 60,
+ "</tool_call>": 61,
+}
+
+
+def _glm47_moe_arg_value(value: Any) -> str:
+ if isinstance(value, str):
+ return value
+ if isinstance(value, bool):
+ return "true" if value else "false"
+ if isinstance(value, (int, float)):
+ return str(value)
+ return json.dumps(value, ensure_ascii=False)
+
+
+def _glm47_moe_tool_segments(tc: ToolCallSpec) -> list[tuple[str, bool]]:
+ segs: list[tuple[str, bool]] = [("<tool_call>", True)]
+ parts = [tc.name]
+ for key, value in tc.arguments.items():
+ val_str = _glm47_moe_arg_value(value)
+ parts.append(
+ f"<arg_key>{key}</arg_key><arg_value>{val_str}</arg_value>"
+ )
+ segs.append(("".join(parts), False))
+ segs.append(("</tool_call>", True))
+ return segs
+
+
+def _glm47_moe_segments(scenario: Scenario) -> list[tuple[str, bool]]:
+ segs: list[tuple[str, bool]] = []
+ if scenario.reasoning is not None:
+ segs.append((scenario.reasoning, False))
+ if scenario.content is not None or scenario.tool_calls:
+ segs.append(("</think>", True))
+ if scenario.content is not None:
+ segs.append((scenario.content, False))
+ if scenario.tool_calls:
+ for tc in scenario.tool_calls:
+ segs.extend(_glm47_moe_tool_segments(tc))
+ return segs
+
+
+def _build_glm47_moe(scenario: Scenario, validate: bool = True) -> Sample:
+ expected_reasoning: str | None
+ if scenario.reasoning is not None:
+ expected_reasoning = scenario.reasoning.rstrip()
+ else:
+ expected_reasoning = ""
+
+ sample = _make_sample(
+ sample_id=f"glm47_moe-{scenario.id}",
+ description=scenario.description,
+ vocab=_GLM47_MOE_VOCAB,
+ segments=_glm47_moe_segments(scenario),
+ expected_reasoning=expected_reasoning,
+ expected_content=_qwen3_expected_content(scenario),
+ expected_tool_calls=_expected_tc(scenario),
+ tools=_expected_tools(scenario),
+ )
+ if validate:
+ _validate_sample(sample, Glm47MoeParser)
+ return sample
+
+
# ── Registry and public API ──────────────────────────────────────────
_BUILDERS: dict[str, Any] = {
@@ -578,6 +648,7 @@ _BUILDERS: dict[str, Any] = {
"gemma4": _build_gemma4,
"minimax_m2": _build_minimax_m2,
"nemotron_v3": _build_nemotron_v3,
+ "glm47_moe": _build_glm47_moe,
}After adding this parser to trace_builder.py, you can confirm it works with |
|
Is it feasible to migrate both glm 45 and 47 in this PR, the two tool parser share 99% of the code. |
…arser Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
| class Glm47MoeModelToolParser(Glm4MoeModelToolParser): | ||
| class Glm47MoeModelToolParser(Glm47MoeParserToolAdapter): # type: ignore[valid-type, misc] | ||
| supports_required_and_named = False | ||
| structural_tag_model = "glm_4_7" |
There was a problem hiding this comment.
I can confirm that the glm_4_7 structural tag is also compatible with the glm_4_5 format.
bbrowning
left a comment
There was a problem hiding this comment.
This looks good to me - your update to the trace_builder.py did a better job of representing the streaming tokens properly for those tests than my quickly put together version. And from what I dug into the differences in the old glm45 vs glm47 tool parser, this looks safe to consolidate under the new version.
Approving this without running myself on a live model, as the tests give good confidence in the parsing behavior here for many cases. With the recent release of GLM 5.2 and its improved MTP, this will substantially clean up that path for streaming usage.
Thanks!
|
✅ Reasoning parser swallows tool tokens (the central bug) Issues: #42400 (GLM-5.1 Claude Code: stop_reason=tool_use but no tool block), #46040 (GLM-5.2 emits <tool_call> inside , XML leaks into reasoning), framing of #46049. Competing PR: #40659. ✅ Streaming tool name truncated / unstable Issues: #39757 (run_in_terminal→run_in, get_weather→get). Competing PRs: #40071 (delay prefix names), #41654 (zero-arg names). ✅ Zero-argument inline tool calls dropped in streaming Issue: #44326. Competing PR: #41654. ✅ Streaming argument-JSON corruption Issue: #40195 (Optional[str] → Smithh", arrays → [...]]). Competing PR: #40197. 🟡 MTP + tool-calling malformed args Issues: #44843 (vanishes when MTP off), args part of #39757. ✅ Responses API format mismatch Issue: #45273 (parser part: AttributeError: 'FunctionTool' object has no attribute 'function'). Competing PRs: #45276, #41631. 🟡 Serving-layer streaming chunk shape Issue: #44098 (continuation chunks re-emit id/type/name; last arg packed into the finish_reason chunk). 🔴 Reasoning token counting Issue / competing PR: #41077. 🔴 Missing classification / GLM-4.5 & SeedOSS regressions Issue / competing PR: #37044. 🔴 Tool-result rendering / content-format (input side) Issue / competing PR: #39630 / #39614. Bottom line PR #45915 is not a point-fix — it's a rewrite that collapses the reasoning+tool split into one grammar, which is exactly the seam that caused ugs.- Solidly fixed: A (tool-in-reasoning — the big one), B (name truncation), C (zero-arg streaming), D (arg-JSON corruption), F (Responses FunctThat covers the bulk of open issues #42400, #46040, #44326, #39757, #40195, and the parser slice of #45273 — and supersedes competing PRs#40659, #40071, #41654, #40197.
|
sfeng33
left a comment
There was a problem hiding this comment.
Thank you for the work!
|
Nice work! |
|
Hi! I'm wondering if the PR / fix is in the |
…arser (vllm-project#45915) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Signed-off-by: divineearthly <divineearthly@gmail.com>
…arser (vllm-project#45915) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…arser (vllm-project#45915) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…arser (vllm-project#45915) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
@frankwang28 It is not, you have to use @sfeng33 Even with this fix, we see weird behavior when It did improve this a lot thought, compared to initial release of GLM-5.x support |
|
@gaby thanks for taking a look! Yeah MTP is a known issue across several models. |
What is the weird behavior you see? Tool call parse or reasoning failures such as content cut off, tags escaping, etc? Errors in requests and vLLM logs about failed FSM advances in the grammar? Or more subtle issues such as the model not calling the right tools or forgetting what tools it has? I just want to confirm which of these (or something else) mtp is triggering so we can reproduce and fix it. |
@bbrowning I don't have an example code to share, but behavior I do. Ex when using claude code in plan mode, the model is not able to call tools to exit plan mode "ExitPlanMode". It would say it called the tool, but doesn't. On vLLM everything returns HTTP 200, I was able to mitigate the issue by passing When using structured format, the model would sometimes return the data with a different format, ex: Expected: |
|
@gaby If you feel up for testing something, try setting |
|
@bbrowning I will give it a try, i'm already using |
I'm having the same issue with GLM 5.2 in Claude Code when entering and exiting Plan Mode. |
|
@gaby we see weird behavior when mtp is enabled for tool calling with output validation. You can use this parameter to resolve the incompatibility between MTP and function calling. |
Thanks, will give it a try. Probably worth updating the official recipe in https://recipes.vllm.ai/ |
Purpose
[Frontend] Add Streaming Parser Engine and new GLM4.7/GLM5.1/GLM5.2 Parser
Test Plan
I tested it locally with both enable_thinking=True and enable_thinking=False, as well as with stream=True and stream=False. In all cases, the output was parsed correctly.
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.