[Bugfix][Parser] Fix U+FFFD leak at reasoning-to-content transition in engine parsers#46159
Conversation
Flush the reasoning parser's engine lexer at the reasoning→content transition instead of decoding isolated token IDs, which produces U+FFFD when byte-fallback tokens span delta boundaries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Ben Browning <bbrownin@redhat.com>
sfeng33
left a comment
There was a problem hiding this comment.
Thank you for the quick fix!
|
The tests I added failed at most but not all chunk sizes before the fix. This makes sense, as whether the bug triggers or not depends on the exact sequence of tokens we got in each delta around the reasoning to tool parser transition. The native I've been running the reproducer from the original issue in a tight loop for a while now and things look good: And, no replacement characters to be found in that stream.txt I'm dumping the output to after the fix. |
I’m curious if you’ve explored this path, which seems to be a clean fix as well (after this PR merged of course). |
I did not explore that yet. From looking at the code, running them unified doesn't even run through the problematic code path so it should avoid this issue. But, I haven't done as much real-world testing of that path since we don't wire it up in the code today by matching compatible tool/reasoning parser args to collapse into the unified one. It is covered by the same set of unit tests as our delegating path and is less complex overall than running them as split delegating parsers. But, I didn't want to swap without putting that path through its paces first. |
|
Ugh these openai server tests look unrelated |
|
Ops the security fix from #45675 made the resolver to use schemathesis 4.x. |
Hmm - looking at that, it's not exactly trivial. The mechanical code changes to adapt to the new APIs are one thing, but the new version also tests lots of things in our schema that now fail but passed under schemathesis 3.x. |
Feel free to tag me if you need any support from the Schemathesis side. There is also a migration guide which could help with migration |
|
Posted on slack to get help to force merge |
|
See #46173 to get the unrelated schema test green on main again. |
|
Stress-testing a GLM 4.7 model I also caught an edge case where a stop token could leak into message content before this fix but does not after this fix. So, the fix will help more than just U+FFFD characters leaking but instead fixes some broader issues with the GLM models and how the tokenization works. The text passed in to our deltas is already tokenized properly, so reusing that as this PR does simplifies handling all the tokenization edge cases. |
|
nice cc @cjackal |
…n engine parsers (vllm-project#46159) Signed-off-by: Ben Browning <bbrownin@redhat.com>
…n engine parsers (vllm-project#46159) Signed-off-by: Ben Browning <bbrownin@redhat.com>
…n engine parsers (vllm-project#46159) Signed-off-by: Ben Browning <bbrownin@redhat.com>
…n engine parsers (vllm-project#46159) Signed-off-by: Ben Browning <bbrownin@redhat.com> Signed-off-by: Qiang Li <qiang.li2@amd.com>
…rving Part 1 of 3 for RFC vllm-project#47161. Adds a streaming mode to the derender endpoints, mirroring the producer's per chunk SSE shape. State is kept client carried and stateless on the server. Responses are returned as plain JSON rather than SSE, since the derender can't buffer a whole generation or hold a connection open. Detokenization rebuilds the decode window from prior token IDs each chunk by replaying it through detokenize_incrementally, rather than a simple rebuild. The multi-byte characters split across chunk boundaries still decode correctly (avoids regressing vllm-project#46159). The chat path's reasoning/tool call parsing is stubbed for now (501) and lands in a follow up PR. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Purpose
Flush the reasoning parser's engine lexer at the reasoning→content transition instead of decoding isolated token IDs, which produces U+FFFD when byte-fallback tokens span delta boundaries.
This fixes a source of unicode replacement characters getting streamed back to clients first identified in #45919 (comment)
Test Plan
Unit Tests
Manual Test with curl
Serve a GLM 4.7 or newer model:
Send it multiple requests in a loop that are likely to trigger a multi-byte unicode character around the reasoning to content transition. I ran this many times in a loop, dumping the output to a file, and then grepped the file for the unicode replacement characters.
Test Result
Unit Tests
Manual Test with curl
Before the fix, I would regularly (maybe 20-40% of the time) see U+FFFD leaked as content around the reasoning to content transition in the output stream.
After the fix, I did not observe it at all.