Skip to content

[Rust Frontend] Add /tokenize and /detokenize endpoints#44222

Merged
BugenZhao merged 17 commits into
vllm-project:mainfrom
TanNgocDo:tando-feat/rust-tokenize-detokenize
Jun 9, 2026
Merged

[Rust Frontend] Add /tokenize and /detokenize endpoints#44222
BugenZhao merged 17 commits into
vllm-project:mainfrom
TanNgocDo:tando-feat/rust-tokenize-detokenize

Conversation

@TanNgocDo

@TanNgocDo TanNgocDo commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds POST /tokenize and POST /detokenize (being tracked in #44280) to the Rust server, matching the Python OpenAI server's root-path endpoints. Encoding/decoding runs entirely in-process via DynTokenizer — the inference engine is not involved.

  • /tokenize accepts both forms (serde untagged):
    • Completion: encodes the raw prompt string (no chat template); add_special_tokens defaults to true.
    • Chat: renders messages through the chat template, then encodes; add_special_tokens defaults to false (the template adds specials). Reuses convert_message / normalize_generation_prompt_mode from chat_completions/convert so message lowering and the add_generation_prompt / continue_final_message rules stay in lockstep with chat completions.
    • Response carries count, max_model_len, tokens, and (when return_token_strs is set) token_strs.
  • /detokenize decodes token IDs back to text with skip_special_tokens = false, matching Python.
  • Unknown model names return 404 model_not_found; conflicting generation flags and continue_final_message without a trailing assistant message return 400.

Parity with Python

Behavior was checked against serving_tokenization.py / protocol.py: defaults for add_special_tokens (completion true / chat false), add_generation_prompt (true), the tokenize-{base} request-id format, the token_strs-only-when-requested rule, and skip_special_tokens = false on detokenize.

Known gaps vs Python (intentional for this PR): media_io_kwargs / mm_processor_kwargs on the chat form are not modeled yet.

Not a duplicate

Ran the AGENTS.md §1 checks:

gh pr list --repo vllm-project/vllm --state open --search "tokenize rust"
gh pr list --repo vllm-project/vllm --state open --search "rust detokenize"

The only tokenize-related open PR, #36054 ("[Bugfix] Fix tokenize endpoint malformed token_strs"), touches Python only (vllm/entrypoints/serve/tokenize/, tests/entrypoints/openai/). No open PR adds these endpoints to the Rust server — this is net-new functionality in the rust/ tree.

Testing

cargo test -p vllm-server     # 161 passed
cargo test -p vllm-chat       # all passed
cargo clippy -p vllm-server -p vllm-chat --tests   # clean
cargo fmt --check             # clean

Added integration tests under routes/tests.rs covering:

  • completion round-trips through /detokenize
  • add_special_tokens toggles the token IDs
  • return_token_strs returns a parallel token_strs array
  • count / max_model_len are populated
  • chat form: generation prompt increases token count; continue-final vs new-assistant differ
  • /detokenize decodes an explicit token sequence (independent of /tokenize)
  • error paths: conflicting flags → 400, continue-without-assistant → 400, unknown model → 404, empty token list → empty prompt

AI assistance disclosure

AI assistance (Claude Code) was used to:

  • investigate the Python reference logic — tracing serving_tokenization.py / protocol.py to confirm field defaults, request-id format, the token_strs / skip_special_tokens rules, and error semantics, so the Rust endpoints match the existing OpenAI server behavior;
  • generate the unit/integration tests for these endpoints.

The implementation and all test cases were reviewed line-by-line by the submitter, and the test suite + clippy were run locally with the results above.

…server

Adds POST /tokenize and POST /detokenize to the Rust server, matching the
Python OpenAI server's root-path endpoints. Encoding/decoding runs entirely
in-process via DynTokenizer; the inference engine is not involved.

- /tokenize accepts both the completion form (raw `prompt`,
  add_special_tokens defaults true) and the chat form (renders `messages`
  through the chat template, then encodes, add_special_tokens defaults false).
  The chat path reuses convert_message / normalize_generation_prompt_mode from
  chat_completions/convert so message lowering and the add_generation_prompt /
  continue_final_message rules stay in lockstep with chat completions.
- Response carries count, max_model_len, tokens, and (when return_token_strs
  is set) token_strs.
- /detokenize decodes token ids with skip_special_tokens=false, matching
  Python.
- Unknown model -> 404; conflicting generation flags and continue_final_message
  without a trailing assistant message -> 400.

Tested: cargo test -p vllm-server (161 passed), cargo test -p vllm-chat
(all passed), cargo clippy/fmt clean.

AI assistance (Claude Code) was used to investigate the Python reference logic
(serving_tokenization.py / protocol.py) for behavior parity and to generate the
unit/integration tests. The implementation and tests were reviewed line-by-line
by the submitter.

Co-authored-by: Claude
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

@mergify mergify Bot added the rust label Jun 1, 2026
@chatgpt-codex-connector

Copy link
Copy Markdown
@BugenZhao

Copy link
Copy Markdown
Member

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e918a149b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread rust/src/server/src/routes/openai/tokenize/types.rs Outdated
Comment thread rust/src/server/src/routes/tokenize/types.rs

@BugenZhao BugenZhao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment thread rust/src/server/src/routes/openai/tokenize.rs Outdated
Comment thread rust/src/server/src/routes/openai/tokenize/types.rs Outdated
Comment thread rust/src/server/src/routes/openai/chat_completions/convert.rs Outdated
Comment thread rust/src/server/src/routes/openai/chat_completions/convert.rs Outdated
Comment thread rust/src/server/src/routes/tokenize/types.rs
TanNgocDo and others added 4 commits June 5, 2026 11:25
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: TanNgocDo <darkknightkhtn2008@gmail.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: TanNgocDo <darkknightkhtn2008@gmail.com>
Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
Comment thread rust/src/server/src/routes/tokenize/types.rs
TanNgocDo added 3 commits June 5, 2026 16:36
Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
@chatgpt-codex-connector

Copy link
Copy Markdown
@TanNgocDo TanNgocDo requested a review from BugenZhao June 7, 2026 09:35
Comment thread rust/src/chat/src/lib.rs Outdated
Comment thread rust/src/server/src/routes/openai/tokenize/types.rs Outdated
Comment thread rust/src/server/src/routes/openai/utils/types.rs
Comment thread rust/src/server/src/routes.rs Outdated
Comment thread rust/src/chat/src/lib.rs

@BugenZhao BugenZhao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM. Thanks!

Comment thread rust/src/server/src/routes.rs Outdated
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@BugenZhao BugenZhao added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 9, 2026
@BugenZhao BugenZhao changed the title [Frontend][Rust] Add /tokenize and /detokenize endpoints to the Rust server Jun 9, 2026
@BugenZhao BugenZhao enabled auto-merge (squash) June 9, 2026 09:17
@BugenZhao BugenZhao merged commit 69fdaff into vllm-project:main Jun 9, 2026
21 checks passed
ekagra-ranjan pushed a commit to ekagra-ranjan/vllm that referenced this pull request Jun 9, 2026
…#44222)

Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
Signed-off-by: TanNgocDo <darkknightkhtn2008@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026
…#44222)

Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
Signed-off-by: TanNgocDo <darkknightkhtn2008@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
Saddss pushed a commit to Saddss/vllm that referenced this pull request Jun 14, 2026
…#44222)

Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
Signed-off-by: TanNgocDo <darkknightkhtn2008@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
vivek8123 pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Jun 18, 2026
…#44222)

Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
Signed-off-by: TanNgocDo <darkknightkhtn2008@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
divineearthly pushed a commit to divineearthly/vllm that referenced this pull request Jun 19, 2026
…#44222)

Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
Signed-off-by: TanNgocDo <darkknightkhtn2008@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: divineearthly <divineearthly@gmail.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Jun 22, 2026
…#44222)

Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
Signed-off-by: TanNgocDo <darkknightkhtn2008@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
nkzhenhua pushed a commit to nkzhenhua/vllm that referenced this pull request Jun 24, 2026
…#44222)

Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
Signed-off-by: TanNgocDo <darkknightkhtn2008@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
ohsono pushed a commit to ohsono/vllm that referenced this pull request Jul 3, 2026
…#44222)

Signed-off-by: Tan Ngoc Do <darkknightkhtn2008@gmail.com>
Signed-off-by: TanNgocDo <darkknightkhtn2008@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed rust

4 participants