Skip to content

fix(tokenizer): hint transformers upgrade on missing class (#960)#971

Merged
debermudez merged 5 commits into
mainfrom
dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend
May 26, 2026
Merged

fix(tokenizer): hint transformers upgrade on missing class (#960)#971
debermudez merged 5 commits into
mainfrom
dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend

Conversation

@debermudez

@debermudez debermudez commented May 21, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • Bug Fixes
    • Tokenizer failures are now wrapped into a consistent diagnostic that detects unsupported tokenizer-class errors and shows an “Unsupported Tokenizer Class” panel with an actionable upgrade suggestion when applicable.
  • Tests
    • Added unit tests validating hinting, error-wrapping behavior, and that upgrade hints appear only for the intended transformers major versions.

Review Change Stack

Example error:

Screenshot 2026-05-21 at 14 57 31
@copy-pr-bot

copy-pr-bot Bot commented May 21, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions Bot commented May 21, 2026

Copy link
Copy Markdown

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@91fec931a7266ab5df72a48ba4830dd82c735c81

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@91fec931a7266ab5df72a48ba4830dd82c735c81

Last updated for commit: 91fec93Browse code

@github-actions github-actions Bot added the fix label May 21, 2026
@debermudez debermudez force-pushed the dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend branch from a66995b to aed74c9 Compare May 21, 2026 03:26
@coderabbitai

coderabbitai Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Detects HuggingFace missing-tokenizer-class errors with regex, optionally suggests a transformers upgrade based on installed version, integrates the hint into Tokenizer.from_pretrained error wrapping, updates tokenizer error display to show “Unsupported Tokenizer Class”, and adds unit tests for detection, display, and wrapping behavior.

Changes

Tokenizer error hints

Layer / File(s) Summary
Error hint detection helper
src/aiperf/common/tokenizer.py
Added re import and _missing_tokenizer_class_hint that matches HuggingFace missing-tokenizer-class error text and returns a transformers upgrade suggestion when appropriate.
Display detection for unsupported tokenizer class
src/aiperf/common/tokenizer_display.py
Added _MISSING_TOKENIZER_CLASS_RE and extended _detect_error to return a TokenizerErrorInsight titled “Unsupported Tokenizer Class” with the missing class and an upgrade-to-transformers fix.
Error wrapping integration and tests
src/aiperf/common/tokenizer.py, tests/unit/common/test_tokenizer_error_hints.py, tests/unit/common/test_tokenizer_display.py
Updated Tokenizer.from_pretrained to compose the original exception type/name and message, append the optional missing-class hint, and raise TokenizerError. Added tests covering hint extraction, conditional transformers>=5 suggestion behavior, display content for the missing-class case, and error-wrapping behavior for matching and unrelated failures.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I sniffed a missing class in the log,
A regex twitch, a tiny jog,
I peeked the version, whispered “upgrade”,
Wrapped the error, left a helpful braid,
Hopped away with a carrot and a nod.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing hints for transformers upgrade when a tokenizer class is missing.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/common/test_tokenizer_error_hints.py (1)

18-18: ⚡ Quick win

Rename test methods to match the required test naming convention.

At Line 18, Line 24, Line 27, Line 45, Line 56, and Line 70, test names should follow test_<function>_<scenario>_<expected> for consistency and discoverability in this repo.

As per coding guidelines, tests/**/*.py: Test naming convention: test_<function>_<scenario>_<expected>.

Also applies to: 24-24, 27-27, 45-45, 56-56, 70-70

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/common/test_tokenizer_error_hints.py` at line 18, Rename the test
methods in this file to follow the repository convention
test_<function>_<scenario>_<expected>; e.g. change
test_matches_hf_missing_class_error to a descriptive three-part name such as
test_matches_hf_missing_class_error_when_model_missing_raises_hint (or similar)
and apply the same pattern to the other test functions referenced in the review
so each name clearly states the function under test, the scenario, and the
expected outcome.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/common/test_tokenizer_error_hints.py`:
- Line 18: Rename the test methods in this file to follow the repository
convention test_<function>_<scenario>_<expected>; e.g. change
test_matches_hf_missing_class_error to a descriptive three-part name such as
test_matches_hf_missing_class_error_when_model_missing_raises_hint (or similar)
and apply the same pattern to the other test functions referenced in the review
so each name clearly states the function under test, the scenario, and the
expected outcome.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 803fbe3b-e68e-45d6-9253-bfaae6264ad3

📥 Commits

Reviewing files that changed from the base of the PR and between a66995b and aed74c9.

📒 Files selected for processing (2)
  • src/aiperf/common/tokenizer.py
  • tests/unit/common/test_tokenizer_error_hints.py
@codecov

codecov Bot commented May 21, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.44444% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/aiperf/common/tokenizer.py 88.88% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@debermudez debermudez force-pushed the dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend branch from aed74c9 to f113b44 Compare May 21, 2026 22:26
@debermudez debermudez marked this pull request as ready for review May 21, 2026 22:27

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/common/test_tokenizer_error_hints.py (1)

18-83: ⚡ Quick win

Align test method names with the repo convention.

Several names in Line 18 through Line 83 don’t follow test_<function>_<scenario>_<expected>, which makes intent less uniform across the suite. Please rename these tests to match the required pattern.

As per coding guidelines: tests/**/*.py: “Test naming convention: test_<function>_<scenario>_<expected>”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/common/test_tokenizer_error_hints.py` around lines 18 - 83, Rename
the test functions to follow test_<function>_<scenario>_<expected> naming: in
this file update _missing_tokenizer_class_hint tests — rename
test_matches_hf_missing_class_error to
test__missing_tokenizer_class_hint_hf_missing_class_error_matches_hint,
test_returns_none_for_unrelated_error to
test__missing_tokenizer_class_hint_unrelated_value_error_returns_none, and
test_returns_none_for_repo_not_found_error to
test__missing_tokenizer_class_hint_repo_not_found_oserror_returns_none; rename
test_v5_suggestion_only_on_v4 to
test__missing_tokenizer_class_hint_transformers_v4_shows_v5_suggestion_else_omits;
and in class TestFromPretrainedWrapsHint rename
test_wraps_missing_class_error_with_hint to
test_Tokenizer_from_pretrained_missing_class_error_includes_hint and
test_wraps_unrelated_error_without_hint to
test_Tokenizer_from_pretrained_unrelated_error_omits_v5_suggestion so names
reference the tested functions/methods ( _missing_tokenizer_class_hint and
Tokenizer.from_pretrained / Tokenizer._load_from_hub ) and follow the repository
convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/common/test_tokenizer_error_hints.py`:
- Around line 18-83: Rename the test functions to follow
test_<function>_<scenario>_<expected> naming: in this file update
_missing_tokenizer_class_hint tests — rename test_matches_hf_missing_class_error
to test__missing_tokenizer_class_hint_hf_missing_class_error_matches_hint,
test_returns_none_for_unrelated_error to
test__missing_tokenizer_class_hint_unrelated_value_error_returns_none, and
test_returns_none_for_repo_not_found_error to
test__missing_tokenizer_class_hint_repo_not_found_oserror_returns_none; rename
test_v5_suggestion_only_on_v4 to
test__missing_tokenizer_class_hint_transformers_v4_shows_v5_suggestion_else_omits;
and in class TestFromPretrainedWrapsHint rename
test_wraps_missing_class_error_with_hint to
test_Tokenizer_from_pretrained_missing_class_error_includes_hint and
test_wraps_unrelated_error_without_hint to
test_Tokenizer_from_pretrained_unrelated_error_omits_v5_suggestion so names
reference the tested functions/methods ( _missing_tokenizer_class_hint and
Tokenizer.from_pretrained / Tokenizer._load_from_hub ) and follow the repository
convention.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: caba1294-b9e9-4d95-9f3d-da06032d1ba8

📥 Commits

Reviewing files that changed from the base of the PR and between aed74c9 and f113b44.

📒 Files selected for processing (4)
  • src/aiperf/common/tokenizer.py
  • src/aiperf/common/tokenizer_display.py
  • tests/unit/common/test_tokenizer_display.py
  • tests/unit/common/test_tokenizer_error_hints.py
Comment thread src/aiperf/common/tokenizer_display.py Outdated

@lkomali lkomali left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
agree with dynamo-ops comment.

Comment thread src/aiperf/common/tokenizer_display.py Outdated
@debermudez debermudez force-pushed the dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend branch from 762ee2b to 19e46cc Compare May 22, 2026 23:20
Comment thread src/aiperf/common/tokenizer_display.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/common/test_tokenizer_display.py (1)

286-295: ⚡ Quick win

Align this parametrized block with repo test style conventions.

Please use from pytest import param and place # fmt: skip on the closing ) line for this @pytest.mark.parametrize block.

Proposed diff
 import pytest
+from pytest import param
@@
     `@pytest.mark.parametrize`(
         ("version", "expect_v5_upgrade"),
         [
-            pytest.param("4.57.3", True, id="v4-suggests-upgrade"),
-            pytest.param("4.56.0", True, id="v4-floor-suggests-upgrade"),
-            pytest.param("5.0.1", False, id="v5-omits-suggestion"),
-            pytest.param("5.8.0", False, id="v5-omits-suggestion-later"),
-            pytest.param("<unknown>", False, id="unknown-omits-suggestion"),
+            param("4.57.3", True, id="v4-suggests-upgrade"),
+            param("4.56.0", True, id="v4-floor-suggests-upgrade"),
+            param("5.0.1", False, id="v5-omits-suggestion"),
+            param("5.8.0", False, id="v5-omits-suggestion-later"),
+            param("<unknown>", False, id="unknown-omits-suggestion"),
         ],
-    )
+    )  # fmt: skip

As per coding guidelines, tests/**/*.py should use from pytest import param and include # fmt: skip on the ) line for @pytest.mark.parametrize.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/common/test_tokenizer_display.py` around lines 286 - 295, Replace
usages of pytest.param in the `@pytest.mark.parametrize` block for ("version",
"expect_v5_upgrade") with param from a top-level import (add "from pytest import
param" to the test module) and change each pytest.param(...) to param(...); also
add the comment "# fmt: skip" on the closing ")" line of that
`@pytest.mark.parametrize` block to match test style conventions. Locate the
decorator that lists versions "4.57.3", "4.56.0", "5.0.1", "5.8.0", "<unknown>"
and update it accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/common/test_tokenizer_display.py`:
- Around line 286-295: Replace usages of pytest.param in the
`@pytest.mark.parametrize` block for ("version", "expect_v5_upgrade") with param
from a top-level import (add "from pytest import param" to the test module) and
change each pytest.param(...) to param(...); also add the comment "# fmt: skip"
on the closing ")" line of that `@pytest.mark.parametrize` block to match test
style conventions. Locate the decorator that lists versions "4.57.3", "4.56.0",
"5.0.1", "5.8.0", "<unknown>" and update it accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0e16cb11-0002-4ede-8685-dc7f8cf346b9

📥 Commits

Reviewing files that changed from the base of the PR and between f113b44 and 4ff7605.

📒 Files selected for processing (4)
  • src/aiperf/common/tokenizer.py
  • src/aiperf/common/tokenizer_display.py
  • tests/unit/common/test_tokenizer_display.py
  • tests/unit/common/test_tokenizer_error_hints.py
debermudez and others added 5 commits May 26, 2026 09:52
Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>
Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>
Mirror the version-aware fallback handling from _missing_tokenizer_class_hint
in the CLI panel: read transformers.__version__ with the same
(ImportError, AttributeError) fallback to "<unknown>", and only surface the
"pip install -U 'transformers>=5'" fix when the installed major version is 4.
Surface the detected version in the cause text so users on v5/unknown see why
no upgrade hint applies.

Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>
Move the version-aware hint construction out of _detect_error to keep its
cyclomatic complexity under the C901 threshold. Behavior is unchanged.

Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>
Replace `import transformers; transformers.__version__` with
`importlib.metadata.version("transformers")` in
`_missing_tokenizer_class_insight`, falling back to "<unknown>" on
`PackageNotFoundError`. This avoids importing transformers from the
error-display path and reads the version straight from package metadata.
Tests now patch `_pkg_version` instead of `transformers.__version__`.

Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>
@debermudez debermudez force-pushed the dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend branch from 4ff7605 to 91fec93 Compare May 26, 2026 16:52

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/common/test_tokenizer_display.py (1)

286-298: ⚡ Quick win

Align new parametrized test with repo pytest and typing conventions.

Use from pytest import param for the table entries and add explicit parameter types on the new test method.

Proposed patch
@@
-import pytest
+import pytest
+from pytest import param
@@
-    `@pytest.mark.parametrize`(
+    `@pytest.mark.parametrize`(
         ("version", "expect_v5_upgrade"),
         [
-            pytest.param("4.57.3", True, id="v4-suggests-upgrade"),
-            pytest.param("4.56.0", True, id="v4-floor-suggests-upgrade"),
-            pytest.param("5.0.1", False, id="v5-omits-suggestion"),
-            pytest.param("5.8.0", False, id="v5-omits-suggestion-later"),
-            pytest.param("<unknown>", False, id="unknown-omits-suggestion"),
+            param("4.57.3", True, id="v4-suggests-upgrade"),
+            param("4.56.0", True, id="v4-floor-suggests-upgrade"),
+            param("5.0.1", False, id="v5-omits-suggestion"),
+            param("5.8.0", False, id="v5-omits-suggestion-later"),
+            param("<unknown>", False, id="unknown-omits-suggestion"),
         ],
-    )
+    )  # fmt: skip
     def test_missing_tokenizer_class_upgrade_only_on_v4(
-        self, console_output, version, expect_v5_upgrade
-    ):
+        self,
+        console_output: tuple[object, object],
+        version: str,
+        expect_v5_upgrade: bool,
+    ) -> None:

As per coding guidelines: "tests/**/*.py: Use from pytest import param and put # fmt: skip on the ) line for parametrized tests" and "**/*.py: Include type hints on ALL functions (params and return)`."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/common/test_tokenizer_display.py` around lines 286 - 298, The
parametrized test uses pytest.param and lacks type hints and the required
formatter directive; update the decorator table to use param (add from pytest
import param at top) and place "# fmt: skip" on the closing ")" of the
`@pytest.mark.parametrize` block, and add explicit type annotations for all
function parameters and the return type on
test_missing_tokenizer_class_upgrade_only_on_v4 (e.g., annotate console_output,
version, expect_v5_upgrade and -> None) so the test follows the repo
pytest/typing conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/common/test_tokenizer_display.py`:
- Around line 286-298: The parametrized test uses pytest.param and lacks type
hints and the required formatter directive; update the decorator table to use
param (add from pytest import param at top) and place "# fmt: skip" on the
closing ")" of the `@pytest.mark.parametrize` block, and add explicit type
annotations for all function parameters and the return type on
test_missing_tokenizer_class_upgrade_only_on_v4 (e.g., annotate console_output,
version, expect_v5_upgrade and -> None) so the test follows the repo
pytest/typing conventions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3c26e3ca-1b5c-46ba-84e2-ed20473243f4

📥 Commits

Reviewing files that changed from the base of the PR and between 4ff7605 and 91fec93.

📒 Files selected for processing (4)
  • src/aiperf/common/tokenizer.py
  • src/aiperf/common/tokenizer_display.py
  • tests/unit/common/test_tokenizer_display.py
  • tests/unit/common/test_tokenizer_error_hints.py
@debermudez debermudez enabled auto-merge (squash) May 26, 2026 17:45
@debermudez debermudez merged commit 8da0b8f into main May 26, 2026
29 of 30 checks passed
@debermudez debermudez deleted the dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend branch May 26, 2026 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 participants