fix(tokenizer): hint transformers upgrade on missing class (#960) by debermudez · Pull Request #971 · ai-dynamo/aiperf

debermudez · 2026-05-21T03:23:58Z

Summary by CodeRabbit

Bug Fixes
- Tokenizer failures are now wrapped into a consistent diagnostic that detects unsupported tokenizer-class errors and shows an “Unsupported Tokenizer Class” panel with an actionable upgrade suggestion when applicable.
Tests
- Added unit tests validating hinting, error-wrapping behavior, and that upgrade hints appear only for the intended transformers major versions.

Example error:

copy-pr-bot · 2026-05-21T03:24:01Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-05-21T03:24:08Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@91fec931a7266ab5df72a48ba4830dd82c735c81

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@91fec931a7266ab5df72a48ba4830dd82c735c81

Last updated for commit: 91fec93 • Browse code

coderabbitai · 2026-05-21T03:27:41Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Detects HuggingFace missing-tokenizer-class errors with regex, optionally suggests a transformers upgrade based on installed version, integrates the hint into Tokenizer.from_pretrained error wrapping, updates tokenizer error display to show “Unsupported Tokenizer Class”, and adds unit tests for detection, display, and wrapping behavior.

Changes

Tokenizer error hints

Layer / File(s)	Summary
Error hint detection helper `src/aiperf/common/tokenizer.py`	Added `re` import and `_missing_tokenizer_class_hint` that matches HuggingFace missing-tokenizer-class error text and returns a `transformers` upgrade suggestion when appropriate.
Display detection for unsupported tokenizer class `src/aiperf/common/tokenizer_display.py`	Added `_MISSING_TOKENIZER_CLASS_RE` and extended `_detect_error` to return a `TokenizerErrorInsight` titled “Unsupported Tokenizer Class” with the missing class and an upgrade-to-`transformers` fix.
Error wrapping integration and tests `src/aiperf/common/tokenizer.py`, `tests/unit/common/test_tokenizer_error_hints.py`, `tests/unit/common/test_tokenizer_display.py`	Updated `Tokenizer.from_pretrained` to compose the original exception type/name and message, append the optional missing-class hint, and raise `TokenizerError`. Added tests covering hint extraction, conditional `transformers>=5` suggestion behavior, display content for the missing-class case, and error-wrapping behavior for matching and unrelated failures.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I sniffed a missing class in the log,
A regex twitch, a tiny jog,
I peeked the version, whispered “upgrade”,
Wrapped the error, left a helpful braid,
Hopped away with a carrot and a nod.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: implementing hints for transformers upgrade when a tokenizer class is missing.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

tests/unit/common/test_tokenizer_error_hints.py (1)
18-18: ⚡ Quick win

Rename test methods to match the required test naming convention.

At Line 18, Line 24, Line 27, Line 45, Line 56, and Line 70, test names should follow test_<function>_<scenario>_<expected> for consistency and discoverability in this repo.

As per coding guidelines, tests/**/*.py: Test naming convention: test_<function>_<scenario>_<expected>.

Also applies to: 24-24, 27-27, 45-45, 56-56, 70-70
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/common/test_tokenizer_error_hints.py` at line 18, Rename the test
methods in this file to follow the repository convention
test_<function>_<scenario>_<expected>; e.g. change
test_matches_hf_missing_class_error to a descriptive three-part name such as
test_matches_hf_missing_class_error_when_model_missing_raises_hint (or similar)
and apply the same pattern to the other test functions referenced in the review
so each name clearly states the function under test, the scenario, and the
expected outcome.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/common/test_tokenizer_error_hints.py`:
- Line 18: Rename the test methods in this file to follow the repository
convention test_<function>_<scenario>_<expected>; e.g. change
test_matches_hf_missing_class_error to a descriptive three-part name such as
test_matches_hf_missing_class_error_when_model_missing_raises_hint (or similar)
and apply the same pattern to the other test functions referenced in the review
so each name clearly states the function under test, the scenario, and the
expected outcome.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 803fbe3b-e68e-45d6-9253-bfaae6264ad3

📥 Commits

Reviewing files that changed from the base of the PR and between a66995b and aed74c9.

📒 Files selected for processing (2)

src/aiperf/common/tokenizer.py
tests/unit/common/test_tokenizer_error_hints.py

codecov · 2026-05-21T03:34:40Z

Codecov Report

❌ Patch coverage is 94.44444% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/aiperf/common/tokenizer.py	88.88%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

coderabbitai

🧹 Nitpick comments (1)

tests/unit/common/test_tokenizer_error_hints.py (1)

18-83: ⚡ Quick win

Align test method names with the repo convention.

Several names in Line 18 through Line 83 don’t follow test_<function>_<scenario>_<expected>, which makes intent less uniform across the suite. Please rename these tests to match the required pattern.

As per coding guidelines: tests/**/*.py: “Test naming convention: test_<function>_<scenario>_<expected>”.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/common/test_tokenizer_error_hints.py` around lines 18 - 83, Rename
the test functions to follow test_<function>_<scenario>_<expected> naming: in
this file update _missing_tokenizer_class_hint tests — rename
test_matches_hf_missing_class_error to
test__missing_tokenizer_class_hint_hf_missing_class_error_matches_hint,
test_returns_none_for_unrelated_error to
test__missing_tokenizer_class_hint_unrelated_value_error_returns_none, and
test_returns_none_for_repo_not_found_error to
test__missing_tokenizer_class_hint_repo_not_found_oserror_returns_none; rename
test_v5_suggestion_only_on_v4 to
test__missing_tokenizer_class_hint_transformers_v4_shows_v5_suggestion_else_omits;
and in class TestFromPretrainedWrapsHint rename
test_wraps_missing_class_error_with_hint to
test_Tokenizer_from_pretrained_missing_class_error_includes_hint and
test_wraps_unrelated_error_without_hint to
test_Tokenizer_from_pretrained_unrelated_error_omits_v5_suggestion so names
reference the tested functions/methods ( _missing_tokenizer_class_hint and
Tokenizer.from_pretrained / Tokenizer._load_from_hub ) and follow the repository
convention.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/common/test_tokenizer_error_hints.py`:
- Around line 18-83: Rename the test functions to follow
test_<function>_<scenario>_<expected> naming: in this file update
_missing_tokenizer_class_hint tests — rename test_matches_hf_missing_class_error
to test__missing_tokenizer_class_hint_hf_missing_class_error_matches_hint,
test_returns_none_for_unrelated_error to
test__missing_tokenizer_class_hint_unrelated_value_error_returns_none, and
test_returns_none_for_repo_not_found_error to
test__missing_tokenizer_class_hint_repo_not_found_oserror_returns_none; rename
test_v5_suggestion_only_on_v4 to
test__missing_tokenizer_class_hint_transformers_v4_shows_v5_suggestion_else_omits;
and in class TestFromPretrainedWrapsHint rename
test_wraps_missing_class_error_with_hint to
test_Tokenizer_from_pretrained_missing_class_error_includes_hint and
test_wraps_unrelated_error_without_hint to
test_Tokenizer_from_pretrained_unrelated_error_omits_v5_suggestion so names
reference the tested functions/methods ( _missing_tokenizer_class_hint and
Tokenizer.from_pretrained / Tokenizer._load_from_hub ) and follow the repository
convention.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: caba1294-b9e9-4d95-9f3d-da06032d1ba8

📥 Commits

Reviewing files that changed from the base of the PR and between aed74c9 and f113b44.

📒 Files selected for processing (4)

src/aiperf/common/tokenizer.py
src/aiperf/common/tokenizer_display.py
tests/unit/common/test_tokenizer_display.py
tests/unit/common/test_tokenizer_error_hints.py

lkomali

LGTM!
agree with dynamo-ops comment.

coderabbitai

🧹 Nitpick comments (1)

tests/unit/common/test_tokenizer_display.py (1)

286-295: ⚡ Quick win

Align this parametrized block with repo test style conventions.

Please use from pytest import param and place # fmt: skip on the closing ) line for this @pytest.mark.parametrize block.

Proposed diff

 import pytest
+from pytest import param
@@
     `@pytest.mark.parametrize`(
         ("version", "expect_v5_upgrade"),
         [
-            pytest.param("4.57.3", True, id="v4-suggests-upgrade"),
-            pytest.param("4.56.0", True, id="v4-floor-suggests-upgrade"),
-            pytest.param("5.0.1", False, id="v5-omits-suggestion"),
-            pytest.param("5.8.0", False, id="v5-omits-suggestion-later"),
-            pytest.param("<unknown>", False, id="unknown-omits-suggestion"),
+            param("4.57.3", True, id="v4-suggests-upgrade"),
+            param("4.56.0", True, id="v4-floor-suggests-upgrade"),
+            param("5.0.1", False, id="v5-omits-suggestion"),
+            param("5.8.0", False, id="v5-omits-suggestion-later"),
+            param("<unknown>", False, id="unknown-omits-suggestion"),
         ],
-    )
+    )  # fmt: skip

As per coding guidelines, tests/**/*.py should use from pytest import param and include # fmt: skip on the ) line for @pytest.mark.parametrize.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/common/test_tokenizer_display.py` around lines 286 - 295, Replace
usages of pytest.param in the `@pytest.mark.parametrize` block for ("version",
"expect_v5_upgrade") with param from a top-level import (add "from pytest import
param" to the test module) and change each pytest.param(...) to param(...); also
add the comment "# fmt: skip" on the closing ")" line of that
`@pytest.mark.parametrize` block to match test style conventions. Locate the
decorator that lists versions "4.57.3", "4.56.0", "5.0.1", "5.8.0", "<unknown>"
and update it accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/common/test_tokenizer_display.py`:
- Around line 286-295: Replace usages of pytest.param in the
`@pytest.mark.parametrize` block for ("version", "expect_v5_upgrade") with param
from a top-level import (add "from pytest import param" to the test module) and
change each pytest.param(...) to param(...); also add the comment "# fmt: skip"
on the closing ")" line of that `@pytest.mark.parametrize` block to match test
style conventions. Locate the decorator that lists versions "4.57.3", "4.56.0",
"5.0.1", "5.8.0", "<unknown>" and update it accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0e16cb11-0002-4ede-8685-dc7f8cf346b9

📥 Commits

Reviewing files that changed from the base of the PR and between f113b44 and 4ff7605.

📒 Files selected for processing (4)

src/aiperf/common/tokenizer.py
src/aiperf/common/tokenizer_display.py
tests/unit/common/test_tokenizer_display.py
tests/unit/common/test_tokenizer_error_hints.py

Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>

Mirror the version-aware fallback handling from _missing_tokenizer_class_hint in the CLI panel: read transformers.__version__ with the same (ImportError, AttributeError) fallback to "<unknown>", and only surface the "pip install -U 'transformers>=5'" fix when the installed major version is 4. Surface the detected version in the cause text so users on v5/unknown see why no upgrade hint applies. Signed-off-by: Elias Bermudez <dbermudez@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>

Move the version-aware hint construction out of _detect_error to keep its cyclomatic complexity under the C901 threshold. Behavior is unchanged. Signed-off-by: Elias Bermudez <dbermudez@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>

Replace `import transformers; transformers.__version__` with `importlib.metadata.version("transformers")` in `_missing_tokenizer_class_insight`, falling back to "<unknown>" on `PackageNotFoundError`. This avoids importing transformers from the error-display path and reads the version straight from package metadata. Tests now patch `_pkg_version` instead of `transformers.__version__`. Signed-off-by: Elias Bermudez <dbermudez@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>

coderabbitai

🧹 Nitpick comments (1)

tests/unit/common/test_tokenizer_display.py (1)

286-298: ⚡ Quick win

Align new parametrized test with repo pytest and typing conventions.

Use from pytest import param for the table entries and add explicit parameter types on the new test method.

Proposed patch

@@
-import pytest
+import pytest
+from pytest import param
@@
-    `@pytest.mark.parametrize`(
+    `@pytest.mark.parametrize`(
         ("version", "expect_v5_upgrade"),
         [
-            pytest.param("4.57.3", True, id="v4-suggests-upgrade"),
-            pytest.param("4.56.0", True, id="v4-floor-suggests-upgrade"),
-            pytest.param("5.0.1", False, id="v5-omits-suggestion"),
-            pytest.param("5.8.0", False, id="v5-omits-suggestion-later"),
-            pytest.param("<unknown>", False, id="unknown-omits-suggestion"),
+            param("4.57.3", True, id="v4-suggests-upgrade"),
+            param("4.56.0", True, id="v4-floor-suggests-upgrade"),
+            param("5.0.1", False, id="v5-omits-suggestion"),
+            param("5.8.0", False, id="v5-omits-suggestion-later"),
+            param("<unknown>", False, id="unknown-omits-suggestion"),
         ],
-    )
+    )  # fmt: skip
     def test_missing_tokenizer_class_upgrade_only_on_v4(
-        self, console_output, version, expect_v5_upgrade
-    ):
+        self,
+        console_output: tuple[object, object],
+        version: str,
+        expect_v5_upgrade: bool,
+    ) -> None:

As per coding guidelines: "tests/**/*.py: Use from pytest import param and put # fmt: skip on the ) line for parametrized tests" and "**/*.py: Include type hints on ALL functions (params and return)`."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/common/test_tokenizer_display.py` around lines 286 - 298, The
parametrized test uses pytest.param and lacks type hints and the required
formatter directive; update the decorator table to use param (add from pytest
import param at top) and place "# fmt: skip" on the closing ")" of the
`@pytest.mark.parametrize` block, and add explicit type annotations for all
function parameters and the return type on
test_missing_tokenizer_class_upgrade_only_on_v4 (e.g., annotate console_output,
version, expect_v5_upgrade and -> None) so the test follows the repo
pytest/typing conventions.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/common/test_tokenizer_display.py`:
- Around line 286-298: The parametrized test uses pytest.param and lacks type
hints and the required formatter directive; update the decorator table to use
param (add from pytest import param at top) and place "# fmt: skip" on the
closing ")" of the `@pytest.mark.parametrize` block, and add explicit type
annotations for all function parameters and the return type on
test_missing_tokenizer_class_upgrade_only_on_v4 (e.g., annotate console_output,
version, expect_v5_upgrade and -> None) so the test follows the repo
pytest/typing conventions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3c26e3ca-1b5c-46ba-84e2-ed20473243f4

📥 Commits

Reviewing files that changed from the base of the PR and between 4ff7605 and 91fec93.

📒 Files selected for processing (4)

src/aiperf/common/tokenizer.py
src/aiperf/common/tokenizer_display.py
tests/unit/common/test_tokenizer_display.py
tests/unit/common/test_tokenizer_error_hints.py

github-actions Bot added the fix label May 21, 2026

debermudez force-pushed the dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend branch from a66995b to aed74c9 Compare May 21, 2026 03:26

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

debermudez force-pushed the dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend branch from aed74c9 to f113b44 Compare May 21, 2026 22:26

debermudez marked this pull request as ready for review May 21, 2026 22:27

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

dynamo-ops reviewed May 21, 2026

View reviewed changes

Comment thread src/aiperf/common/tokenizer_display.py Outdated

lkomali reviewed May 22, 2026

View reviewed changes

Comment thread src/aiperf/common/tokenizer_display.py Outdated

debermudez force-pushed the dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend branch from 762ee2b to 19e46cc Compare May 22, 2026 23:20

lkomali approved these changes May 22, 2026

View reviewed changes

dynamo-ops approved these changes May 22, 2026

View reviewed changes

dynamo-ops reviewed May 22, 2026

View reviewed changes

Comment thread src/aiperf/common/tokenizer_display.py Outdated

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

dynamo-ops approved these changes May 23, 2026

View reviewed changes

debermudez and others added 5 commits May 26, 2026 09:52

fix(tokenizer): hint transformers upgrade on missing class (#960)

68d1c18

Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>

fix(tokenizer): show missing-class hint in CLI panel (#960)

94705ea

Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>

debermudez force-pushed the dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend branch from 4ff7605 to 91fec93 Compare May 26, 2026 16:52

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

dynamo-ops approved these changes May 26, 2026

View reviewed changes

debermudez enabled auto-merge (squash) May 26, 2026 17:45

debermudez merged commit 8da0b8f into main May 26, 2026
29 of 30 checks passed

debermudez deleted the dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend branch May 26, 2026 17:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(tokenizer): hint transformers upgrade on missing class (#960)#971

fix(tokenizer): hint transformers upgrade on missing class (#960)#971
debermudez merged 5 commits into
mainfrom
dbermudez/aip-907-glm-5-tokenizer-fails-with-transformers-v4-tokenizersbackend

debermudez commented May 21, 2026 •

edited by coderabbitai Bot

Loading

copy-pr-bot Bot commented May 21, 2026

github-actions Bot commented May 21, 2026 •

edited

Loading

coderabbitai Bot commented May 21, 2026 •

edited

Loading

Reviews paused

❌ Failed checks (1 warning)

coderabbitai Bot left a comment

codecov Bot commented May 21, 2026 •

edited

Loading

coderabbitai Bot left a comment

Uh oh!

lkomali left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

coderabbitai Bot left a comment

Uh oh!

Labels

3 participants

Uh oh!

Conversation

debermudez commented May 21, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

copy-pr-bot Bot commented May 21, 2026

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Try out this PR

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

coderabbitai Bot left a comment

Choose a reason for hiding this comment

codecov Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

lkomali left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Labels

3 participants

debermudez commented May 21, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented May 21, 2026 •

edited

Loading

coderabbitai Bot commented May 21, 2026 •

edited

Loading

codecov Bot commented May 21, 2026 •

edited

Loading