fix(redact): tighten sensitive-token list to stop matching LLM token-count flags by matthewkotila · Pull Request #1006 · ai-dynamo/aiperf

matthewkotila · 2026-05-27T21:54:56Z

Summary

Fixes a bug where every aiperf flag with tokens (plural) in its name had its
value silently replaced with <redacted> in the canonical cli_command string,
breaking reproducibility of the recorded launch command.

Before:
CLI Command: aiperf profile --model 'Qwen/Qwen3-0.6B' ...
--synthetic-input-tokens-mean ''
--synthetic-input-tokens-stddev ''
--output-tokens-mean '' ...

After:
CLI Command: aiperf profile --model 'Qwen/Qwen3-0.6B' ...
--synthetic-input-tokens-mean 1024
--synthetic-input-tokens-stddev 0
--output-tokens-mean 128 ...

Root cause

src/aiperf/common/redact.py's _CLI_COMMAND_SENSITIVE_TOKENS included bare
"token" as a substring marker. Combined with the substring check in
_redact_cli_args (if any(tok in key for tok in ...)), this fired on any flag
containing the substring token — including LLM token-count flags like
--*-tokens-mean, --*-tokens-stddev.

Introduced alongside the v2 config refactor in 94a9102 (#912). Test coverage
at the time exercised --api-key only; the token-count family was uncovered.

Fix

Replaced bare "token" with the specific auth-token compound forms it was
meant to catch: api-token, auth-token, access-token, bearer-token,
id-token, refresh-token (each with - and _ variants). Every entry now
contains a - or _, so substring-matching can't fire on innocent plurals.

Tradeoff: bare --token <secret> is no longer caught. aiperf has no such
flag today; future additions can extend the list explicitly.

Test plan

uv run pytest tests/unit/common/test_redact.py — 271 passed
Token-count flag family preserves values (new parametrized regression
cases): --input-tokens-mean, --input-tokens-stddev,
--output-tokens-mean, --output-tokens-stddev,
--synthetic-input-tokens-mean, --synthetic-input-tokens-stddev
Auth-token compound flags still redact (new parametrized positive cases):
--api-token, --api_token, --auth-token, --access-token,
--bearer-token, --id-token, --refresh-token
Original bug reproducer: aiperf profile --model 'Qwen/Qwen3-0.6B' ... --synthetic-input-tokens-mean 1024 ...
shows literal 1024 (not <redacted>) in the printed CLI Command
Pre-existing --api-key sk-secret-XXX redaction still works

Summary by CodeRabbit

Bug Fixes
- Enhanced credential redaction to cover additional authentication token flag variants (--api-token, --auth-token, --access-token, --bearer-token, --id-token, --refresh-token), providing improved protection for sensitive information in CLI commands.

…count flags _CLI_COMMAND_SENSITIVE_TOKENS previously included bare "token" as a substring marker. Combined with the substring-match in _redact_cli_args (`if any(tok in key for tok in ...)`), this caused every aiperf flag with "tokens" (plural) in its name -- the entire LLM token-count family -- to have its value silently replaced with <redacted> in the canonical cli_command string. Real-run output showed `--synthetic-input-tokens-mean '<redacted>'`, `--output-tokens-mean '<redacted>'`, etc., breaking cli_command's reproducibility purpose: you can't replay a benchmark whose configured token counts have been scrubbed. Replace the bare "token" with the specific auth-token compound forms the redaction was intended to catch (api-token, auth-token, access-token, bearer-token, id-token, refresh-token, with both - and _ variants). Each entry now contains at least one - or _, so substring-matching can't fire on innocent plurals. Tradeoff: bare --token <secret> is no longer caught. Aiperf has no such flag today; future additions can extend the list explicitly. Tests: - 6 parametrized regression cases for the token-count flag family (input/output/synthetic-input variants), asserting numeric values pass through unredacted. - 7 parametrized positive cases for the auth-token compound family, asserting their secret values are still redacted. The buggy substring match was introduced alongside the v2 config refactor (commit 94a9102 / PR #912). Test coverage at the time exercised --api-key only; the token-count family was uncovered. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-27T21:55:07Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@cf88051c5778c85d60e15542e0451e872db3fd3f

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@cf88051c5778c85d60e15542e0451e872db3fd3f

Last updated for commit: cf88051 • Browse code

coderabbitai · 2026-05-27T21:58:17Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d94b4e30-e9e2-47b2-be3b-5ba8e963dcfb

📥 Commits

Reviewing files that changed from the base of the PR and between d797629 and cf88051.

📒 Files selected for processing (2)

src/aiperf/common/redact.py
tests/unit/common/test_redact.py

Walkthrough

This PR expands the CLI token redaction patterns to recognize additional credential-shaped flag variants and adds regression tests validating both non-matching (plural token flags) and matching (auth-token compound) behaviors.

Changes

Token Redaction Pattern Expansion

Layer / File(s)	Summary
Token pattern expansion and validation `src/aiperf/common/redact.py`, `tests/unit/common/test_redact.py`	`_CLI_COMMAND_SENSITIVE_TOKENS` expands from basic patterns (api-key, authorization, token) to include compound auth/credential variants (api-token, auth-token, access-token, bearer-token, id-token, refresh-token) with clarifying comments on matching constraints. Tests verify plural token-count flags preserve their values while auth-token compound variants are fully redacted.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A token by any other name is still a secret,
So we expanded the patterns to catch each culprit.
Auth-tokens, bearer-tokens, all must be concealed,
While token counts merrily stay revealed! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly describes the main change: tightening the sensitive-token list to prevent matching LLM token-count flags, which aligns with the core bug fix in the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-27T22:05:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

matthewkotila requested a review from FrankD412 May 27, 2026 21:54

github-actions Bot added the fix label May 27, 2026

dynamo-ops approved these changes May 27, 2026

View reviewed changes

matthewkotila merged commit f93b2ad into main May 28, 2026
29 of 30 checks passed

matthewkotila deleted the mkotila/redact-token-count-flags branch May 28, 2026 22:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(redact): tighten sensitive-token list to stop matching LLM token-count flags#1006

fix(redact): tighten sensitive-token list to stop matching LLM token-count flags#1006
matthewkotila merged 1 commit into
mainfrom
mkotila/redact-token-count-flags

matthewkotila commented May 27, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented May 27, 2026

coderabbitai Bot commented May 27, 2026

codecov Bot commented May 27, 2026

Uh oh!

Labels

2 participants

Uh oh!

Conversation

matthewkotila commented May 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Fix

Test plan

Related

Summary by CodeRabbit

github-actions Bot commented May 27, 2026

Try out this PR

coderabbitai Bot commented May 27, 2026

Walkthrough

Changes

Estimated code review effort

Poem

codecov Bot commented May 27, 2026

Codecov Report

Uh oh!

Labels

2 participants

matthewkotila commented May 27, 2026 •

edited by coderabbitai Bot

Loading