Skip to content

fix(redact): tighten sensitive-token list to stop matching LLM token-count flags#1006

Merged
matthewkotila merged 1 commit into
mainfrom
mkotila/redact-token-count-flags
May 28, 2026
Merged

fix(redact): tighten sensitive-token list to stop matching LLM token-count flags#1006
matthewkotila merged 1 commit into
mainfrom
mkotila/redact-token-count-flags

Conversation

@matthewkotila

@matthewkotila matthewkotila commented May 27, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes a bug where every aiperf flag with tokens (plural) in its name had its
value silently replaced with <redacted> in the canonical cli_command string,
breaking reproducibility of the recorded launch command.

Before:
CLI Command: aiperf profile --model 'Qwen/Qwen3-0.6B' ...
--synthetic-input-tokens-mean ''
--synthetic-input-tokens-stddev ''
--output-tokens-mean '' ...

After:
CLI Command: aiperf profile --model 'Qwen/Qwen3-0.6B' ...
--synthetic-input-tokens-mean 1024
--synthetic-input-tokens-stddev 0
--output-tokens-mean 128 ...

Root cause

src/aiperf/common/redact.py's _CLI_COMMAND_SENSITIVE_TOKENS included bare
"token" as a substring marker. Combined with the substring check in
_redact_cli_args (if any(tok in key for tok in ...)), this fired on any flag
containing the substring token — including LLM token-count flags like
--*-tokens-mean, --*-tokens-stddev.

Introduced alongside the v2 config refactor in 94a9102 (#912). Test coverage
at the time exercised --api-key only; the token-count family was uncovered.

Fix

Replaced bare "token" with the specific auth-token compound forms it was
meant to catch: api-token, auth-token, access-token, bearer-token,
id-token, refresh-token (each with - and _ variants). Every entry now
contains a - or _, so substring-matching can't fire on innocent plurals.

Tradeoff: bare --token <secret> is no longer caught. aiperf has no such
flag today; future additions can extend the list explicitly.

Test plan

  • uv run pytest tests/unit/common/test_redact.py — 271 passed
  • Token-count flag family preserves values (new parametrized regression
    cases): --input-tokens-mean, --input-tokens-stddev,
    --output-tokens-mean, --output-tokens-stddev,
    --synthetic-input-tokens-mean, --synthetic-input-tokens-stddev
  • Auth-token compound flags still redact (new parametrized positive cases):
    --api-token, --api_token, --auth-token, --access-token,
    --bearer-token, --id-token, --refresh-token
  • Original bug reproducer: aiperf profile --model 'Qwen/Qwen3-0.6B' ... --synthetic-input-tokens-mean 1024 ...
    shows literal 1024 (not <redacted>) in the printed CLI Command
  • Pre-existing --api-key sk-secret-XXX redaction still works

Related

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced credential redaction to cover additional authentication token flag variants (--api-token, --auth-token, --access-token, --bearer-token, --id-token, --refresh-token), providing improved protection for sensitive information in CLI commands.

Review Change Stack

…count flags

_CLI_COMMAND_SENSITIVE_TOKENS previously included bare "token" as a
substring marker. Combined with the substring-match in
_redact_cli_args (`if any(tok in key for tok in ...)`), this caused
every aiperf flag with "tokens" (plural) in its name -- the entire
LLM token-count family -- to have its value silently replaced with
<redacted> in the canonical cli_command string.

Real-run output showed `--synthetic-input-tokens-mean '<redacted>'`,
`--output-tokens-mean '<redacted>'`, etc., breaking cli_command's
reproducibility purpose: you can't replay a benchmark whose configured
token counts have been scrubbed.

Replace the bare "token" with the specific auth-token compound forms
the redaction was intended to catch (api-token, auth-token,
access-token, bearer-token, id-token, refresh-token, with both - and
_ variants). Each entry now contains at least one - or _, so
substring-matching can't fire on innocent plurals.

Tradeoff: bare --token <secret> is no longer caught. Aiperf has no
such flag today; future additions can extend the list explicitly.

Tests:
- 6 parametrized regression cases for the token-count flag family
  (input/output/synthetic-input variants), asserting numeric values
  pass through unredacted.
- 7 parametrized positive cases for the auth-token compound family,
  asserting their secret values are still redacted.

The buggy substring match was introduced alongside the v2 config
refactor (commit 94a9102 / PR #912). Test coverage at the time
exercised --api-key only; the token-count family was uncovered.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@matthewkotila matthewkotila requested a review from FrankD412 May 27, 2026 21:54
@github-actions github-actions Bot added the fix label May 27, 2026
@github-actions

Copy link
Copy Markdown

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@cf88051c5778c85d60e15542e0451e872db3fd3f

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@cf88051c5778c85d60e15542e0451e872db3fd3f

Last updated for commit: cf88051Browse code

@coderabbitai

coderabbitai Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d94b4e30-e9e2-47b2-be3b-5ba8e963dcfb

📥 Commits

Reviewing files that changed from the base of the PR and between d797629 and cf88051.

📒 Files selected for processing (2)
  • src/aiperf/common/redact.py
  • tests/unit/common/test_redact.py

Walkthrough

This PR expands the CLI token redaction patterns to recognize additional credential-shaped flag variants and adds regression tests validating both non-matching (plural token flags) and matching (auth-token compound) behaviors.

Changes

Token Redaction Pattern Expansion

Layer / File(s) Summary
Token pattern expansion and validation
src/aiperf/common/redact.py, tests/unit/common/test_redact.py
_CLI_COMMAND_SENSITIVE_TOKENS expands from basic patterns (api-key, authorization, token) to include compound auth/credential variants (api-token, auth-token, access-token, bearer-token, id-token, refresh-token) with clarifying comments on matching constraints. Tests verify plural token-count flags preserve their values while auth-token compound variants are fully redacted.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A token by any other name is still a secret,
So we expanded the patterns to catch each culprit.
Auth-tokens, bearer-tokens, all must be concealed,
While token counts merrily stay revealed!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main change: tightening the sensitive-token list to prevent matching LLM token-count flags, which aligns with the core bug fix in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented May 27, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@matthewkotila matthewkotila merged commit f93b2ad into main May 28, 2026
29 of 30 checks passed
@matthewkotila matthewkotila deleted the mkotila/redact-token-count-flags branch May 28, 2026 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2 participants