Skip to content

feat(mock-server): add --record-requests for per-request ISL/OSL capture#962

Merged
ajcasagrande merged 44 commits into
mainfrom
fdinatale/mock-server-request-recorder
May 23, 2026
Merged

feat(mock-server): add --record-requests for per-request ISL/OSL capture#962
ajcasagrande merged 44 commits into
mainfrom
fdinatale/mock-server-request-recorder

Conversation

@FrankD412

@FrankD412 FrankD412 commented May 19, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds an opt-in --record-requests PATH mode to the mock server that tokenizes each incoming request inline (using the configured --tokenizer) and writes one JSONL record per request, plus a per-endpoint distribution summary on shutdown (<PATH>.summary.json and stdout).
  • Each record captures the full OSL fingerprint: raw max_tokens, max_completion_tokens, min_tokens, ignore_eos, reasoning_effort — not just the resolved requested_osl.
  • Per-endpoint summary stats include min/max/mean/stdev/p50/p90/p95/p99 for isl, requested_osl, and min_tokens, plus unique_values and an equal-width histogram (num_bins = max(10, ceil((max-min)/100))) on isl and requested_osl. Stdout rendering uses a 20-char horizontal bar chart.
  • Configuration safety: --record-requests forces --workers=1 (in-process state needs a single producer) and rejects --no-tokenizer via the pydantic validator (recording with no tokenizer is incoherent).

Purpose: lets you verify what ISL / requested-OSL distribution actually hits the wire during an aiperf run — useful for validating --random-range-ratio, --strict-isl, and other client-side ISL/OSL synthesis features against ground truth at the server.

Design notes (spec, plan):

  • In-process synchronous in make_ctx; reuses the corpus tokenizer rather than introducing a separate one.
  • Reuses the existing _extract_request_content helper for text + resolved OSL; adds _extract_osl_fingerprint for the raw fields.

Test plan

  • Unit tests pass: uv run pytest tests/unit/aiperf_mock_server -v (19 tests)
  • Integration tests pass: uv run pytest tests/integration/test_mock_server_record_requests.py -m integration -v (3 tests)
  • Validator rejects bad config: MockServerConfig(record_requests="/tmp/x", no_tokenizer=True) raises ValidationError with message containing "requires a tokenizer"
  • End-to-end smoke: start the mock server with recording and a small aiperf run
    aiperf-mock-server --record-requests /tmp/req.jsonl --fast --tokenizer Qwen/Qwen3-0.6B &
    aiperf profile --endpoint-type chat --url http://localhost:8000 \
        --model Qwen/Qwen3-0.6B --random-range-ratio 0.2 \
        --isl-mean 1024 --osl-mean 256 --request-count 200
    kill %1
    cat /tmp/req.jsonl.summary.json   # per-endpoint summary including histogram
    python -c "import pandas as pd; print(pd.read_json('/tmp/req.jsonl', lines=True)[['isl','requested_osl']].describe())"
  • Stdout histogram lines render correctly when the mock server shuts down (header + bin rows + bars).
  • Confirm --no-tokenizer + --record-requests errors at config parse time, not inside a request handler.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Request-recording mode for the mock server: per-request prompt-token and output-limit capture, JSONL per-request output, and shutdown summary with per-endpoint quantiles/histograms and optional vocab distribution
    • Completion requests now accept token-ID-based prompts
  • Documentation

    • Documented request-recording usage, constraints (requires tokenizer, incompatible with no-tokenizer, forces single-worker), output formats, and an end-to-end example
  • Tests

    • Added integration and unit tests covering recording, summaries, and token-ID prompt handling

Review Change Stack

@copy-pr-bot

copy-pr-bot Bot commented May 19, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions Bot commented May 19, 2026

Copy link
Copy Markdown

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@0712fbd6f10c87c1574b28c26cdd6498c8a7a7ed

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@0712fbd6f10c87c1574b28c26cdd6498c8a7a7ed

Last updated for commit: 0712fbdBrowse code

@github-actions github-actions Bot added the feat label May 19, 2026
@FrankD412 FrankD412 force-pushed the fdinatale/mock-server-request-recorder branch from ab21812 to b539267 Compare May 19, 2026 16:31
@github-actions

github-actions Bot commented May 19, 2026

Copy link
Copy Markdown
@coderabbitai

coderabbitai Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This pull request adds per-request recording to the mock server: it captures ISL and requested-OSL per request, supports token-id prompts, accumulates per-endpoint samples and token-frequency counts, emits JSONL per request, and writes/prints per-endpoint quantiles, equal-width histograms, and optional vocab-distribution summaries on shutdown.

Changes

Request Recording Stats & Vocabulary Distribution

Layer / File(s) Summary
Token-ID and CompletionPrompt Support
tests/aiperf_mock_server/models.py, tests/aiperf_mock_server/tokens.py
CompletionPrompt type alias accepts token-id lists (flat/nested), flatten_completion_prompt_token_ids() normalizes them, CompletionRequest.prompt_text converts token-id prompts to space-separated strings, and _extract_osl_fingerprint() normalizes OSL-related fields across request types.
Core RequestRecorder Implementation
tests/aiperf_mock_server/request_recorder.py
RequestRecorder with tokenizer loading, per-endpoint accumulators, prompt-token extraction strategies (prompt-token IDs, chat-template with fallback), JSONL per-request records, equal-width histogram generation and ASCII rendering, vocab-distribution computation (top decoded tokens, coverage, concentration, entropy, 80-bucket shape), summary JSON and stdout report generation, and global recorder accessors.
Request-Level Recording Integration
tests/aiperf_mock_server/utils.py
make_ctx generates request_id, conditionally calls _maybe_record_request to persist request details (timestamp, endpoint, model, stream flag, extracted text/osl fingerprint), and returns RequestCtx with the generated request_id.
Configuration & Validation
tests/aiperf_mock_server/config.py
Adds record_requests CLI option/field; moves logger init; apply_flags() validates that recording requires a tokenizer and forces workers=1 with a warning when needed.
Server Lifecycle Wiring
tests/aiperf_mock_server/app.py, tests/integration/conftest.py
FastAPI lifespan conditionally opens and registers a global RequestRecorder at startup and reliably closes/unregisters it on shutdown; test fixture now forwards no_tokenizer from kwargs instead of hardcoding it.
Integration Tests: record_requests
tests/integration/test_mock_server_record_requests.py
End-to-end test exercising recording mode (chat/completions/embeddings, legacy and modern fields), asserts JSONL and shutdown summary outputs, validates per-endpoint fields and summary histograms/unique-values, and checks config validations (workers collapse, tokenizer requirement).
Unit Tests: Histogram & Vocab
tests/unit/aiperf_mock_server/test_request_recorder.py
Extensive unit tests for histogram helpers, summary building/rendering, vocab-distribution calculations and rendering, recorder token-id tracking, compute_shape_80 behavior, and JSONL inspection using tokenizer doubles.
API/Model Compliance Tests
tests/unit/server/test_api_compliance.py, tests/unit/server/test_models.py, tests/unit/server/test_tokens.py
Tests for /v1/completions with token-id prompts, CompletionRequest handling of token-id prompts, and tokenize_request() prompt-token counting for token-id prompt shapes.
README Documentation
tests/aiperf_mock_server/README.md
Adds Request Recording feature docs: feature list entry, tokenizer/flag notes, --record-requests options and constraints, JSONL-per-request schema, shutdown-summary format, and example run.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I nibble tokens, count each plea,
ISL and OSL recorded with glee.
Histograms hop, vocab sparks bright,
JSONL lines through the quiet night.
A mock server whispered, "Now we see!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.65% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main feature addition: an opt-in --record-requests flag for capturing per-request input sequence length (ISL) and output sequence length (OSL) metrics in the mock server.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (1)
tests/unit/aiperf_mock_server/test_request_recorder.py (1)

17-31: ⚡ Quick win

Align test names and signatures with repo test conventions.

Several tests don’t follow the required test_<function>_<scenario>_<expected> pattern, and some test parameters are untyped (e.g., values, expected, capsys). Please normalize names/signatures to the enforced style for consistency.

As per coding guidelines: "tests/**/*.py: Test naming convention: test_<function>_<scenario>_<expected>" and "**/*.py: Include type hints on ALL functions (params and return)".

Also applies to: 85-149, 151-260

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/aiperf_mock_server/test_request_recorder.py` around lines 17 - 31,
Test function and parameters in TestHistogram violate naming/type conventions:
rename test_degenerate_inputs to follow
test__histogram_degenerate_inputs_returns_expected and add full type hints to
the signature (e.g., values: Sequence[float] and expected: Optional[Mapping[str,
Any]]) and the return type -> None; update the pytest.mark.parametrize parameter
names if necessary to match the typed signature and keep the same param cases
(single_value, all_equal, empty_returns_none); locate the _histogram usage to
ensure the test name references that function and apply the same
renaming+type-hint pattern to the other affected tests referenced (lines
85-149,151-260) so all tests follow test_<function>_<scenario>_<expected> and
include parameter and return type annotations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/superpowers/plans/2026-05-18-recorder-stats-extension.md`:
- Around line 726-741: The fenced stdout sample in the markdown lacks a language
specifier; update the opening fence from ``` to ```text in the block containing
the "Request distribution (100 requests)" sample so the code fence is explicitly
labeled as text (this satisfies markdownlint MD040).

In `@docs/superpowers/specs/2026-05-18-recorder-stats-extension-design.md`:
- Around line 28-31: Update the Markdown fenced code blocks that are missing a
language tag by adding the `text` fence language; specifically, locate the block
containing the formulas starting with "num_bins = max(10, ceil((max - min) /
100))" and the output/example block starting with "/v1/completions  n=100" (and
the other similar blocks around lines 71-83) and change their opening fences
from ``` to ```text so markdownlint MD040 is satisfied.
- Around line 1-6: Add the new docs file to the site index: open docs/index.yml
and insert an entry for
"superpowers/specs/2026-05-18-recorder-stats-extension-design.md" (the exact
filename from the diff) under the appropriate "superpowers" or "specs" section
following the existing entry format/indentation used in the file so the Fern
site will include the new spec; ensure the YAML key order and indentation match
other entries and commit the updated docs/index.yml.

In `@tests/aiperf_mock_server/README.md`:
- Around line 483-504: The fenced code block that begins with "Request
distribution (5000 requests)" is missing a language specifier; update that
triple-backtick fence to include "text" (i.e., change ``` to ```text) so the
example output block is explicitly marked as plain text for proper Markdown
rendering and tooling.
- Around line 548-560: Update the fenced code block in README.md that displays
the tests/aiperf_mock_server/ tree to include a language specifier by changing
the opening fence from ``` to ```text so the block becomes ```text followed by
the directory listing (keep the closing ``` unchanged); locate the code block in
the README where the project structure is shown and modify that fence, e.g., the
block that begins with the line showing "tests/aiperf_mock_server/" inside the
README.md.

In `@tests/aiperf_mock_server/request_recorder.py`:
- Around line 69-77: The record() and open()/close() methods perform blocking
tokenizer and file I/O (Tokenizer.from_pretrained, tokenizer.encode,
self._file.write, and Path.write_bytes) on the FastAPI event loop; move these
blocking calls off the loop by wrapping them in asyncio.to_thread or scheduling
them on a background thread/worker. Specifically: in open() call
Tokenizer.from_pretrained and open the file via asyncio.to_thread (or create the
tokenizer/file in a startup thread) instead of directly; in record() run
tokenizer.encode and all file.write calls inside asyncio.to_thread or a
threadpool task and await the result; in close() perform the final
Path.write_bytes or file close via asyncio.to_thread. Ensure self._tokenizer and
self._file are set/used only after the threaded operations complete to avoid
race conditions.

In `@tests/aiperf_mock_server/utils.py`:
- Around line 213-234: _maybe_record_request currently calls recorder.record
synchronously on the request path, causing blocking I/O; change it to offload
recording to an async-safe worker (e.g., schedule a background coroutine or call
asyncio.to_thread) instead of invoking recorder.record inline. Specifically,
keep using get_global_recorder(), _extract_request_content(), and
_extract_osl_fingerprint() to build the record payload, but replace the direct
recorder.record(...) call with an async-safe enqueue/schedule: either create a
background task that calls an async wrapper (e.g., _async_record(payload)) or
call asyncio.to_thread(recorder.record, ...) so disk writes happen off the event
loop; ensure any exceptions from the background call are caught/logged and do
not propagate to the request handler.

---

Nitpick comments:
In `@tests/unit/aiperf_mock_server/test_request_recorder.py`:
- Around line 17-31: Test function and parameters in TestHistogram violate
naming/type conventions: rename test_degenerate_inputs to follow
test__histogram_degenerate_inputs_returns_expected and add full type hints to
the signature (e.g., values: Sequence[float] and expected: Optional[Mapping[str,
Any]]) and the return type -> None; update the pytest.mark.parametrize parameter
names if necessary to match the typed signature and keep the same param cases
(single_value, all_equal, empty_returns_none); locate the _histogram usage to
ensure the test name references that function and apply the same
renaming+type-hint pattern to the other affected tests referenced (lines
85-149,151-260) so all tests follow test_<function>_<scenario>_<expected> and
include parameter and return type annotations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 18423a08-df97-4299-b54c-763f6d0e803b

📥 Commits

Reviewing files that changed from the base of the PR and between bb6f421 and ab21812.

📒 Files selected for processing (12)
  • docs/superpowers/plans/2026-05-18-recorder-stats-extension.md
  • docs/superpowers/specs/2026-05-18-recorder-stats-extension-design.md
  • tests/aiperf_mock_server/README.md
  • tests/aiperf_mock_server/app.py
  • tests/aiperf_mock_server/config.py
  • tests/aiperf_mock_server/request_recorder.py
  • tests/aiperf_mock_server/tokens.py
  • tests/aiperf_mock_server/utils.py
  • tests/integration/conftest.py
  • tests/integration/test_mock_server_record_requests.py
  • tests/unit/aiperf_mock_server/__init__.py
  • tests/unit/aiperf_mock_server/test_request_recorder.py
Comment thread docs/superpowers/plans/2026-05-18-recorder-stats-extension.md Outdated
Comment thread docs/superpowers/specs/2026-05-18-recorder-stats-extension-design.md Outdated
Comment thread docs/superpowers/specs/2026-05-18-recorder-stats-extension-design.md Outdated
Comment thread tests/aiperf_mock_server/README.md Outdated
Comment thread tests/aiperf_mock_server/README.md Outdated
Comment thread tests/aiperf_mock_server/request_recorder.py
Comment thread tests/aiperf_mock_server/utils.py
@codecov

codecov Bot commented May 19, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
docs/superpowers/plans/2026-05-18-recorder-vocab-distribution.md (1)

129-131: ⚡ Quick win

Avoid machine-specific absolute paths in runnable steps.

cd /Users/... makes the plan non-portable for other contributors/CI docs runs. Prefer repo-root-relative commands (or omit cd when already at repo root).

Also applies to: 1141-1143, 1164-1166, 1250-1258

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/plans/2026-05-18-recorder-vocab-distribution.md` around
lines 129 - 131, The runnable steps use a machine-specific absolute path (e.g.,
"cd /Users/fdinatale/Code/aiperf/.worktrees/mock-server-request-recorder") which
makes the plan non-portable; update those commands in the document (including
the other occurrences noted) to use repo-root-relative paths or omit the cd
entirely and run the pytest command from the repo root (replace the
absolute-worktree cd with a relative path or nothing and keep the pytest
invocation as-is) so contributors and CI can execute the steps without
machine-specific paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/superpowers/plans/2026-05-18-recorder-stats-extension.md`:
- Around line 1-13: Add the new "Recorder Stats Extension Implementation Plan"
document to the docs site index by adding an entry for that filename under the
superpowers -> plans section of index.yml with matching indentation and
ordering; ensure the YAML entry references the document filename exactly, uses
the same display title, and follows existing docs/**/*.md indexing style so the
page is discoverable in the Fern site.

In `@docs/superpowers/plans/2026-05-18-recorder-vocab-distribution.md`:
- Around line 1201-1227: The fenced stdout sample block that begins with
"Request distribution (100 requests)" is missing a language tag; update its
opening fence from ``` to ```text so the block is marked as plain text (this
will satisfy markdownlint MD040). Locate the triple-backtick block containing
the histogram/vocab output and replace the fence accordingly (reference the
block starting "Request distribution (100 requests)" and the "Vocab  used
5234/151936" lines to find it).

In `@docs/superpowers/specs/2026-05-18-recorder-stats-extension-design.md`:
- Around line 41-42: The spec currently conflicts about empty requested OSL:
choose one contract and make the doc and tests consistent—either (A) keep the
"empty OSL mirrors requested_osl: null" contract and update the tests to assert
emb_stats does not contain a requested_osl object (i.e.,
emb_stats["requested_osl"] is null/absent), or (B) adopt the non-null block
contract and update the examples/tests to require emb_stats["requested_osl"] to
be an object with histogram: null and unique_values: 0; update every reference
to requested_osl, emb_stats["requested_osl"]["histogram"], and
emb_stats["requested_osl"]["unique_values"] to match the chosen contract so the
spec and test examples are aligned.

In `@docs/superpowers/specs/2026-05-18-recorder-vocab-distribution-design.md`:
- Around line 98-122: The fenced stdout block starting with the triple-backtick
fence in the recorder-vocab-distribution example is missing a language tag;
update that opening fence from ``` to ```text so the block is marked as plain
text (this will satisfy MD040). Locate the fenced block shown in the diff (the
``` immediately before the "/v1/chat/completions  n=100" sample) and change it
to ```text, leaving the content and closing fence unchanged.
- Around line 154-155: The `_render_vocab_lines` docstring incorrectly states a
“four-line stdout block” while enumerating more lines; update the description
for `_render_vocab_lines(vd: dict) -> list[str]` to state the correct line count
and list the exact layout components in order (e.g., headline, top, blank,
shape, ticks/sparkline—adjust count accordingly), and ensure the related note
for `_build_summary` that mentions `vocab_distribution =
_vocab_distribution(...)` remains consistent with the corrected line-count and
layout wording.
- Line 127: On the "top:" description line (the example that shows `"<text>"
<count>` joined with `, `) remove the padded space inside the inline code span
for the delimiter example so MD038 is not triggered; edit the snippet that
currently uses backticks around `, ` to instead use a non-padded inline code
span like `,` or rephrase to a normal inline example without the
trailing/leading space in the code span, keeping the rest of the `"<text>"
<count>` formatting intact.

---

Nitpick comments:
In `@docs/superpowers/plans/2026-05-18-recorder-vocab-distribution.md`:
- Around line 129-131: The runnable steps use a machine-specific absolute path
(e.g., "cd
/Users/fdinatale/Code/aiperf/.worktrees/mock-server-request-recorder") which
makes the plan non-portable; update those commands in the document (including
the other occurrences noted) to use repo-root-relative paths or omit the cd
entirely and run the pytest command from the repo root (replace the
absolute-worktree cd with a relative path or nothing and keep the pytest
invocation as-is) so contributors and CI can execute the steps without
machine-specific paths.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2268156f-02af-4b6b-94c9-37efe2c98622

📥 Commits

Reviewing files that changed from the base of the PR and between ab21812 and 1e8e129.

📒 Files selected for processing (4)
  • docs/superpowers/plans/2026-05-18-recorder-stats-extension.md
  • docs/superpowers/plans/2026-05-18-recorder-vocab-distribution.md
  • docs/superpowers/specs/2026-05-18-recorder-stats-extension-design.md
  • docs/superpowers/specs/2026-05-18-recorder-vocab-distribution-design.md
Comment thread docs/superpowers/plans/2026-05-18-recorder-stats-extension.md Outdated
Comment thread docs/superpowers/plans/2026-05-18-recorder-vocab-distribution.md Outdated
Comment thread docs/superpowers/specs/2026-05-18-recorder-stats-extension-design.md Outdated
Comment thread docs/superpowers/specs/2026-05-18-recorder-vocab-distribution-design.md Outdated
Comment thread docs/superpowers/specs/2026-05-18-recorder-vocab-distribution-design.md Outdated
Comment thread docs/superpowers/specs/2026-05-18-recorder-vocab-distribution-design.md Outdated
@FrankD412

FrankD412 commented May 20, 2026

Copy link
Copy Markdown
Contributor Author

I was working on trying to verify #803 due to power mismatches and wanted to try and verify the dataset. Here's what the final output looks like from the mock server.

Request distribution (200 requests)
──────────────────────────────────────────────
  Definitions
    ISL/OSL: input/requested output sequence length in tokens; OSL is the request cap, not generated output.
    Vocab used: unique token IDs observed / tokenizer vocab size.
    top-10 cover: share of prompt tokens from the 10 most common token IDs.
    entropy: token-id diversity; higher means broader prompt vocabulary use.
    top decoded tokens: most frequent token IDs decoded for sanity checks; tokens are not words.
    vocab shape: log-scaled 80-bucket view across token-id space.
    vocab shape stats: mean/percentiles of prompt-token counts per bucket, including empty buckets.

  /v1/completions  n=200
    ISL            mean  1050.6   min   416   max  1703   p50  1042   p99  1694
    Requested OSL  mean   527.8   min   216   max   896   p50   522   p99   814

    ISL histogram (13 bins, n=200, 181 unique)
       416-  515    2 █░░░░░░░░░░░░░░░░░░░
       515-  614    0 ░░░░░░░░░░░░░░░░░░░░
       614-  713   14 ████████░░░░░░░░░░░░
       713-  812   18 ██████████░░░░░░░░░░
       812-  911   29 ████████████████░░░░
       911- 1010   25 ██████████████░░░░░░
      1010- 1109   36 ████████████████████
      1109- 1208   21 ████████████░░░░░░░░
      1208- 1307   23 █████████████░░░░░░░
      1307- 1406   15 ████████░░░░░░░░░░░░
      1406- 1505    8 ████░░░░░░░░░░░░░░░░
      1505- 1604    6 ███░░░░░░░░░░░░░░░░░
      1604- 1703    3 ██░░░░░░░░░░░░░░░░░░

    Requested OSL histogram (10 bins, n=200, 166 unique)
      216- 284    2 █░░░░░░░░░░░░░░░░░░░
      284- 352   12 █████░░░░░░░░░░░░░░░
      352- 420   26 ███████████░░░░░░░░░
      420- 488   34 ██████████████░░░░░░
      488- 556   48 ████████████████████
      556- 624   33 ██████████████░░░░░░
      624- 692   29 ████████████░░░░░░░░
      692- 760   10 ████░░░░░░░░░░░░░░░░
      760- 828    4 ██░░░░░░░░░░░░░░░░░░
      828- 896    2 █░░░░░░░░░░░░░░░░░░░


    Vocab  used 11991/128000 (9.4%)  top-10 cover 14%  entropy 10.6/17.0 bits
      top decoded tokens: " the" 4452, " I" 3969, " and" 3523, " of" 3119, " to" 3070

    vocab shape  (80 buckets over id 0..127999, log-y)

      bucket tokens mean  2623.9   p50   470   p90  2931   p95  6572   p99 40006

    ██▇▇▇▆▆▆▆▆▆▆▆▆▆▆▆▆▅▅▅▆▅▅▅▆▅▅▅▅▅▅▅▅▅▅▅▅▄▅▅▅▅▅▄▅▅▅▄▄▄▅▅▅▅▅▅▄▅▄▄▅▄▁▂ ▁▁▁▁▁▂▁▁▁▃▁▂▁
    0                   32K                 64K                 96K             128K

We can see how the ISL/OSL are distributed and the vocab distribution. We can extend this in the future if we need more detailed stats, but this was a start to verify/compare benchmark tools.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/unit/aiperf_mock_server/test_request_recorder.py (1)

494-501: ⚡ Quick win

Add # fmt: skip on the new @pytest.mark.parametrize block.

Please keep this decorator aligned with the repo’s test parametrize formatting rule.

Suggested update
     `@pytest.mark.parametrize`(
         "prompt,expected_ids",
         [
             ([11, 22, 33], [11, 22, 33]),
             ([[11, 22], [33, 44]], [11, 22, 33, 44]),
         ],
-    )
+    )  # fmt: skip

As per coding guidelines tests/**/*.py: “Use from pytest import param and put # fmt: skip on the ) line for parametrized tests”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/aiperf_mock_server/test_request_recorder.py` around lines 494 -
501, The new parametrized test decorator `@pytest.mark.parametrize` on
test_record_request_completion_prompt_token_ids_skip_tokenizer must follow test
formatting rules: import param from pytest and place a "# fmt: skip" comment on
the closing ")" line of the `@pytest.mark.parametrize` block so the formatter
leaves the param list as authored; update the decorator usage to use
pytest.param entries if needed and add the "# fmt: skip" directly after the
closing parenthesis of the decorator.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/aiperf_mock_server/request_recorder.py`:
- Around line 833-845: Update the stale docstring header that currently reads
"Return the 7-line stdout block for one endpoint's vocab_distribution." to
reflect the current output length (change "7-line" to "9-line"), and adjust any
surrounding wording if needed so the docstring accurately describes the
nine-line layout (leave the rest of the layout lines as-is); locate the
docstring by searching for the exact string "Return the 7-line stdout block for
one endpoint's vocab_distribution." in request_recorder.py.

---

Nitpick comments:
In `@tests/unit/aiperf_mock_server/test_request_recorder.py`:
- Around line 494-501: The new parametrized test decorator
`@pytest.mark.parametrize` on
test_record_request_completion_prompt_token_ids_skip_tokenizer must follow test
formatting rules: import param from pytest and place a "# fmt: skip" comment on
the closing ")" line of the `@pytest.mark.parametrize` block so the formatter
leaves the param list as authored; update the decorator usage to use
pytest.param entries if needed and add the "# fmt: skip" directly after the
closing parenthesis of the decorator.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ae3933c7-1a96-4958-9096-14fa5587940a

📥 Commits

Reviewing files that changed from the base of the PR and between 1e8e129 and 8b45da9.

📒 Files selected for processing (9)
  • tests/aiperf_mock_server/README.md
  • tests/aiperf_mock_server/models.py
  • tests/aiperf_mock_server/request_recorder.py
  • tests/aiperf_mock_server/tokens.py
  • tests/integration/test_mock_server_record_requests.py
  • tests/unit/aiperf_mock_server/test_request_recorder.py
  • tests/unit/server/test_api_compliance.py
  • tests/unit/server/test_models.py
  • tests/unit/server/test_tokens.py
✅ Files skipped from review due to trivial changes (1)
  • tests/aiperf_mock_server/README.md
Comment thread tests/aiperf_mock_server/request_recorder.py Outdated
@FrankD412 FrankD412 force-pushed the fdinatale/mock-server-request-recorder branch from 15bb558 to d0b8df0 Compare May 20, 2026 21:24

@FrankD412 FrankD412 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Nice feature — the recorder is well-structured, the test coverage is generous, and the JSONL-plus-summary contract is a sensible escape hatch for validating ISL/OSL behavior at the wire. Three things to look at before merge, in order:

  1. open(self.path, "wb") truncates the file on each restart. A user running two consecutive benchmarks against the same --record-requests PATH silently loses the first run. Repro: 20 records → 0 records on restart with no warning. Switch to "ab" (append) or refuse to overwrite without an opt-in flag.
  2. The recorder uses default I/O buffering, so SIGKILL / OOM loses the buffered tail and the entire .summary.json. Repro: kill -9 after 50 records → 48 on disk, no summary at all. Call self._file.flush() after each write.
  3. _encode_chat_prompt_ids has a try/except/else bug: the TypeError-retry's tokens are silently discarded. Falls through to the ChatML fallback. Untested path; harmless today because modern HF doesn't take the TypeError branch, but the retry is effectively dead code.

Also a one-character README fix (Qwen/Qwen3-0.6Bbuiltin on the --tokenizer default row, which this PR touches anyway).

Working well: per-endpoint stats blocks, the histogram + vocab-distribution rendering (the sparkline is a delight), the tokenization_mode tag on every JSONL row, the validator that rejects --no-tokenizer + --record-requests at parse time, and the integration test that locks down the JSONL schema. Tests pass and the design is right.

Comment thread tests/aiperf_mock_server/request_recorder.py
Comment thread tests/aiperf_mock_server/request_recorder.py Outdated
Comment thread tests/aiperf_mock_server/request_recorder.py
Comment thread tests/aiperf_mock_server/README.md Outdated
@FrankD412 FrankD412 marked this pull request as ready for review May 21, 2026 20:51
Comment thread tests/aiperf_mock_server/app.py
Comment thread tests/aiperf_mock_server/request_recorder.py Outdated
Comment thread tests/aiperf_mock_server/request_recorder.py
Comment thread tests/aiperf_mock_server/request_recorder.py Outdated
Comment thread tests/aiperf_mock_server/utils.py Outdated
FrankD412 and others added 14 commits May 21, 2026 15:23
… capture

Adds an opt-in recording mode to the mock server so we can verify what
ISL / requested-OSL distribution actually hits the wire during an aiperf
run. When --record-requests PATH is set, the server tokenizes each
incoming request inline with the configured tokenizer, appends one JSONL
record per request, and writes a per-endpoint distribution summary to
<PATH>.summary.json (and stdout) on shutdown.

Each record captures the full OSL fingerprint the client sent — raw
max_tokens, max_completion_tokens, min_tokens, ignore_eos,
reasoning_effort — alongside the resolved requested_osl and the real
tokenized isl. The summary aggregates isl / requested_osl / min_tokens
distributions and tallies ignore_eos + reasoning_effort categorically.

Design choices:
- In-process and synchronous in make_ctx. tokenizer.encode() is fast
  enough that a worker process would be overkill.
- Reuses the existing --tokenizer field rather than introducing a
  separate one, so ISL counts match the vocab the corpus uses.
- Config validator forces --workers=1 and rejects --no-tokenizer
  (incoherent with recording).
- conftest.create_server now lets tests opt out of the hard-coded
  no_tokenizer=True default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Adds the design for extending the per-endpoint summary with a
histogram and unique_values count on the isl and requested_osl
stats blocks. Distribution shape rather than just percentiles —
makes it possible to tell a uniform sample from a bimodal one
without staring at p10/p50/p90 numbers.

Bucketing rule: num_bins = max(10, ceil((max-min)/100)). 10-bin
floor keeps narrow OSL ranges informative; 100-token max width
caps the width on wide ISL ranges. Equal-width bins, observed
[min, max].

Output in both <path>.summary.json (parallel bin_edges/counts
arrays, pandas-friendly) and stdout (20-char horizontal bar
chart). Tests extend the existing recorder integration test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
… stats

Seven TDD-style tasks to extend the per-endpoint summary block with a
histogram and unique_values count on the isl and requested_osl fields.
Each task has explicit code, exact pytest invocations, expected output,
and a commit step.

Tasks:
1. _histogram helper + unit tests
2. wire histogram + unique_values into _build_summary
3. _render_histogram helper
4. wire renderer into _print_summary
5. extend integration test
6. refresh README
7. pre-merge lint/tests/pre-commit

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
ISL and OSL stats now print on their own indented rows under the
endpoint header, with `:7.1f` / `:5.0f` widths and a Unicode rule.
The single-line `ISL mean=... OSL mean=...` cram-on-one-line layout
drifted out of alignment when token counts crossed 5 digits.

Foundation for the histogram block coming next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Three small follow-ups from the final code review of the histogram
+ unique_values feature:

- _render_histogram: tighten `hist` annotation to
  `dict[str, list[float] | list[int]]` so it matches what `_histogram`
  actually returns.
- _print_summary: pass `sum(histogram["counts"])` as the histogram's
  `n=`, not `stats["count"]`. Matters when OSL observations are fewer
  than ISL (e.g. requests that omit max_tokens).
- Test the non-round-span path of `_histogram` (range 0..1001, 11 bins)
  so the last-edge pinning is exercised when the divisor isn't clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Four cosmetic divergences in the Summary stdout sample picked up by
the final review:

- Separator: 86 → 46 chars to match `_print_summary`'s rule width.
- Endpoint header: drop `n=  4000` padding; code emits `n=4000`.
- min_tokens row: use `mean ` / `p50 ` (no `=`) per current format.
- min_tokens / ignore_eos / reasoning_effort: 6-space → 4-space indent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Adds per-endpoint `vocab_distribution` block to the recorder summary
that surfaces how the dataset samples across the tokenizer's vocab:
coverage (unique IDs / vocab size), top-N concentration, Shannon
entropy with the uniform-sampling ceiling, an 80-bucket sparkline
over full token-id space (log-y), and a full token_id → count
frequency table in JSON. Stacked stdout layout with blank lines
between blocks; spacing refresh applies to the existing ISL/OSL
histogram rendering as well.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
…stats

Nine TDD-style tasks. Each task has explicit code, exact pytest
invocations, expected output, and a commit step.

1. Capture per-endpoint token-id Counter in RequestRecorder.
2. _compute_shape_80 helper.
3. _vocab_distribution helper (JSON block builder).
4. _render_vocab_lines helper (stdout block).
5. Wire vocab_distribution through _build_summary.
6. Wire renderer + blank-line spacing into _print_summary.
7. Extend integration test.
8. Refresh README.
9. Pre-merge verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
FrankD412 and others added 18 commits May 21, 2026 15:23
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Important fixes:
- _build_summary now reaches `_resolve_vocab_size` when vocab_size_source
  is "observed" — previously the guard required vocab_size to be non-None,
  which left the observed-fallback path dead and silently emitted
  vocab_distribution=null for tokenizers without a vocab-size attribute.
- _vocab_distribution now replaces non-printable decoded text with the
  `<id=N>` marker — previously only decode-raises hit the fallback, so
  control chars / null bytes leaked into top_tokens and stdout.
- Added end-to-end test exercising the observed path through
  open()/record()/close() for tokenizers with no vocab-size signal.

Minor cleanups also from review:
- round max_entropy_bits to 4 decimals for symmetry with entropy_bits.
- replace the decode_fn lambda with a named inner function (no noqa).
- tighten test_blank_lines_between_blocks to verify blank-line placement
  rather than just "at least one blank somewhere".
- fix incorrect comment in test_log_y_makes_small_bars_visible.
- tighten integration test's emb_vd assertion (vocab block is always
  non-null for endpoints with recorded ISL).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Open the recorder file in append mode so a second run against the same
--record-requests path accumulates instead of silently truncating prior
data on lifespan startup, and flush after every record write so the
on-disk file matches in-process state — abnormal exit (SIGKILL, OOM)
now loses only the in-flight record instead of the buffered tail and
the .summary.json that close() would have written.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
`_encode_chat_prompt_ids` wrapped both apply_chat_template calls in one
try/except/else, but Python's `else` only runs when the try body itself
completed without exception — so when the positional call raised TypeError
and the `conversation=` kwarg retry succeeded, the retry's tokens were
bound to `result` and then immediately discarded, and the function fell
through to the ChatML fallback. Extract the call into a small helper that
returns the result outside any try/except, so the retry's tokens actually
reach the caller. Add a unit-test stub whose apply_chat_template is
keyword-only to lock down the retry branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
The Configuration table claimed Qwen/Qwen3-0.6B; the actual default in
config.py is "builtin" (bundled tiktoken o200k_base, zero network
access). Also spell out what "builtin" means inline so readers don't
have to chase the field description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
When the configured tokenizer has no chat template (e.g. the builtin
tiktoken o200k_base) the recorder falls back to literal ChatML
rendering, and each role marker is tokenized as ~5 raw tokens instead
of the single special-token id a real chat template would produce.
For builtin + a single user message this is +21 tokens over the bare
user content; for a 4-message conversation it is roughly +45. Without
this note, anyone validating --isl-mean N against recorder output sees
a systematic positive offset and may chase it as a bug in the client.
The tokenization_mode field already flags which path was used; the
note tells readers what to do about it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
The diagram block under "Auto-Scaling GPU Metrics" was an untagged
fence, which made some Markdown renderers (incl. the Fern site
linter) emit a "language specifier missing" warning. Tagging it
`text` keeps the rendered output identical and silences the linter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Same fix as the Token Flow block — the directory-tree fence was
untagged, tripping the docs linter. Pure rendering parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Append mode left a reused --record-requests path with old + new records
on disk, but the per-process stats accumulators start empty each run,
so <path>.summary.json described only the latest run. Two consumers of
the same artifact pair would see different totals.

Truncate per run so the JSONL and the summary are always consistent
(reviewer dynamo-ops). Per-record flush is unchanged, so SIGKILL / OOM
still only loses the in-flight record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
The lifespan teardown ran `await shutdown_scheduler()` and `recorder.close()`
inside the same `finally:` block, so any exception from the scheduler
shutdown step (timeout, ZMQ socket error, etc.) bypassed the recorder
cleanup. That meant `<path>.summary.json` was silently never written —
the artifact the user enabled `--record-requests` for in the first place.

Nest the recorder cleanup in its own inner `finally:` so it runs whether
or not scheduler shutdown succeeds. Add a unit test that patches
`shutdown_scheduler` to raise and asserts the summary file still ends
up on disk; this previously failed by leaving the summary missing.

Reported by reviewer dynamo-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
@FrankD412 FrankD412 force-pushed the fdinatale/mock-server-request-recorder branch from 8315e6d to 746f6d7 Compare May 21, 2026 22:23
Comment thread tests/aiperf_mock_server/app.py
Comment thread tests/aiperf_mock_server/utils.py

@debermudez debermudez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tackle the 2 comments from dynamo ops and this is going in. Nice work!

FrankD412 and others added 5 commits May 21, 2026 18:10
…kend

`_tokenizer_call_ids()` unwrapped the AIPerf `Tokenizer` and invoked the
HF backend directly, which defaults to `add_special_tokens=True`. The
wrapper's own `__call__` configures `add_special_tokens=False`, so
recorded ISL counts for HF completion / TGI / embedding requests were
inflated by the backend's BOS/EOS sentinels.

Try the wrapper's `__call__` first and only fall back to unwrapping for
objects that don't expose it. Add a regression test that distinguishes
wrapper vs backend invocations and asserts the wrapper's 3-id result is
recorded (rather than the backend's 5-id, special-token-padded result).

Reported by reviewer dynamo-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
`TGIGenerateRequest` has no `stream` field, so `getattr(req, "stream", None)`
returned `None` for the streaming TGI endpoint. The recorder treated that as
falsy and never incremented `streamed_count`, leaving the per-endpoint
summary blind to TGI streaming traffic.

Derive `stream` from the endpoint when it's `/generate_stream`, fall back to
the request attribute everywhere else. Add two unit tests: one that asserts
`/generate_stream` records `stream=True`, one that pins `/generate` to the
existing falsy behavior.

Reported by reviewer dynamo-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
`init_scheduler(server_config)` runs after `recorder.open()` and
`set_global_recorder(recorder)`, but lives outside the `try: yield ...
finally:` cleanup block. So if `init_scheduler` raises during startup,
the exception propagates out of `lifespan` before the cleanup `finally`
is ever entered — the global recorder stays installed, the JSONL file
handle leaks, and `<path>.summary.json` is never written.

Wrap `init_scheduler` in its own `try/except`. On any startup failure,
unregister the global recorder and call `close()` (which writes the
summary) before re-raising. The existing yield/shutdown cleanup is
untouched and owns the successful-startup path.

Add a unit test symmetric to the existing shutdown-failure one: patch
`init_scheduler` to raise, drive the lifespan, then assert that the
summary file landed on disk and that `get_global_recorder()` is None.

Reported by reviewer dynamo-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
@ajcasagrande ajcasagrande enabled auto-merge (squash) May 23, 2026 06:01
@ajcasagrande ajcasagrande merged commit 806f6ec into main May 23, 2026
17 of 18 checks passed
@ajcasagrande ajcasagrande deleted the fdinatale/mock-server-request-recorder branch May 23, 2026 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4 participants