fix: the nightly by fixing test docs end to end test suite and staging locations by saturley-hall · Pull Request #965 · ai-dynamo/aiperf

saturley-hall · 2026-05-20T00:03:54Z

This dramatically lowers runtime by parallelizing the test suite across multiple executors in one job and dynamically balancing expected run time.

Differentiates the location of staging the container and wheels when manually executed (with an override to make it look like a nightly in extraordinary situations).

Closes OPS-6560

Summary by CodeRabbit

Infrastructure
- CI workflows updated (action versions), improved nightly artifact/staging behavior with provenance, added a resolve-check workflow, and a non-blocking job that generates weight suggestions.
New Features
- Sharded GPU test execution with shard controls for CI runners.
- New CLI tool to analyze test runs and recommend per-command weights.
Refactor
- Reworked packaging extras and tightened dependency resolution constraints.
Tests
- Increased end-to-end timeouts; test harness and parser support per-command weights and shard slicing.
Documentation
- Tutorials annotated with profiling weight parameters; README adds arm64 install note.

Test-docs end-to-end: - src/aiperf/config/dataset/resolver.py — _check_timing_data now reports True for BURST_GPT_TRACE (matches the existing SAGEMAKER_DATA_CAPTURE branch); the loader's _REQUIRED_COLUMNS guarantees a Timestamp column, but the JSONL-only first-record probe always failed on CSV and blocked --fixed-schedule. New regression test in tests/unit/config/test_resolvers.py. - docs/tutorials/embeddings.md — gets its own vllm-embeddings-openai server (setup + health-check + retagged run markers). The BGE commands previously routed to the Qwen3-0.6B chat server and 404'd on /v1/embeddings. - tests/ci/test_docs_end_to_end/constants.py — AIPERF_COMMAND_TIMEOUT 300 → 600s; the long-conversations / goodput / aimo tutorials run long on the AMD GPU runner and were SIGKILLed mid-flight. AMD64 wheel-test: - pyproject.toml — extract a [test] extra from [dev]; [dev] keeps tooling-only deps and recursively consumes [test]. Zero impact on regular pip install aiperf users (extras are opt-in). - .github/workflows/nightly.yml — wheel-test step now installs "${NIGHTLY}[test]" instead of a hand-typed list; eliminates the drift that left jsonschema unlisted and broke collection of tests/unit/config/test_config_schema_generator_integration.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

Temporarily redirect the workflow_dispatch / checkout / SHA-verification gates from main to harrison/fix-nightly so the nightly-fix commit can be exercised end-to-end before merge. Each modified block is tagged `TEMPORARY` with a revert-before-merge note, and the dispatch-input description echoes the same warning. Note: GitHub reads the workflow file for `schedule:` triggers from the default branch only, so the cron will not fire on this branch — runs must be triggered via `gh workflow run nightly.yml --ref harrison/fix-nightly`. Revert this commit before merging the PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

- stage-nightly-wheel uploads to nightly/${TIMESTAMP}/${FN} so wheels are grouped by date; TIMESTAMP and the wheel filename's embedded date come from the same prepare-nightly output, so they cannot drift. - POST /api/pypi/<repo>/reindex after the deploy so Artifactory's PyPI simple index picks up the new wheel without manual recalc (generic PUT deploys bypass the metadata extractor). - Pass ARTIFACTORY_WHEEL_PATH, ARTIFACTORY_NIGHTLY_WHEEL_PATH, TIMESTAMP to the GitLab security-scan trigger so downstream consumers get the exact location instead of having to know the layout convention. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

vllm-default-openai owns 39 of the ~53 aiperf commands extracted from docs/tutorials/, and ran them sequentially in a single GPU job — ~57 min of the previous 73-min wall-clock. Split it into 4 contiguous shards plus one shard per modality-specific server (audio, vision, video, embeddings), 8 in total. Each shard runs on its own runner from the prod-aiperf-tester-amd-gpu-v1 pool (maxRunners=16, podAntiAffinity by host so each gets its own GPU and isolated localhost:8000), brings up its own vLLM, executes its slice, and tears down. Lives entirely in the called workflow (test-docs-end-to-end.yml), so PR / workflow_dispatch / nightly's workflow_call all get the matrix. - tests/ci/test_docs_end_to_end/main.py: new --server / --shard-index / --shard-total flags. Argparse rejects invalid combos. Contiguous-chunk slicing keeps at most one slow tutorial per shard (assuming the slow ones aren't clustered in the docs). - tests/ci/test_docs_end_to_end/setup_test.sh: forwards SERVER_NAME / SHARD_INDEX / SHARD_TOTAL env vars from the matrix. Unset env vars preserve the existing --all-servers behavior for local invocation. - .github/workflows/test-docs-end-to-end.yml: matrix of 8 shards, fail-fast: false. The aggregated job result that nightly.yml consumes is unchanged in shape (success iff all shards pass). Expected wall-clock: ~22-25 min total, down from 73 min. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

The previous nightly's vllm-default-openai-1of4 shard hit the 600s timeout on test 5 (aimo public-dataset) — the same test ran in 225s in the prior serial run. Under the 8-way parallel matrix, shared infra (HuggingFace API, ECR pulls, cold-start network contention) slowed the HF-dataset-heavy tests by ~2.7x. Two complementary fixes: 1. Bin-pack commands by author-annotated weight instead of by docs order. Tag syntax now accepts an optional ``weight=<seconds>`` hint: ``. Parser extracts it (default 60s); the sharder runs LPT greedy bin-packing, sorting commands heaviest-first and assigning each to the currently-lightest shard. The three known-slow tests are tagged: - docs/tutorials/aimo.md: weight=500 - docs/tutorials/goodput.md: weight=600 - docs/tutorials/multi-turn.md (long-conversations): weight=1100 At 4 shards on vllm-default-openai, this produces shard weights [1100, 1080, 1100, 1080] — long-conv gets its own shard; aimo and goodput each ride along with ~10 quick tests. 2. Raise AIPERF_COMMAND_TIMEOUT 600s -> 1200s as a safety net for parallel-load variance. The bin-packer should prevent any one shard from being over-weighted, but per-command runtime under contention is bursty (HF rate-limits, ECR throttling) and we'd rather wait than spuriously fail. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

@v4

GitHub is forcing Actions runners to Node 24 starting June 2nd, 2026 and removing Node 20 on September 16th, 2026. Bump every action that GitHub flagged in the deprecation annotations to the version that already targets Node 24. The same versions are already in use in run-unit-tests.yml, so this is just bringing the rest of the workflows into the same baseline. - actions/checkout @v4 -> @v6 - actions/setup-python @v4/@v5 -> @v6 - actions/upload-artifact @v4 -> @v5 - actions/download-artifact @v4 -> @v5 - actions/setup-node @v4 -> @v5 (fern-docs / run-fern-checks) - astral-sh/setup-uv @v5 -> @v7 (nightly only) Other actions (github-script, codecov, pre-commit, docker/setup-buildx, ytanikin/pr-conventional-commits) weren't on the deprecation list and are left as-is. Also: gate prepare-nightly's "Restore version artifact from prior attempt" step on github.run_attempt > 1. The step is intentionally written with continue-on-error: true so a missing artifact is fine on fresh runs, but the action still emits a "Artifact not found" warning annotation every time. Skipping on first attempts removes the noise; continue-on-error stays as belt-and-suspenders for the case where a prior attempt's artifact expired before re-run. Files touched: .github/workflows/{nightly,pre-commit,test-docs-end-to-end, fern-docs,run-fern-checks,run-integration-tests}.yml Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

…ft report The first weighted-sharding nightly (run 26095921607) finished with shard wall-clocks ranging from 12m to 40m+. Two distinct problems: 1. Declared weights drifted from reality. long-conv was tagged 1100s but ran 343s; goodput was tagged 600s but ran 243s; meanwhile blazedit_10k ran 700s with no weight at all, and burst-gpt-trace ran 544s with no weight. Shard 4 picked up the two heaviest unweighted tests and blew past the 10-min per-shard estimate by ~30 min. 2. fixed-schedule.md:122 failed in 1s when it landed in a different shard than fixed-schedule.md:58 — the second block uses precise_schedule.jsonl which the first block creates via cat-heredoc. Sequential test runs accidentally work because the same container working dir persists; sharding broke that assumption. Changes: - Rebalance weights from observed runtimes (observed × 1.3, rounded to 50): - blazedit.md:107 (blazedit_10k): default -> weight=900 - burst-gpt-trace.md:63: default -> weight=700 - multi-turn.md:288 (long-conv): weight=1100 -> weight=500 - multi-turn.md:257 (conv 50): default -> weight=350 - aimo.md:53: weight=500 -> weight=350 - goodput.md:43: weight=600 -> weight=350 - custom-dataset.md:337 (random_pool): default -> weight=250 - multi-turn.md:199 (conv 100): default -> weight=250 - sharegpt.md:38: default -> weight=200 - synthetic-dataset.md:65: default -> weight=200 - blazedit.md:54 (blazedit_5k): default -> weight=150 - default weight bump: 60s -> 80s (mean of observed unweighted is ~95s) - Make fixed-schedule.md:122 self-contained by inlining the precise_schedule.jsonl cat-heredoc, so the time-window example runs in any shard regardless of whether the basic example also landed there. - Fix LPT non-determinism in tests/ci/test_docs_end_to_end/main.py: Path.rglob returns filesystem-dependent order (macOS vs Linux differ), so the same code assigned commands to different shards locally vs in CI. Add (file_path, start_line) as a stable secondary sort key after the weight-descending primary key. - Add tools/suggest_test_docs_weights.py and a non-blocking suggest-shard-weights nightly job. The tool parses test-docs matrix logs, matches each test back to its markdown bash block, and writes a markdown table of suggested weight= updates to the GitHub Actions step summary. Purely informational — flagged by ratio>1.5 with abs delta >60s, by default-weighted tests observed >120s, and by tests over-weighted by 2x or more. Continues on error so it never blocks the pipeline. Predicted 4-shard balance after rebalance: loads [1660, 1580, 1610, 1590], spread 80s (down from ~1600s spread in run 26095921607). Expected wall-clock per shard: ~34 min (boot ~7 min + work ~27 min), floor pinned by the 900s blazedit_10k test. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

The file has a `#!/usr/bin/env python3` shebang but git didn't track the +x bit when the file was first added in dfc9082 (it was created via the editor tool, which doesn't carry exec mode). pre-commit's check-shebang-scripts-are-executable hook flagged this on the 2026-05-19 nightly run. Fix: `git update-index --chmod=+x` to flip the bit in the git index. Working-tree mode was already 0755 locally; this propagates that to git so the hook is satisfied across fresh clones too. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

The suggest-shard-weights job already writes its markdown to $GITHUB_STEP_SUMMARY, but step summaries only render on the run's top-level page, not inside the per-job log view. Add two more output paths so the report is discoverable no matter where someone clicks: 1. Echo the report inside a collapsible ::group:: in the GHA log, so it's right there on the job-detail page. 2. Upload it as a `suggest-shard-weights` artifact with 30-day retention, so it can be downloaded via `gh run download` long after the run scrolls off the recent-runs list. Same step summary content as before; the file is written to suggest-weights.md in the workspace and then routed three ways. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

Today every nightly run produces the same WHEEL_VERSION, CONTAINER_TAG, and staging subpath regardless of trigger. Same-day same-commit re-runs (manual or otherwise) clobber each other in S3, Artifactory, ECR, and NGC — already documented as "last-writer-wins" in the existing comments. Disambiguate workflow_dispatch runs end-to-end: 1. PEP 440 local-version suffix on WHEEL_VERSION: 0.8.0.dev<YYYYMMDD>+manual.<run_id> - Wheels are uniquely addressable per manual dispatch. - `pip install aiperf-nightly` keeps resolving the canonical scheduled wheel by default (non-local versions sort above identical base+local in PEP 440). - `pip install aiperf==<base>.dev<date>+manual.<run_id>` pulls a specific manual build. - .github/actions/update-pyproject-version already accepts local-version segments, no infra change needed there. 2. New STAGE_SUBPATH output from prepare-nightly: nightly/<YYYYMMDD> for scheduled runs nightly-manual/<YYYYMMDD>/<run_id> for manual dispatches Consumed by stage-nightly-wheel's Artifactory TARGET_URL and by trigger-gitlab-security-scan's ARTIFACTORY_WHEEL_PATH / ARTIFACTORY_NIGHTLY_WHEEL_PATH forwarded variables. S3 upload stays keyed on WHEEL_VERSION so the upload/download symmetry holds for free — the local-version suffix already distinguishes the paths. 3. CONTAINER_TAG branch in prepare-nightly: nightly-<YYYYMMDD>-<sha7> for scheduled nightly-manual-<YYYYMMDD>-<sha7>-<run_id> for manual No downstream edits needed; build-container, the multi-arch manifest job, stage-nightly-container, and the GitLab trigger all consume CONTAINER_TAG as-is. 4. NGC staging destination repo renames aiperf -> aiperf-nightly in stage-nightly-container (5 occurrences in the crane copy + index-append + delete lines). Mirrors the PyPI naming scheme (`aiperf` canonical dist / `aiperf-nightly` nightly-flavored dist). ECR source side stays `aiperf` — purely an NGC-side rename. Operational pre-req: the `aiperf-nightly` NGC repo must exist with publish-service-account write access before the next staged dispatch. Also: remove the explicit Artifactory PyPI /reindex POST that was returning 403 with the current ARTIFACTORY_USER. The user has Deploy permission but not Manage Repository; widening that grant is tracked separately. Until then, simple-index visibility for freshly-uploaded wheels relies on the repo's auto-reindex mechanism or an out-of-band UI recalc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

Default for manual workflow_dispatch runs is unchanged: artifacts get the +manual.<run_id> wheel suffix, nightly-manual/<date>/<run_id>/ staging path, and nightly-manual-<date>-<sha>-<run_id> container tag (everything we just added in b7fabee). Adds an opt-out: setting `manual_artifacts=false` on a manual dispatch falls back to the bare scheduled-nightly naming (no local-version suffix, plain `nightly/<date>/` path, plain `nightly-<date>-<sha>` container tag). Useful for emergency re-publishes where a manual run is *meant* to overwrite the scheduled artifact locations — e.g. re-running today's nightly because the cron-triggered build failed. Schedule events ignore the input entirely; the GITHUB_EVENT_NAME check short-circuits before MANUAL_ARTIFACTS is consulted. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

Use Artifactory's matrix-parameter syntax on the PUT URL to attach two properties to every wheel staged by stage-nightly-wheel: github.run.url = https://github.com/<org>/<repo>/actions/runs/<run_id> git.commit.sha = <full commit SHA the wheel was built from> Both properties are visible in the Artifactory UI under each wheel's Properties tab, included in Xray scan metadata, and queryable via the AQL API ("show me every wheel built from SHA X" or "show me every wheel from yesterday's nightly run"). URL value is URL-encoded with python3 (always available on the builder runner) so the ':' and '/' characters in the GitHub Actions URL don't confuse Artifactory's matrix-parameter parser. Path URL (without properties) is still what gets logged in the 'Wheel staged to Artifactory:' line for readability — the properties appear as a separate log line below. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

yarl is transitive via aiohttp, but the Dockerfile's env-builder runs a cold `uv sync` (no --frozen, no uv.lock COPY) so resolution is fresh against PyPI on every build. That makes us vulnerable to the few-hours window after upstream pushes a new release with incomplete wheel coverage. Specifically today: yarl 1.24.0 uploaded at 17:23 UTC with only cp310 wheels — no cp311/12/13, no sdist. Our env-builder runs on python:3.13-slim-bookworm, so uv picked 1.24.0 (the new "latest" per PyPI metadata) and bailed with "doesn't have a source distribution or wheel for the current platform", taking out every test-docs-end-to-end shard for the run. Direct constraint on a transitive dep is unusual but legit — the resolver honors it during `uv sync` regardless of who pulls yarl in. `<2.0` keeps us on the current major; the comment in pyproject.toml flags it as removable once upstream's release pipeline becomes reliable about uploading all platform wheels atomically. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

The prepare-nightly job's version-info file changes extension from .env to .txt to match what it actually is — a human-readable summary of the resolved version strings for the run. The format stays as shell-source-able KEY='value' lines so the same file restores correctly on re-attempts. Two visibility wins: 1. The contents are echoed into a collapsible "Resolved version info" group in the workflow log right after they're computed. No need to download the artifact (or click into a sub-job) to see what WHEEL_VERSION / STAGE_SUBPATH / CONTAINER_TAG resolved to. 2. The archived artifact (`nightly-version`) now contains version-info.txt — same file, more honest extension. Downstream tools that grep / `cat` the artifact don't care about the extension; tools that `source` it still work because the format didn't change. Migration safety: the restore step probes for version-info.txt first, then falls back to version-info.env. Re-attempts of runs started before this change still find their snapshot. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

github-actions · 2026-05-20T00:04:06Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@15e3fb41cb1d0226eb5f355e5e6da439afd7f7ed

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@15e3fb41cb1d0226eb5f355e5e6da439afd7f7ed

Last updated for commit: 15e3fb4 • Browse code

github-actions · 2026-05-20T00:04:56Z

Fern Docs Preview: https://nvidia-preview-81f17757-2da6-4d8a-8a68-857c9c442dc5.docs.buildwithfern.com/aiperf/dev

coderabbitai · 2026-05-20T00:10:14Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

This PR adds weight-based end-to-end test sharding and a weight-suggestion tool, annotates tutorials with weight markers, updates GitHub Action versions across workflows, and enhances the nightly workflow with manual-artifact naming, stage_subpath outputs, provenance metadata, and staged wheel/container changes.

Changes

End-to-End Test Sharding and Weight System

Layer / File(s)	Summary
Weight Data Model and Tutorial Directive Parsing `tests/ci/test_docs_end_to_end/data_types.py`, `tests/ci/test_docs_end_to_end/parser.py`	`Command` dataclass gains a `weight: int` field (default 80), and the markdown parser extracts optional `weight=<int>` attributes from HTML comment directives to populate command weights.
Test Sharding Algorithm `tests/ci/test_docs_end_to_end/main.py`	Argument parser adds mutually exclusive `--all-servers` / `--server` groups and `--shard-index`/`--shard-total` parameters. When sharding is active, a longest-processing-time greedy bin-packing algorithm partitions commands by weight across shard bins, selects the target shard's subset, and restores documentation order.
Workflow Matrix and Shell Integration `.github/workflows/test-docs-end-to-end.yml`, `tests/ci/test_docs_end_to_end/setup_test.sh`	Workflow matrix expands test job to 4 shards for `vllm-default-openai` and single shards for other servers. Environment variables (`SERVER_NAME`, `SHARD_INDEX`, `SHARD_TOTAL`) are passed from the matrix to `setup_test.sh`, which dynamically constructs `python3 main.py` arguments based on those values.
Detect-Changes Hash and Timeouts `tests/ci/test_docs_end_to_end/dump_config_hash.py`, `tests/ci/test_docs_end_to_end/constants.py`	Includes `weight` in detect-changes hash (normalizing commands as `[command, weight]` tuples) and increases `AIPERF_COMMAND_TIMEOUT` to 1200 seconds for end-to-end test commands.
Dataset Resolver and Unit Test `src/aiperf/config/dataset/resolver.py`, `tests/unit/config/test_resolvers.py`	`DatasetResolver._check_timing_data` treats `BURST_GPT_TRACE` as having timing data and a unit test verifies BurstGPT CSV timing detection and fixed-schedule validation.

Weight Suggestion Analysis Tool

Layer / File(s)	Summary
Suggestion tool implementation `tools/suggest_test_docs_weights.py`	New CLI tool fetches GitHub Actions workflow logs, parses timestamped AIPerf test events, matches them to tutorial commands via canonicalized bash text, computes suggested `weight=` updates using an observed-time safety margin, renders a markdown report, and integrates with CI summaries/artifacts.

Tutorial Documentation Weight Annotations

Layer / File(s)	Summary
Weight Markers for Tutorial Examples `docs/tutorials/aimo.md`, `docs/tutorials/blazedit.md`, `docs/tutorials/burst-gpt-trace.md`, `docs/tutorials/custom-dataset.md`, `docs/tutorials/goodput.md`, `docs/tutorials/multi-turn.md`, `docs/tutorials/sharegpt.md`, `docs/tutorials/synthetic-dataset.md`	Adds or updates `weight=` parameters in embedded `aiperf-run-vllm-default-openai-endpoint-server` directives across tutorial examples (weights range from 150 to 900).
Specialized Tutorial Updates `docs/tutorials/embeddings.md`, `docs/tutorials/fixed-schedule.md`	Embeddings tutorial switches to embeddings-specific snippet markers and replaces a simple readiness check with a polling loop waiting for `/v1/embeddings` HTTP 200; the fixed-schedule tutorial inlines `precise_schedule.jsonl` generation within the code block for a self-contained example.

CI Action Updates and Nightly Workflow Enhancements

Layer / File(s)	Summary
GitHub Actions Version Bumps `.github/workflows/fern-docs.yml`, `.github/workflows/pre-commit.yml`, `.github/workflows/run-fern-checks.yml`, `.github/workflows/run-integration-tests.yml`, `.github/workflows/nightly.yml`, `.github/workflows/test-docs-end-to-end.yml`	Updates `actions/checkout` (v4→v6), `actions/setup-python` (v4/v5→v6), `actions/setup-node` (v4→v5), and related actions across multiple workflows.
Nightly Workflow Dispatch and Manual Artifacts `.github/workflows/nightly.yml`	Adds `workflow_dispatch.inputs.manual_artifacts` (default true), computes distinct naming/staging (`nightly-manual/<YYYYMMDD>/<run_id>/` and `+manual.<run_id>` suffix) for manual runs when enabled, and exports `prepare-nightly.outputs.stage_subpath`.
Version Info Persistence and Artifact Format `.github/workflows/nightly.yml`	Restores/sources `version-info.txt` with fallback, uploads `nightly-version` via `actions/upload-artifact@v5`, logs resolved version info, and persists `.nightly-version/version-info.txt`.
Artifact Staging and Provenance `.github/workflows/nightly.yml`	Stages container images to `aiperf-nightly` (sourced from ECR `aiperf`), deploys wheels under `${STAGE_SUBPATH}` with appended provenance properties (`github.run.url`, `git.commit.sha`), and adjusts delete/index/tag behavior.
GitLab Security Scan Integration `.github/workflows/nightly.yml`	Passes explicit `TIMESTAMP`, `ARTIFACTORY_WHEEL_PATH`, and `ARTIFACTORY_NIGHTLY_WHEEL_PATH` (derived from `STAGE_SUBPATH`) to the GitLab trigger payload.
Nightly Suggest Weights Job `.github/workflows/nightly.yml`	Adds a non-blocking `suggest-shard-weights` job that runs the suggestion tool, appends the report to the job summary, and uploads `suggest-weights.md` as an artifact.

Configuration and Test Infrastructure

Layer / File(s)	Summary
Dependency Restructuring and UV config `pyproject.toml`	Adds a `test` optional-dependencies group and a `[tool.uv] required-environments` block to influence uv resolution across target platforms.
Resolve Check Workflow `.github/workflows/resolve-check.yml`	Adds a Resolve Check workflow that runs `uv sync --active --no-install-project` on Python 3.13 to validate dependency resolution against `required-environments`.
README Note `README.md`	Adds a Linux aarch64 installation note about needing a C build toolchain for the `crick` sdist dependency.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through logs by moonlit beams,

Weights now guide the test-run dreams;
Shards align like carrots in a row,
Nightly wheels with provenance aglow,
Tutorials tuned — hop, tweak, and go!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'fix: the nightly by fixing test docs end to end test suite and staging locations' references real aspects of the PR (test docs end-to-end suite, staging locations) but uses awkward phrasing ('fix: the nightly by fixing') that obscures the main intent and is not concise.	Clarify the title to better convey the main objective. Consider: 'fix: parallelize and balance test-docs end-to-end suite with dynamic weights' or 'fix: add sharding to test-docs e2e and separate manual staging locations'.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

.github/workflows/nightly.yml (1)
1042-1084: 💤 Low value

Consider adding persist-credentials: false for defense in depth.

The job design is solid: non-blocking with continue-on-error: true, good visibility via log group + step summary + artifact, and correct if condition to skip when the upstream job is skipped.

Static analysis flags the checkout step (Line 1056) for missing persist-credentials: false. While this job only reads data and poses low risk, adding the flag is a low-effort hardening:
🛡️ Suggested hardening
       - uses: actions/checkout@v6
         with:
           ref: ${{ needs.prepare-nightly.outputs.resolved_sha }}
+          persist-credentials: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/nightly.yml around lines 1042 - 1084, The actions/checkout
step in the suggest-shard-weights job should include persist-credentials: false
to avoid leaving repository credentials in the runner workspace; update the
checkout step that uses actions/checkout@v6 (inside the suggest-shard-weights
job) to add with: persist-credentials: false while keeping the existing ref: ${{
needs.prepare-nightly.outputs.resolved_sha }} entry so the job still checks out
the correct commit.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/nightly.yml:
- Around line 93-122: Revert the three TEMPORARY branch-gate changes so the
workflow targets main: change the "Require dispatch from harrison/fix-nightly"
step's conditional/message to require/main reference to main, update the
"Checkout" step so its ref uses inputs.sha || 'main' instead of
'harrison/fix-nightly', and update the "Verify SHA is on harrison/fix-nightly"
step to verify against origin/main (and adjust its step name/message
accordingly); ensure the semantics (rejecting dispatches not on the target
branch and verifying merge-base) remain the same after switching to main.

In @.github/workflows/test-docs-end-to-end.yml:
- Line 39: Replace the mutable tag usages of the checkout action with pinned
immutable commit SHAs: locate every occurrence of "uses: actions/checkout@v6"
(the three instances referenced in the diff) and replace each with the
corresponding full commit SHA form "uses: actions/checkout@<full-commit-sha>";
ensure you use the exact commit SHA for the version you intend to pin and update
all three occurrences consistently so the workflow satisfies the immutable-SHA
policy.
- Around line 39-42: Update both uses of the actions/checkout@v6 steps (the
checkout that performs read-only git operations and the subsequent checkout
before tests) to explicitly set persist-credentials: false; locate the two
occurrences of "uses: actions/checkout@v6" in the workflow and add the key
"persist-credentials: false" under their "with:" blocks so credentials are not
persisted to the workspace.

In `@tests/unit/config/test_resolvers.py`:
- Line 499: Rename the test function test_burst_gpt_csv_reports_timing_data to
follow the repo convention test_<function>_<scenario>_<expected> and add an
explicit return type annotation -> None; for example rename
test_burst_gpt_csv_reports_timing_data to
test_burst_gpt_csv_reports_timing_data_records_timing and change its signature
to include -> None so the function definition
(test_burst_gpt_csv_reports_timing_data / new name) matches the test naming and
type-hinting guidelines.

In `@tools/suggest_test_docs_weights.py`:
- Around line 239-247: The current mapping by_bash built with
{_canonicalize_bash(c.command): c for c in commands} loses duplicates because
later commands overwrite earlier ones, causing ambiguous bash-to-command
matches; update the logic that builds by_bash to collect all Command entries per
canonicalized bash (e.g., map string -> list[Command] or detect duplicates) and
then update the matching loop over runs to handle multiple candidates: when
by_bash returns multiple Commands for r.bash_content, disambiguate using
additional metadata (file, line, tutorial id) or skip/record as ambiguous
instead of silently overwriting, ensuring Suggestion creation uses the correct
Command instance(s) and increments matched/unmatched appropriately; refer to
_canonicalize_bash, by_bash, runs, commands, and Suggestion to locate where to
change the collection and matching logic.

---

Nitpick comments:
In @.github/workflows/nightly.yml:
- Around line 1042-1084: The actions/checkout step in the suggest-shard-weights
job should include persist-credentials: false to avoid leaving repository
credentials in the runner workspace; update the checkout step that uses
actions/checkout@v6 (inside the suggest-shard-weights job) to add with:
persist-credentials: false while keeping the existing ref: ${{
needs.prepare-nightly.outputs.resolved_sha }} entry so the job still checks out
the correct commit.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f981486f-3d02-4e30-a4f8-3dd03eaa4f89

📥 Commits

Reviewing files that changed from the base of the PR and between bb6f421 and 686c734.

📒 Files selected for processing (25)

.github/workflows/fern-docs.yml
.github/workflows/nightly.yml
.github/workflows/pre-commit.yml
.github/workflows/run-fern-checks.yml
.github/workflows/run-integration-tests.yml
.github/workflows/test-docs-end-to-end.yml
docs/tutorials/aimo.md
docs/tutorials/blazedit.md
docs/tutorials/burst-gpt-trace.md
docs/tutorials/custom-dataset.md
docs/tutorials/embeddings.md
docs/tutorials/fixed-schedule.md
docs/tutorials/goodput.md
docs/tutorials/multi-turn.md
docs/tutorials/sharegpt.md
docs/tutorials/synthetic-dataset.md
pyproject.toml
src/aiperf/config/dataset/resolver.py
tests/ci/test_docs_end_to_end/constants.py
tests/ci/test_docs_end_to_end/data_types.py
tests/ci/test_docs_end_to_end/main.py
tests/ci/test_docs_end_to_end/parser.py
tests/ci/test_docs_end_to_end/setup_test.sh
tests/unit/config/test_resolvers.py
tools/suggest_test_docs_weights.py

codecov · 2026-05-20T00:16:20Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

The 5600262 commit on this branch pointed nightly.yml's dispatch gates at harrison/fix-nightly so we could exercise the full pipeline end-to-end against the in-flight branch. Now that the PR is passing CI and getting merged, restore the gates to main: - workflow_dispatch input description for `sha` says "Must be on main" again. - "Require dispatch from main" check rejects dispatches from any other ref. - Checkout default ref goes back to 'main'. - "Verify SHA is on main" runs merge-base against origin/main. Net effect after merge: scheduled cron continues to build tip of main; manual dispatches must come from main and pass an ancestor check against origin/main. Behavior is identical to pre-56002626 baseline. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

1. tools/suggest_test_docs_weights.py — by_bash dedup The previous {_canonicalize_bash(c.command): c for c in commands} silently dropped all-but-one of any markdown commands sharing canonicalized bash. No tutorials collide today but a future duplicate would be invisible. Switch by_bash to dict[str, list[Command]] (defaultdict + append), log an INFO line naming every colliding markdown location at collection time, and emit one Suggestion per matched Command in the loop. Matched/unmatched counts still increment per-run (one TestRun = one or zero matches), but the suggestion list can carry > one row per run when bash collides — appropriate because all matched markdown locations need the same weight update. 2. tests/ci/test_docs_end_to_end/dump_config_hash.py — include weight The detect-changes hash skipped the GPU e2e workflow whenever a PR touched only weight= annotations, even though shard assignment depends on weight. Hash each aiperf_command as [c.command, c.weight] so weight-only edits invalidate the hash and trigger the e2e job. Verified locally: flipping weight=350 -> weight=351 on one tutorial changes the hash. 3. docs/tutorials/embeddings.md — combine cat + aiperf into one block The previous structure put the aiperf-run tag on a `cat > inputs.jsonl` block followed by a separate ungtagged bash block with `aiperf profile`. The parser only grabs the FIRST bash block after a tag, so CI was running the cat (a no-op) and reporting success without ever exercising BGE. Merge into one tagged block so the runner actually invokes aiperf profile against the BGE embedding endpoint. Default weight (80s) is sufficient: the companion synthetic-BGE test in the same file is also default-weighted and the whole vllm-embeddings-openai shard finishes in 7m02s end-to-end; this custom-input variant uses --request-count 5 so it's strictly smaller than its sibling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tools/suggest_test_docs_weights.py (1)

259-305: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Aggregate duplicate-bash runs before emitting suggestions.

When the same canonicalized bash appears in multiple markdown blocks and both blocks run, Lines 259-305 emit suggestions for every (run, command) pair. That turns two real executions into four rows and can give the same file conflicting suggested weights if the runtimes differ. Collapse runs to one observed value per bash_content before iterating cmds (for example, keep the max observed runtime), then derive matched from that deduped set so the summary stays consistent.

Proposed fix

-    matched = 0
     unmatched = 0
-    suggestions: list[Suggestion] = []
+    observed_by_bash: dict[str, TestRun] = {}
     for r in runs:
         cmds = by_bash.get(r.bash_content, [])
         if not cmds:
             unmatched += 1
             log.debug("No markdown match for test %d in %s", r.test_index, r.job_name)
             continue
-        matched += 1
         if r.status == "failed":
             # A 1-second-fail test gives us no useful timing; skip.
             continue
+        prev = observed_by_bash.get(r.bash_content)
+        if prev is None or r.actual_seconds > prev.actual_seconds:
+            observed_by_bash[r.bash_content] = r
 
-        # When several markdown blocks share canonicalized bash, the run's
-        # runtime applies to all of them — they need the same weight, so
-        # emit one Suggestion per Command pointing at each markdown location.
-        for cmd in cmds:
+    matched = sum(len(by_bash[bash]) for bash in observed_by_bash)
+    suggestions: list[Suggestion] = []
+    for bash_content, r in observed_by_bash.items():
+        # Emit one Suggestion per markdown location using the selected runtime
+        # for this canonicalized bash block.
+        for cmd in by_bash[bash_content]:
             current = cmd.weight
             observed = r.actual_seconds
             suggested = _round_to(int(observed * SAFETY_MARGIN))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/suggest_test_docs_weights.py` around lines 259 - 305, The current loop
emits a Suggestion for every (run r, Command cmd) pair causing duplicate
suggestions when the same r.bash_content appears in multiple runs; instead,
aggregate runs by bash_content (the by_bash key) before emitting suggestions:
build a map from bash_content to a single observed_seconds (e.g., the max of
r.actual_seconds across runs with that bash_content) and a representative run
metadata (job_id, job_name, passed_line_index) and compute matched/unmatched
from that deduped set, then iterate cmds = by_bash[bash_content] and use the
aggregated observed_seconds (and the representative run for anchor) when
computing observed, suggested, ratio, and appending Suggestion objects (keeping
references to DEFAULT_WEIGHT, SAFETY_MARGIN, default_threshold, threshold,
suggestions, Suggestion, cmd.start_line, cmd.file_path as in the diff).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tools/suggest_test_docs_weights.py`:
- Around line 259-305: The current loop emits a Suggestion for every (run r,
Command cmd) pair causing duplicate suggestions when the same r.bash_content
appears in multiple runs; instead, aggregate runs by bash_content (the by_bash
key) before emitting suggestions: build a map from bash_content to a single
observed_seconds (e.g., the max of r.actual_seconds across runs with that
bash_content) and a representative run metadata (job_id, job_name,
passed_line_index) and compute matched/unmatched from that deduped set, then
iterate cmds = by_bash[bash_content] and use the aggregated observed_seconds
(and the representative run for anchor) when computing observed, suggested,
ratio, and appending Suggestion objects (keeping references to DEFAULT_WEIGHT,
SAFETY_MARGIN, default_threshold, threshold, suggestions, Suggestion,
cmd.start_line, cmd.file_path as in the diff).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0cddbf57-57ea-4226-b823-865a8feaca06

📥 Commits

Reviewing files that changed from the base of the PR and between f9a3a02 and bdac536.

📒 Files selected for processing (3)

docs/tutorials/embeddings.md
tests/ci/test_docs_end_to_end/dump_config_hash.py
tools/suggest_test_docs_weights.py

…hots The compute step writes STAGE_SUBPATH on $GITHUB_OUTPUT (line 220), but legacy version-info.env files — from before the scheduled/manual split — don't define it. Under `set -u` (the step uses `set -euo pipefail`) the unbound-variable expansion aborts the job before any other output is emitted, breaking re-attempts of nightlies started before the rename. Fix: after sourcing a snapshot, check whether STAGE_SUBPATH is set and default it to `nightly/${TIMESTAMP}` if not. All legacy artifacts physically landed under that bare path (the split didn't exist yet), so the backfill matches what's actually in S3 / Artifactory / NGC. Verified locally by sourcing a synthesized legacy env file under `set -euo pipefail` — the new conditional populates STAGE_SUBPATH and the step proceeds normally. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

The .env -> .txt rename added an elif fallback and a STAGE_SUBPATH backfill so a "legacy" snapshot wouldn't crash the restore step. That code is unreachable in practice: GitHub Actions reruns always execute the same workflow file the original run was triggered with, so an artifact written by this version of the code is always read back by this same version. No mixed-extension or missing-STAGE_SUBPATH scenario can actually occur in CI. Strip both pieces of defense before they get copy-pasted into something else. Restore is back to a single `if -f .txt` branch with no fallbacks; the file format is still shell-`source`-able so sourcing works identically across the extension change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

Two failed nightlies in 24 hours from upstream partial releases: yarl 1.24.0 (only cp310 wheels, no sdist) on 2026-05-19, then greenlet 3.5.1 (no linux/aarch64 wheel, no sdist) on 2026-05-20. Both broke env-builder's cold `uv sync` because the resolver picked the new "latest" version, found no compatible distribution for our build platform, and bailed. Range pinning each offender as it appears is whack-a-mole; the next binary-extension transitive that ships a partial release does it again. Systemic fix is what uv itself suggested in the greenlet error message: hint: consider adding "sys_platform == 'linux' ..." to tool.uv.required-environments `required-environments` makes the resolver refuse any version that lacks a usable distribution (wheel OR sdist) on every listed platform, and silently backtracks until it finds one that does. Forward-compatible: as upstream completes their uploads, the resolver picks the newer version up automatically. Changes: - pyproject.toml: add [tool.uv] required-environments covering Linux x86_64 / aarch64 (the build-container matrix) plus macOS arm64 (developer laptops). crick remains sdist-only on Linux aarch64 and passes the constraint via its sdist (env-builder compiles it from source today, unchanged behavior). - pyproject.toml: drop the now-redundant `yarl>=1.23,<2.0` direct pin + its multi-line comment. required-environments handles it. - .github/workflows/nightly.yml: new `resolve-check` job on prod-aiperf-default-v1 after prepare-nightly that runs the resolver only (no install) and gates build-container on success. Fail-to-feedback drops from ~10 min (env-builder docker stage) to ~1 min. - .github/workflows/resolve-check.yml: new dedicated workflow that runs the same check on pushes to main / pull-request/* / release/* branches so PR authors see partial-release failures fast and explicitly instead of buried in `make first-time-setup`. - README.md: surface the pre-existing crick / aarch64 install prereq (system C compiler needed) before users discover it via cryptic setuptools errors. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

saturley-hall · 2026-05-20T16:03:35Z

@coderabbitai review

coderabbitai · 2026-05-20T16:03:42Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.github/workflows/nightly.yml (1)

1134-1134: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use resolved SHA in summary instead of github.sha.

For manual dispatch builds, ${{ github.sha }} can differ from the built commit (inputs.sha). Use ${{ needs.prepare-nightly.outputs.resolved_sha }} to keep provenance accurate.

Suggested patch

-            echo "| Commit | \`${{ github.sha }}\` |"
+            echo "| Commit | \`${{ needs.prepare-nightly.outputs.resolved_sha }}\` |"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/nightly.yml at line 1134, The summary currently prints the
built commit using `${{ github.sha }}` which can be incorrect for manual
dispatch; update the echo that writes the Commit line so it uses the resolved
SHA output from the prepare job i.e. replace `${{ github.sha }}` with `${{
needs.prepare-nightly.outputs.resolved_sha }}` (refer to the echo statement that
prints "| Commit | \`${{ github.sha }}\` |" in the nightly.yml workflow) to
ensure the summary records the actual built commit provenance.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/nightly.yml:
- Around line 368-373: Update the resolve-check step to harden external actions:
replace floating tags with pinned commit SHAs for actions/checkout,
astral-sh/setup-uv, and actions/setup-python (e.g., use actions/checkout@<sha>,
astral-sh/setup-uv@<sha>, actions/setup-python@<sha>) and add
persist-credentials: false to the actions/checkout step so the checkout does not
leave a token in the runner; ensure you update the existing lines referencing
actions/checkout@v6, astral-sh/setup-uv@v7, and actions/setup-python@v6
accordingly.

In @.github/workflows/resolve-check.yml:
- Around line 36-38: Update the actions/checkout step that uses "uses:
actions/checkout@v6" to explicitly disable credential persistence by adding
"persist-credentials: false" to that checkout step (unless a later workflow step
requires authenticated git operations); this limits exposure of the GITHUB_TOKEN
while leaving the other pinned actions (astral-sh/setup-uv@v7 and
actions/setup-python@v6) unchanged.

---

Outside diff comments:
In @.github/workflows/nightly.yml:
- Line 1134: The summary currently prints the built commit using `${{ github.sha
}}` which can be incorrect for manual dispatch; update the echo that writes the
Commit line so it uses the resolved SHA output from the prepare job i.e. replace
`${{ github.sha }}` with `${{ needs.prepare-nightly.outputs.resolved_sha }}`
(refer to the echo statement that prints "| Commit | \`${{ github.sha }}\` |" in
the nightly.yml workflow) to ensure the summary records the actual built commit
provenance.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a6957663-0797-454f-b541-f6315c871c9a

📥 Commits

Reviewing files that changed from the base of the PR and between fdc46e0 and 2456525.

📒 Files selected for processing (4)

.github/workflows/nightly.yml
.github/workflows/resolve-check.yml
README.md
pyproject.toml

Add persist-credentials: false to both new resolve-check checkout steps (the dedicated PR-trigger workflow and the nightly job). Neither step does any authenticated git operations after the checkout, so the default credential persistence just leaves the GITHUB_TOKEN in the runner's .git/config for the rest of the job unnecessarily. CodeRabbit's other suggestion — pin the three actions (actions/checkout, astral-sh/setup-uv, actions/setup-python) to commit SHAs instead of major-version tags — is intentionally not applied here. Every other workflow in .github/workflows/ uses major-version tags, and CodeRabbit's first comment even noted "The version-pinned actions follow the repository's established pattern across all workflows." Pinning just these three lines to SHAs would be inconsistent and create a one-off maintenance surface; SHA-pinning policy, if adopted, should be a repo-wide change in a separate PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

saturley-hall · 2026-05-20T18:46:31Z

@coderabbitai review

coderabbitai · 2026-05-20T18:46:37Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…dant nightly- tag prefix - prepare-nightly: strip nightly-/nightly-manual- from CONTAINER_TAG (NGC repo aiperf-nightly already conveys this; RELEASE_TYPE carries the cron-vs-dispatch signal). - stage-nightly-container: expose the staged multi-arch manifest reference (${NGC_STAGING_IMAGE_BASE}/aiperf-nightly:${CONTAINER_TAG}) as a job output. - trigger-gitlab-security-scan: forward it as variables[CONTAINER_IMAGE] alongside the existing CONTAINER_TAG so GitLab can pull without having to know the registry/repo layout. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

debermudez

overall looks solid. thanks for the work!

…+ aarch64 install note (#991) Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

saturley-hall and others added 14 commits May 19, 2026 04:13

github-actions Bot added the fix label May 20, 2026

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

Comment thread .github/workflows/nightly.yml Outdated

Comment thread .github/workflows/test-docs-end-to-end.yml

Comment thread .github/workflows/test-docs-end-to-end.yml

Comment thread tests/unit/config/test_resolvers.py

Comment thread tools/suggest_test_docs_weights.py Outdated

dynamo-ops reviewed May 20, 2026

View reviewed changes

Comment thread docs/tutorials/embeddings.md

Comment thread tests/ci/test_docs_end_to_end/main.py

saturley-hall and others added 2 commits May 20, 2026 01:37

saturley-hall requested a review from dynamo-ops May 20, 2026 06:00

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

dynamo-ops reviewed May 20, 2026

View reviewed changes

Comment thread .github/workflows/nightly.yml

saturley-hall requested a review from dynamo-ops May 20, 2026 10:57

saturley-hall and others added 3 commits May 20, 2026 08:44

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

Comment thread .github/workflows/nightly.yml

Comment thread .github/workflows/resolve-check.yml

saturley-hall requested review from ajcasagrande and pvijayakrish May 20, 2026 18:46

saturley-hall enabled auto-merge (squash) May 21, 2026 01:20

debermudez approved these changes May 21, 2026

View reviewed changes

saturley-hall merged commit 83bdcd9 into main May 21, 2026
40 checks passed

saturley-hall deleted the harrison/fix-nightly branch May 21, 2026 03:23

lkomali mentioned this pull request May 22, 2026

fix(config): backport BurstGPT CSV + fixed_schedule fixes to release/0.9.0 #988

Merged

saturley-hall mentioned this pull request May 23, 2026

fix(build): narrow cherry-pick of PR #965 — uv required-environments + aarch64 install note #991

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: the nightly by fixing test docs end to end test suite and staging locations#965

fix: the nightly by fixing test docs end to end test suite and staging locations#965
saturley-hall merged 21 commits into
mainfrom
harrison/fix-nightly

saturley-hall commented May 20, 2026 •

edited

Loading

github-actions Bot commented May 20, 2026 •

edited

Loading

github-actions Bot commented May 20, 2026 •

edited

Loading

coderabbitai Bot commented May 20, 2026 •

edited

Loading

Reviews paused

❌ Failed checks (1 warning, 1 inconclusive)

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 20, 2026

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

saturley-hall commented May 20, 2026

coderabbitai Bot commented May 20, 2026

coderabbitai Bot left a comment

Uh oh!

Uh oh!

saturley-hall commented May 20, 2026

coderabbitai Bot commented May 20, 2026

debermudez left a comment

Uh oh!

Labels

3 participants

Uh oh!

Conversation

saturley-hall commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

github-actions Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Try out this PR

github-actions Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 20, 2026

Codecov Report

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

saturley-hall commented May 20, 2026

coderabbitai Bot commented May 20, 2026

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

saturley-hall commented May 20, 2026

coderabbitai Bot commented May 20, 2026

debermudez left a comment

Choose a reason for hiding this comment

Uh oh!

Labels

3 participants

saturley-hall commented May 20, 2026 •

edited

Loading

github-actions Bot commented May 20, 2026 •

edited

Loading

github-actions Bot commented May 20, 2026 •

edited

Loading

coderabbitai Bot commented May 20, 2026 •

edited

Loading