Skip to content

fix: the nightly by fixing test docs end to end test suite and staging locations#965

Merged
saturley-hall merged 21 commits into
mainfrom
harrison/fix-nightly
May 21, 2026
Merged

fix: the nightly by fixing test docs end to end test suite and staging locations#965
saturley-hall merged 21 commits into
mainfrom
harrison/fix-nightly

Conversation

@saturley-hall

@saturley-hall saturley-hall commented May 20, 2026

Copy link
Copy Markdown
Member

Example execution

This dramatically lowers runtime by parallelizing the test suite across multiple executors in one job and dynamically balancing expected run time.

Differentiates the location of staging the container and wheels when manually executed (with an override to make it look like a nightly in extraordinary situations).

Closes OPS-6560

Summary by CodeRabbit

  • Infrastructure

    • CI workflows updated (action versions), improved nightly artifact/staging behavior with provenance, added a resolve-check workflow, and a non-blocking job that generates weight suggestions.
  • New Features

    • Sharded GPU test execution with shard controls for CI runners.
    • New CLI tool to analyze test runs and recommend per-command weights.
  • Refactor

    • Reworked packaging extras and tightened dependency resolution constraints.
  • Tests

    • Increased end-to-end timeouts; test harness and parser support per-command weights and shard slicing.
  • Documentation

    • Tutorials annotated with profiling weight parameters; README adds arm64 install note.

Review Change Stack

saturley-hall and others added 14 commits May 19, 2026 04:13
Test-docs end-to-end:
- src/aiperf/config/dataset/resolver.py — _check_timing_data now reports
  True for BURST_GPT_TRACE (matches the existing SAGEMAKER_DATA_CAPTURE
  branch); the loader's _REQUIRED_COLUMNS guarantees a Timestamp column,
  but the JSONL-only first-record probe always failed on CSV and blocked
  --fixed-schedule. New regression test in tests/unit/config/test_resolvers.py.
- docs/tutorials/embeddings.md — gets its own vllm-embeddings-openai
  server (setup + health-check + retagged run markers). The BGE
  commands previously routed to the Qwen3-0.6B chat server and 404'd
  on /v1/embeddings.
- tests/ci/test_docs_end_to_end/constants.py — AIPERF_COMMAND_TIMEOUT
  300 → 600s; the long-conversations / goodput / aimo tutorials run
  long on the AMD GPU runner and were SIGKILLed mid-flight.

AMD64 wheel-test:
- pyproject.toml — extract a [test] extra from [dev]; [dev] keeps
  tooling-only deps and recursively consumes [test]. Zero impact on
  regular pip install aiperf users (extras are opt-in).
- .github/workflows/nightly.yml — wheel-test step now installs
  "${NIGHTLY}[test]" instead of a hand-typed list; eliminates the drift
  that left jsonschema unlisted and broke collection of
  tests/unit/config/test_config_schema_generator_integration.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Temporarily redirect the workflow_dispatch / checkout / SHA-verification
gates from main to harrison/fix-nightly so the nightly-fix commit can be
exercised end-to-end before merge. Each modified block is tagged
`TEMPORARY` with a revert-before-merge note, and the dispatch-input
description echoes the same warning.

Note: GitHub reads the workflow file for `schedule:` triggers from the
default branch only, so the cron will not fire on this branch — runs
must be triggered via `gh workflow run nightly.yml --ref harrison/fix-nightly`.

Revert this commit before merging the PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
- stage-nightly-wheel uploads to nightly/${TIMESTAMP}/${FN} so wheels are
  grouped by date; TIMESTAMP and the wheel filename's embedded date come
  from the same prepare-nightly output, so they cannot drift.
- POST /api/pypi/<repo>/reindex after the deploy so Artifactory's PyPI
  simple index picks up the new wheel without manual recalc (generic
  PUT deploys bypass the metadata extractor).
- Pass ARTIFACTORY_WHEEL_PATH, ARTIFACTORY_NIGHTLY_WHEEL_PATH, TIMESTAMP
  to the GitLab security-scan trigger so downstream consumers get the
  exact location instead of having to know the layout convention.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
vllm-default-openai owns 39 of the ~53 aiperf commands extracted from
docs/tutorials/, and ran them sequentially in a single GPU job — ~57 min
of the previous 73-min wall-clock. Split it into 4 contiguous shards plus
one shard per modality-specific server (audio, vision, video, embeddings),
8 in total. Each shard runs on its own runner from the
prod-aiperf-tester-amd-gpu-v1 pool (maxRunners=16, podAntiAffinity by host
so each gets its own GPU and isolated localhost:8000), brings up its own
vLLM, executes its slice, and tears down.

Lives entirely in the called workflow (test-docs-end-to-end.yml), so PR /
workflow_dispatch / nightly's workflow_call all get the matrix.

- tests/ci/test_docs_end_to_end/main.py: new --server / --shard-index /
  --shard-total flags. Argparse rejects invalid combos. Contiguous-chunk
  slicing keeps at most one slow tutorial per shard (assuming the slow
  ones aren't clustered in the docs).
- tests/ci/test_docs_end_to_end/setup_test.sh: forwards SERVER_NAME /
  SHARD_INDEX / SHARD_TOTAL env vars from the matrix. Unset env vars
  preserve the existing --all-servers behavior for local invocation.
- .github/workflows/test-docs-end-to-end.yml: matrix of 8 shards,
  fail-fast: false. The aggregated job result that nightly.yml consumes
  is unchanged in shape (success iff all shards pass).

Expected wall-clock: ~22-25 min total, down from 73 min.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
The previous nightly's vllm-default-openai-1of4 shard hit the 600s
timeout on test 5 (aimo public-dataset) — the same test ran in 225s
in the prior serial run. Under the 8-way parallel matrix, shared
infra (HuggingFace API, ECR pulls, cold-start network contention)
slowed the HF-dataset-heavy tests by ~2.7x. Two complementary fixes:

1. Bin-pack commands by author-annotated weight instead of by docs
   order. Tag syntax now accepts an optional ``weight=<seconds>`` hint:
   `<!-- aiperf-run-...-endpoint-server weight=500 -->`. Parser extracts
   it (default 60s); the sharder runs LPT greedy bin-packing, sorting
   commands heaviest-first and assigning each to the currently-lightest
   shard. The three known-slow tests are tagged:
     - docs/tutorials/aimo.md: weight=500
     - docs/tutorials/goodput.md: weight=600
     - docs/tutorials/multi-turn.md (long-conversations): weight=1100
   At 4 shards on vllm-default-openai, this produces shard weights
   [1100, 1080, 1100, 1080] — long-conv gets its own shard; aimo and
   goodput each ride along with ~10 quick tests.

2. Raise AIPERF_COMMAND_TIMEOUT 600s -> 1200s as a safety net for
   parallel-load variance. The bin-packer should prevent any one shard
   from being over-weighted, but per-command runtime under contention
   is bursty (HF rate-limits, ECR throttling) and we'd rather wait than
   spuriously fail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
GitHub is forcing Actions runners to Node 24 starting June 2nd, 2026
and removing Node 20 on September 16th, 2026. Bump every action that
GitHub flagged in the deprecation annotations to the version that
already targets Node 24. The same versions are already in use in
run-unit-tests.yml, so this is just bringing the rest of the
workflows into the same baseline.

- actions/checkout              @v4 -> @v6
- actions/setup-python   @v4/@v5 -> @v6
- actions/upload-artifact       @v4 -> @v5
- actions/download-artifact     @v4 -> @v5
- actions/setup-node            @v4 -> @v5  (fern-docs / run-fern-checks)
- astral-sh/setup-uv            @v5 -> @v7  (nightly only)

Other actions (github-script, codecov, pre-commit, docker/setup-buildx,
ytanikin/pr-conventional-commits) weren't on the deprecation list and
are left as-is.

Also: gate prepare-nightly's "Restore version artifact from prior
attempt" step on github.run_attempt > 1. The step is intentionally
written with continue-on-error: true so a missing artifact is fine on
fresh runs, but the action still emits a "Artifact not found" warning
annotation every time. Skipping on first attempts removes the noise;
continue-on-error stays as belt-and-suspenders for the case where a
prior attempt's artifact expired before re-run.

Files touched:
  .github/workflows/{nightly,pre-commit,test-docs-end-to-end,
                     fern-docs,run-fern-checks,run-integration-tests}.yml

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
…ft report

The first weighted-sharding nightly (run 26095921607) finished with shard
wall-clocks ranging from 12m to 40m+. Two distinct problems:

1. Declared weights drifted from reality. long-conv was tagged 1100s but
   ran 343s; goodput was tagged 600s but ran 243s; meanwhile blazedit_10k
   ran 700s with no weight at all, and burst-gpt-trace ran 544s with no
   weight. Shard 4 picked up the two heaviest unweighted tests and
   blew past the 10-min per-shard estimate by ~30 min.

2. fixed-schedule.md:122 failed in 1s when it landed in a different
   shard than fixed-schedule.md:58 — the second block uses
   precise_schedule.jsonl which the first block creates via cat-heredoc.
   Sequential test runs accidentally work because the same container
   working dir persists; sharding broke that assumption.

Changes:

- Rebalance weights from observed runtimes (observed × 1.3, rounded to 50):
  - blazedit.md:107 (blazedit_10k): default -> weight=900
  - burst-gpt-trace.md:63:           default -> weight=700
  - multi-turn.md:288 (long-conv):   weight=1100 -> weight=500
  - multi-turn.md:257 (conv 50):     default -> weight=350
  - aimo.md:53:                      weight=500 -> weight=350
  - goodput.md:43:                   weight=600 -> weight=350
  - custom-dataset.md:337 (random_pool): default -> weight=250
  - multi-turn.md:199 (conv 100):    default -> weight=250
  - sharegpt.md:38:                  default -> weight=200
  - synthetic-dataset.md:65:         default -> weight=200
  - blazedit.md:54 (blazedit_5k):    default -> weight=150
  - default weight bump: 60s -> 80s (mean of observed unweighted is ~95s)

- Make fixed-schedule.md:122 self-contained by inlining the
  precise_schedule.jsonl cat-heredoc, so the time-window example runs
  in any shard regardless of whether the basic example also landed there.

- Fix LPT non-determinism in tests/ci/test_docs_end_to_end/main.py:
  Path.rglob returns filesystem-dependent order (macOS vs Linux
  differ), so the same code assigned commands to different shards
  locally vs in CI. Add (file_path, start_line) as a stable
  secondary sort key after the weight-descending primary key.

- Add tools/suggest_test_docs_weights.py and a non-blocking
  suggest-shard-weights nightly job. The tool parses test-docs
  matrix logs, matches each test back to its markdown bash block,
  and writes a markdown table of suggested weight= updates to the
  GitHub Actions step summary. Purely informational — flagged by
  ratio>1.5 with abs delta >60s, by default-weighted tests
  observed >120s, and by tests over-weighted by 2x or more.
  Continues on error so it never blocks the pipeline.

Predicted 4-shard balance after rebalance: loads [1660, 1580, 1610, 1590],
spread 80s (down from ~1600s spread in run 26095921607).
Expected wall-clock per shard: ~34 min (boot ~7 min + work ~27 min),
floor pinned by the 900s blazedit_10k test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
The file has a `#!/usr/bin/env python3` shebang but git didn't track
the +x bit when the file was first added in dfc9082 (it was created
via the editor tool, which doesn't carry exec mode). pre-commit's
check-shebang-scripts-are-executable hook flagged this on the
2026-05-19 nightly run.

Fix: `git update-index --chmod=+x` to flip the bit in the git index.
Working-tree mode was already 0755 locally; this propagates that to
git so the hook is satisfied across fresh clones too.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
The suggest-shard-weights job already writes its markdown to
$GITHUB_STEP_SUMMARY, but step summaries only render on the run's
top-level page, not inside the per-job log view. Add two more
output paths so the report is discoverable no matter where someone
clicks:

1. Echo the report inside a collapsible ::group:: in the GHA log,
   so it's right there on the job-detail page.
2. Upload it as a `suggest-shard-weights` artifact with 30-day
   retention, so it can be downloaded via `gh run download` long
   after the run scrolls off the recent-runs list.

Same step summary content as before; the file is written to
suggest-weights.md in the workspace and then routed three ways.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Today every nightly run produces the same WHEEL_VERSION,
CONTAINER_TAG, and staging subpath regardless of trigger.
Same-day same-commit re-runs (manual or otherwise) clobber each
other in S3, Artifactory, ECR, and NGC — already documented as
"last-writer-wins" in the existing comments.

Disambiguate workflow_dispatch runs end-to-end:

1. PEP 440 local-version suffix on WHEEL_VERSION:
     0.8.0.dev<YYYYMMDD>+manual.<run_id>
   - Wheels are uniquely addressable per manual dispatch.
   - `pip install aiperf-nightly` keeps resolving the canonical
     scheduled wheel by default (non-local versions sort above
     identical base+local in PEP 440).
   - `pip install aiperf==<base>.dev<date>+manual.<run_id>`
     pulls a specific manual build.
   - .github/actions/update-pyproject-version already accepts
     local-version segments, no infra change needed there.

2. New STAGE_SUBPATH output from prepare-nightly:
     nightly/<YYYYMMDD>            for scheduled runs
     nightly-manual/<YYYYMMDD>/<run_id>  for manual dispatches
   Consumed by stage-nightly-wheel's Artifactory TARGET_URL and
   by trigger-gitlab-security-scan's ARTIFACTORY_WHEEL_PATH /
   ARTIFACTORY_NIGHTLY_WHEEL_PATH forwarded variables.
   S3 upload stays keyed on WHEEL_VERSION so the upload/download
   symmetry holds for free — the local-version suffix already
   distinguishes the paths.

3. CONTAINER_TAG branch in prepare-nightly:
     nightly-<YYYYMMDD>-<sha7>                for scheduled
     nightly-manual-<YYYYMMDD>-<sha7>-<run_id>  for manual
   No downstream edits needed; build-container, the multi-arch
   manifest job, stage-nightly-container, and the GitLab trigger
   all consume CONTAINER_TAG as-is.

4. NGC staging destination repo renames aiperf -> aiperf-nightly
   in stage-nightly-container (5 occurrences in the crane copy +
   index-append + delete lines). Mirrors the PyPI naming scheme
   (`aiperf` canonical dist / `aiperf-nightly` nightly-flavored
   dist). ECR source side stays `aiperf` — purely an NGC-side
   rename. Operational pre-req: the `aiperf-nightly` NGC repo
   must exist with publish-service-account write access before
   the next staged dispatch.

Also: remove the explicit Artifactory PyPI /reindex POST that was
returning 403 with the current ARTIFACTORY_USER. The user has
Deploy permission but not Manage Repository; widening that grant
is tracked separately. Until then, simple-index visibility for
freshly-uploaded wheels relies on the repo's auto-reindex
mechanism or an out-of-band UI recalc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Default for manual workflow_dispatch runs is unchanged: artifacts get
the +manual.<run_id> wheel suffix, nightly-manual/<date>/<run_id>/
staging path, and nightly-manual-<date>-<sha>-<run_id> container tag
(everything we just added in b7fabee).

Adds an opt-out: setting `manual_artifacts=false` on a manual dispatch
falls back to the bare scheduled-nightly naming (no local-version
suffix, plain `nightly/<date>/` path, plain `nightly-<date>-<sha>`
container tag). Useful for emergency re-publishes where a manual run
is *meant* to overwrite the scheduled artifact locations — e.g.
re-running today's nightly because the cron-triggered build failed.

Schedule events ignore the input entirely; the GITHUB_EVENT_NAME check
short-circuits before MANUAL_ARTIFACTS is consulted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Use Artifactory's matrix-parameter syntax on the PUT URL to attach
two properties to every wheel staged by stage-nightly-wheel:

  github.run.url = https://github.com/<org>/<repo>/actions/runs/<run_id>
  git.commit.sha = <full commit SHA the wheel was built from>

Both properties are visible in the Artifactory UI under each wheel's
Properties tab, included in Xray scan metadata, and queryable via the
AQL API ("show me every wheel built from SHA X" or "show me every
wheel from yesterday's nightly run").

URL value is URL-encoded with python3 (always available on the
builder runner) so the ':' and '/' characters in the GitHub Actions
URL don't confuse Artifactory's matrix-parameter parser.

Path URL (without properties) is still what gets logged in the
'Wheel staged to Artifactory:' line for readability — the properties
appear as a separate log line below.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
yarl is transitive via aiohttp, but the Dockerfile's env-builder
runs a cold `uv sync` (no --frozen, no uv.lock COPY) so resolution
is fresh against PyPI on every build. That makes us vulnerable to
the few-hours window after upstream pushes a new release with
incomplete wheel coverage.

Specifically today: yarl 1.24.0 uploaded at 17:23 UTC with only
cp310 wheels — no cp311/12/13, no sdist. Our env-builder runs on
python:3.13-slim-bookworm, so uv picked 1.24.0 (the new "latest"
per PyPI metadata) and bailed with "doesn't have a source
distribution or wheel for the current platform", taking out every
test-docs-end-to-end shard for the run.

Direct constraint on a transitive dep is unusual but legit — the
resolver honors it during `uv sync` regardless of who pulls yarl
in. `<2.0` keeps us on the current major; the comment in
pyproject.toml flags it as removable once upstream's release
pipeline becomes reliable about uploading all platform wheels
atomically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
The prepare-nightly job's version-info file changes extension from
.env to .txt to match what it actually is — a human-readable summary
of the resolved version strings for the run. The format stays as
shell-source-able KEY='value' lines so the same file restores
correctly on re-attempts.

Two visibility wins:

1. The contents are echoed into a collapsible "Resolved version info"
   group in the workflow log right after they're computed. No need to
   download the artifact (or click into a sub-job) to see what
   WHEEL_VERSION / STAGE_SUBPATH / CONTAINER_TAG resolved to.

2. The archived artifact (`nightly-version`) now contains
   version-info.txt — same file, more honest extension. Downstream
   tools that grep / `cat` the artifact don't care about the
   extension; tools that `source` it still work because the format
   didn't change.

Migration safety: the restore step probes for version-info.txt
first, then falls back to version-info.env. Re-attempts of runs
started before this change still find their snapshot.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
@github-actions

github-actions Bot commented May 20, 2026

Copy link
Copy Markdown

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@15e3fb41cb1d0226eb5f355e5e6da439afd7f7ed

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@15e3fb41cb1d0226eb5f355e5e6da439afd7f7ed

Last updated for commit: 15e3fb4Browse code

@github-actions github-actions Bot added the fix label May 20, 2026
@github-actions

github-actions Bot commented May 20, 2026

Copy link
Copy Markdown
@coderabbitai

coderabbitai Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds weight-based end-to-end test sharding and a weight-suggestion tool, annotates tutorials with weight markers, updates GitHub Action versions across workflows, and enhances the nightly workflow with manual-artifact naming, stage_subpath outputs, provenance metadata, and staged wheel/container changes.

Changes

End-to-End Test Sharding and Weight System

Layer / File(s) Summary
Weight Data Model and Tutorial Directive Parsing
tests/ci/test_docs_end_to_end/data_types.py, tests/ci/test_docs_end_to_end/parser.py
Command dataclass gains a weight: int field (default 80), and the markdown parser extracts optional weight=<int> attributes from HTML comment directives to populate command weights.
Test Sharding Algorithm
tests/ci/test_docs_end_to_end/main.py
Argument parser adds mutually exclusive --all-servers / --server groups and --shard-index/--shard-total parameters. When sharding is active, a longest-processing-time greedy bin-packing algorithm partitions commands by weight across shard bins, selects the target shard's subset, and restores documentation order.
Workflow Matrix and Shell Integration
.github/workflows/test-docs-end-to-end.yml, tests/ci/test_docs_end_to_end/setup_test.sh
Workflow matrix expands test job to 4 shards for vllm-default-openai and single shards for other servers. Environment variables (SERVER_NAME, SHARD_INDEX, SHARD_TOTAL) are passed from the matrix to setup_test.sh, which dynamically constructs python3 main.py arguments based on those values.
Detect-Changes Hash and Timeouts
tests/ci/test_docs_end_to_end/dump_config_hash.py, tests/ci/test_docs_end_to_end/constants.py
Includes weight in detect-changes hash (normalizing commands as [command, weight] tuples) and increases AIPERF_COMMAND_TIMEOUT to 1200 seconds for end-to-end test commands.
Dataset Resolver and Unit Test
src/aiperf/config/dataset/resolver.py, tests/unit/config/test_resolvers.py
DatasetResolver._check_timing_data treats BURST_GPT_TRACE as having timing data and a unit test verifies BurstGPT CSV timing detection and fixed-schedule validation.

Weight Suggestion Analysis Tool

Layer / File(s) Summary
Suggestion tool implementation
tools/suggest_test_docs_weights.py
New CLI tool fetches GitHub Actions workflow logs, parses timestamped AIPerf test events, matches them to tutorial commands via canonicalized bash text, computes suggested weight= updates using an observed-time safety margin, renders a markdown report, and integrates with CI summaries/artifacts.

Tutorial Documentation Weight Annotations

Layer / File(s) Summary
Weight Markers for Tutorial Examples
docs/tutorials/aimo.md, docs/tutorials/blazedit.md, docs/tutorials/burst-gpt-trace.md, docs/tutorials/custom-dataset.md, docs/tutorials/goodput.md, docs/tutorials/multi-turn.md, docs/tutorials/sharegpt.md, docs/tutorials/synthetic-dataset.md
Adds or updates weight= parameters in embedded aiperf-run-vllm-default-openai-endpoint-server directives across tutorial examples (weights range from 150 to 900).
Specialized Tutorial Updates
docs/tutorials/embeddings.md, docs/tutorials/fixed-schedule.md
Embeddings tutorial switches to embeddings-specific snippet markers and replaces a simple readiness check with a polling loop waiting for /v1/embeddings HTTP 200; the fixed-schedule tutorial inlines precise_schedule.jsonl generation within the code block for a self-contained example.

CI Action Updates and Nightly Workflow Enhancements

Layer / File(s) Summary
GitHub Actions Version Bumps
.github/workflows/fern-docs.yml, .github/workflows/pre-commit.yml, .github/workflows/run-fern-checks.yml, .github/workflows/run-integration-tests.yml, .github/workflows/nightly.yml, .github/workflows/test-docs-end-to-end.yml
Updates actions/checkout (v4→v6), actions/setup-python (v4/v5→v6), actions/setup-node (v4→v5), and related actions across multiple workflows.
Nightly Workflow Dispatch and Manual Artifacts
.github/workflows/nightly.yml
Adds workflow_dispatch.inputs.manual_artifacts (default true), computes distinct naming/staging (nightly-manual/<YYYYMMDD>/<run_id>/ and +manual.<run_id> suffix) for manual runs when enabled, and exports prepare-nightly.outputs.stage_subpath.
Version Info Persistence and Artifact Format
.github/workflows/nightly.yml
Restores/sources version-info.txt with fallback, uploads nightly-version via actions/upload-artifact@v5, logs resolved version info, and persists .nightly-version/version-info.txt.
Artifact Staging and Provenance
.github/workflows/nightly.yml
Stages container images to aiperf-nightly (sourced from ECR aiperf), deploys wheels under ${STAGE_SUBPATH} with appended provenance properties (github.run.url, git.commit.sha), and adjusts delete/index/tag behavior.
GitLab Security Scan Integration
.github/workflows/nightly.yml
Passes explicit TIMESTAMP, ARTIFACTORY_WHEEL_PATH, and ARTIFACTORY_NIGHTLY_WHEEL_PATH (derived from STAGE_SUBPATH) to the GitLab trigger payload.
Nightly Suggest Weights Job
.github/workflows/nightly.yml
Adds a non-blocking suggest-shard-weights job that runs the suggestion tool, appends the report to the job summary, and uploads suggest-weights.md as an artifact.

Configuration and Test Infrastructure

Layer / File(s) Summary
Dependency Restructuring and UV config
pyproject.toml
Adds a test optional-dependencies group and a [tool.uv] required-environments block to influence uv resolution across target platforms.
Resolve Check Workflow
.github/workflows/resolve-check.yml
Adds a Resolve Check workflow that runs uv sync --active --no-install-project on Python 3.13 to validate dependency resolution against required-environments.
README Note
README.md
Adds a Linux aarch64 installation note about needing a C build toolchain for the crick sdist dependency.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through logs by moonlit beams,

Weights now guide the test-run dreams;
Shards align like carrots in a row,
Nightly wheels with provenance aglow,
Tutorials tuned — hop, tweak, and go!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'fix: the nightly by fixing test docs end to end test suite and staging locations' references real aspects of the PR (test docs end-to-end suite, staging locations) but uses awkward phrasing ('fix: the nightly by fixing') that obscures the main intent and is not concise. Clarify the title to better convey the main objective. Consider: 'fix: parallelize and balance test-docs end-to-end suite with dynamic weights' or 'fix: add sharding to test-docs e2e and separate manual staging locations'.
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
.github/workflows/nightly.yml (1)

1042-1084: 💤 Low value

Consider adding persist-credentials: false for defense in depth.

The job design is solid: non-blocking with continue-on-error: true, good visibility via log group + step summary + artifact, and correct if condition to skip when the upstream job is skipped.

Static analysis flags the checkout step (Line 1056) for missing persist-credentials: false. While this job only reads data and poses low risk, adding the flag is a low-effort hardening:

🛡️ Suggested hardening
       - uses: actions/checkout@v6
         with:
           ref: ${{ needs.prepare-nightly.outputs.resolved_sha }}
+          persist-credentials: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/nightly.yml around lines 1042 - 1084, The actions/checkout
step in the suggest-shard-weights job should include persist-credentials: false
to avoid leaving repository credentials in the runner workspace; update the
checkout step that uses actions/checkout@v6 (inside the suggest-shard-weights
job) to add with: persist-credentials: false while keeping the existing ref: ${{
needs.prepare-nightly.outputs.resolved_sha }} entry so the job still checks out
the correct commit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/nightly.yml:
- Around line 93-122: Revert the three TEMPORARY branch-gate changes so the
workflow targets main: change the "Require dispatch from harrison/fix-nightly"
step's conditional/message to require/main reference to main, update the
"Checkout" step so its ref uses inputs.sha || 'main' instead of
'harrison/fix-nightly', and update the "Verify SHA is on harrison/fix-nightly"
step to verify against origin/main (and adjust its step name/message
accordingly); ensure the semantics (rejecting dispatches not on the target
branch and verifying merge-base) remain the same after switching to main.

In @.github/workflows/test-docs-end-to-end.yml:
- Line 39: Replace the mutable tag usages of the checkout action with pinned
immutable commit SHAs: locate every occurrence of "uses: actions/checkout@v6"
(the three instances referenced in the diff) and replace each with the
corresponding full commit SHA form "uses: actions/checkout@<full-commit-sha>";
ensure you use the exact commit SHA for the version you intend to pin and update
all three occurrences consistently so the workflow satisfies the immutable-SHA
policy.
- Around line 39-42: Update both uses of the actions/checkout@v6 steps (the
checkout that performs read-only git operations and the subsequent checkout
before tests) to explicitly set persist-credentials: false; locate the two
occurrences of "uses: actions/checkout@v6" in the workflow and add the key
"persist-credentials: false" under their "with:" blocks so credentials are not
persisted to the workspace.

In `@tests/unit/config/test_resolvers.py`:
- Line 499: Rename the test function test_burst_gpt_csv_reports_timing_data to
follow the repo convention test_<function>_<scenario>_<expected> and add an
explicit return type annotation -> None; for example rename
test_burst_gpt_csv_reports_timing_data to
test_burst_gpt_csv_reports_timing_data_records_timing and change its signature
to include -> None so the function definition
(test_burst_gpt_csv_reports_timing_data / new name) matches the test naming and
type-hinting guidelines.

In `@tools/suggest_test_docs_weights.py`:
- Around line 239-247: The current mapping by_bash built with
{_canonicalize_bash(c.command): c for c in commands} loses duplicates because
later commands overwrite earlier ones, causing ambiguous bash-to-command
matches; update the logic that builds by_bash to collect all Command entries per
canonicalized bash (e.g., map string -> list[Command] or detect duplicates) and
then update the matching loop over runs to handle multiple candidates: when
by_bash returns multiple Commands for r.bash_content, disambiguate using
additional metadata (file, line, tutorial id) or skip/record as ambiguous
instead of silently overwriting, ensuring Suggestion creation uses the correct
Command instance(s) and increments matched/unmatched appropriately; refer to
_canonicalize_bash, by_bash, runs, commands, and Suggestion to locate where to
change the collection and matching logic.

---

Nitpick comments:
In @.github/workflows/nightly.yml:
- Around line 1042-1084: The actions/checkout step in the suggest-shard-weights
job should include persist-credentials: false to avoid leaving repository
credentials in the runner workspace; update the checkout step that uses
actions/checkout@v6 (inside the suggest-shard-weights job) to add with:
persist-credentials: false while keeping the existing ref: ${{
needs.prepare-nightly.outputs.resolved_sha }} entry so the job still checks out
the correct commit.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f981486f-3d02-4e30-a4f8-3dd03eaa4f89

📥 Commits

Reviewing files that changed from the base of the PR and between bb6f421 and 686c734.

📒 Files selected for processing (25)
  • .github/workflows/fern-docs.yml
  • .github/workflows/nightly.yml
  • .github/workflows/pre-commit.yml
  • .github/workflows/run-fern-checks.yml
  • .github/workflows/run-integration-tests.yml
  • .github/workflows/test-docs-end-to-end.yml
  • docs/tutorials/aimo.md
  • docs/tutorials/blazedit.md
  • docs/tutorials/burst-gpt-trace.md
  • docs/tutorials/custom-dataset.md
  • docs/tutorials/embeddings.md
  • docs/tutorials/fixed-schedule.md
  • docs/tutorials/goodput.md
  • docs/tutorials/multi-turn.md
  • docs/tutorials/sharegpt.md
  • docs/tutorials/synthetic-dataset.md
  • pyproject.toml
  • src/aiperf/config/dataset/resolver.py
  • tests/ci/test_docs_end_to_end/constants.py
  • tests/ci/test_docs_end_to_end/data_types.py
  • tests/ci/test_docs_end_to_end/main.py
  • tests/ci/test_docs_end_to_end/parser.py
  • tests/ci/test_docs_end_to_end/setup_test.sh
  • tests/unit/config/test_resolvers.py
  • tools/suggest_test_docs_weights.py
Comment thread .github/workflows/nightly.yml Outdated
Comment thread .github/workflows/test-docs-end-to-end.yml
Comment thread .github/workflows/test-docs-end-to-end.yml
Comment thread tests/unit/config/test_resolvers.py
Comment thread tools/suggest_test_docs_weights.py Outdated
@codecov

codecov Bot commented May 20, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Comment thread docs/tutorials/embeddings.md
Comment thread tests/ci/test_docs_end_to_end/main.py
saturley-hall and others added 2 commits May 20, 2026 01:37
The 5600262 commit on this branch pointed nightly.yml's dispatch
gates at harrison/fix-nightly so we could exercise the full
pipeline end-to-end against the in-flight branch. Now that the PR
is passing CI and getting merged, restore the gates to main:

  - workflow_dispatch input description for `sha` says "Must be on
    main" again.
  - "Require dispatch from main" check rejects dispatches from any
    other ref.
  - Checkout default ref goes back to 'main'.
  - "Verify SHA is on main" runs merge-base against origin/main.

Net effect after merge: scheduled cron continues to build tip of
main; manual dispatches must come from main and pass an
ancestor check against origin/main. Behavior is identical to
pre-56002626 baseline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
1. tools/suggest_test_docs_weights.py — by_bash dedup
   The previous {_canonicalize_bash(c.command): c for c in commands}
   silently dropped all-but-one of any markdown commands sharing
   canonicalized bash. No tutorials collide today but a future
   duplicate would be invisible. Switch by_bash to dict[str,
   list[Command]] (defaultdict + append), log an INFO line naming
   every colliding markdown location at collection time, and emit
   one Suggestion per matched Command in the loop. Matched/unmatched
   counts still increment per-run (one TestRun = one or zero
   matches), but the suggestion list can carry > one row per run
   when bash collides — appropriate because all matched markdown
   locations need the same weight update.

2. tests/ci/test_docs_end_to_end/dump_config_hash.py — include weight
   The detect-changes hash skipped the GPU e2e workflow whenever a
   PR touched only weight= annotations, even though shard
   assignment depends on weight. Hash each aiperf_command as
   [c.command, c.weight] so weight-only edits invalidate the hash
   and trigger the e2e job. Verified locally: flipping
   weight=350 -> weight=351 on one tutorial changes the hash.

3. docs/tutorials/embeddings.md — combine cat + aiperf into one block
   The previous structure put the aiperf-run tag on a `cat > inputs.jsonl`
   block followed by a separate ungtagged bash block with `aiperf
   profile`. The parser only grabs the FIRST bash block after a tag,
   so CI was running the cat (a no-op) and reporting success without
   ever exercising BGE. Merge into one tagged block so the runner
   actually invokes aiperf profile against the BGE embedding
   endpoint. Default weight (80s) is sufficient: the companion
   synthetic-BGE test in the same file is also default-weighted and
   the whole vllm-embeddings-openai shard finishes in 7m02s
   end-to-end; this custom-input variant uses --request-count 5 so
   it's strictly smaller than its sibling.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
@saturley-hall saturley-hall requested a review from dynamo-ops May 20, 2026 06:00

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/suggest_test_docs_weights.py (1)

259-305: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Aggregate duplicate-bash runs before emitting suggestions.

When the same canonicalized bash appears in multiple markdown blocks and both blocks run, Lines 259-305 emit suggestions for every (run, command) pair. That turns two real executions into four rows and can give the same file conflicting suggested weights if the runtimes differ. Collapse runs to one observed value per bash_content before iterating cmds (for example, keep the max observed runtime), then derive matched from that deduped set so the summary stays consistent.

Proposed fix
-    matched = 0
     unmatched = 0
-    suggestions: list[Suggestion] = []
+    observed_by_bash: dict[str, TestRun] = {}
     for r in runs:
         cmds = by_bash.get(r.bash_content, [])
         if not cmds:
             unmatched += 1
             log.debug("No markdown match for test %d in %s", r.test_index, r.job_name)
             continue
-        matched += 1
         if r.status == "failed":
             # A 1-second-fail test gives us no useful timing; skip.
             continue
+        prev = observed_by_bash.get(r.bash_content)
+        if prev is None or r.actual_seconds > prev.actual_seconds:
+            observed_by_bash[r.bash_content] = r
 
-        # When several markdown blocks share canonicalized bash, the run's
-        # runtime applies to all of them — they need the same weight, so
-        # emit one Suggestion per Command pointing at each markdown location.
-        for cmd in cmds:
+    matched = sum(len(by_bash[bash]) for bash in observed_by_bash)
+    suggestions: list[Suggestion] = []
+    for bash_content, r in observed_by_bash.items():
+        # Emit one Suggestion per markdown location using the selected runtime
+        # for this canonicalized bash block.
+        for cmd in by_bash[bash_content]:
             current = cmd.weight
             observed = r.actual_seconds
             suggested = _round_to(int(observed * SAFETY_MARGIN))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/suggest_test_docs_weights.py` around lines 259 - 305, The current loop
emits a Suggestion for every (run r, Command cmd) pair causing duplicate
suggestions when the same r.bash_content appears in multiple runs; instead,
aggregate runs by bash_content (the by_bash key) before emitting suggestions:
build a map from bash_content to a single observed_seconds (e.g., the max of
r.actual_seconds across runs with that bash_content) and a representative run
metadata (job_id, job_name, passed_line_index) and compute matched/unmatched
from that deduped set, then iterate cmds = by_bash[bash_content] and use the
aggregated observed_seconds (and the representative run for anchor) when
computing observed, suggested, ratio, and appending Suggestion objects (keeping
references to DEFAULT_WEIGHT, SAFETY_MARGIN, default_threshold, threshold,
suggestions, Suggestion, cmd.start_line, cmd.file_path as in the diff).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tools/suggest_test_docs_weights.py`:
- Around line 259-305: The current loop emits a Suggestion for every (run r,
Command cmd) pair causing duplicate suggestions when the same r.bash_content
appears in multiple runs; instead, aggregate runs by bash_content (the by_bash
key) before emitting suggestions: build a map from bash_content to a single
observed_seconds (e.g., the max of r.actual_seconds across runs with that
bash_content) and a representative run metadata (job_id, job_name,
passed_line_index) and compute matched/unmatched from that deduped set, then
iterate cmds = by_bash[bash_content] and use the aggregated observed_seconds
(and the representative run for anchor) when computing observed, suggested,
ratio, and appending Suggestion objects (keeping references to DEFAULT_WEIGHT,
SAFETY_MARGIN, default_threshold, threshold, suggestions, Suggestion,
cmd.start_line, cmd.file_path as in the diff).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0cddbf57-57ea-4226-b823-865a8feaca06

📥 Commits

Reviewing files that changed from the base of the PR and between f9a3a02 and bdac536.

📒 Files selected for processing (3)
  • docs/tutorials/embeddings.md
  • tests/ci/test_docs_end_to_end/dump_config_hash.py
  • tools/suggest_test_docs_weights.py
Comment thread .github/workflows/nightly.yml
@saturley-hall saturley-hall requested a review from dynamo-ops May 20, 2026 10:57
saturley-hall and others added 3 commits May 20, 2026 08:44
…hots

The compute step writes STAGE_SUBPATH on $GITHUB_OUTPUT (line 220),
but legacy version-info.env files — from before the
scheduled/manual split — don't define it. Under `set -u` (the step
uses `set -euo pipefail`) the unbound-variable expansion aborts the
job before any other output is emitted, breaking re-attempts of
nightlies started before the rename.

Fix: after sourcing a snapshot, check whether STAGE_SUBPATH is set
and default it to `nightly/${TIMESTAMP}` if not. All legacy
artifacts physically landed under that bare path (the split didn't
exist yet), so the backfill matches what's actually in S3 /
Artifactory / NGC.

Verified locally by sourcing a synthesized legacy env file under
`set -euo pipefail` — the new conditional populates STAGE_SUBPATH
and the step proceeds normally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
The .env -> .txt rename added an elif fallback and a STAGE_SUBPATH
backfill so a "legacy" snapshot wouldn't crash the restore step.
That code is unreachable in practice: GitHub Actions reruns always
execute the same workflow file the original run was triggered with,
so an artifact written by this version of the code is always read
back by this same version. No mixed-extension or
missing-STAGE_SUBPATH scenario can actually occur in CI.

Strip both pieces of defense before they get copy-pasted into
something else. Restore is back to a single `if -f .txt` branch
with no fallbacks; the file format is still shell-`source`-able
so sourcing works identically across the extension change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Two failed nightlies in 24 hours from upstream partial releases:
yarl 1.24.0 (only cp310 wheels, no sdist) on 2026-05-19, then
greenlet 3.5.1 (no linux/aarch64 wheel, no sdist) on 2026-05-20.
Both broke env-builder's cold `uv sync` because the resolver picked
the new "latest" version, found no compatible distribution for our
build platform, and bailed. Range pinning each offender as it
appears is whack-a-mole; the next binary-extension transitive that
ships a partial release does it again.

Systemic fix is what uv itself suggested in the greenlet error
message:

  hint: consider adding "sys_platform == 'linux' ..." to
        tool.uv.required-environments

`required-environments` makes the resolver refuse any version that
lacks a usable distribution (wheel OR sdist) on every listed
platform, and silently backtracks until it finds one that does.
Forward-compatible: as upstream completes their uploads, the
resolver picks the newer version up automatically.

Changes:

- pyproject.toml: add [tool.uv] required-environments covering
  Linux x86_64 / aarch64 (the build-container matrix) plus
  macOS arm64 (developer laptops). crick remains sdist-only on
  Linux aarch64 and passes the constraint via its sdist (env-builder
  compiles it from source today, unchanged behavior).
- pyproject.toml: drop the now-redundant `yarl>=1.23,<2.0` direct
  pin + its multi-line comment. required-environments handles it.
- .github/workflows/nightly.yml: new `resolve-check` job on
  prod-aiperf-default-v1 after prepare-nightly that runs the
  resolver only (no install) and gates build-container on success.
  Fail-to-feedback drops from ~10 min (env-builder docker stage)
  to ~1 min.
- .github/workflows/resolve-check.yml: new dedicated workflow that
  runs the same check on pushes to main / pull-request/* / release/*
  branches so PR authors see partial-release failures fast and
  explicitly instead of buried in `make first-time-setup`.
- README.md: surface the pre-existing crick / aarch64 install
  prereq (system C compiler needed) before users discover it via
  cryptic setuptools errors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
@saturley-hall

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/nightly.yml (1)

1134-1134: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use resolved SHA in summary instead of github.sha.

For manual dispatch builds, ${{ github.sha }} can differ from the built commit (inputs.sha). Use ${{ needs.prepare-nightly.outputs.resolved_sha }} to keep provenance accurate.

Suggested patch
-            echo "| Commit | \`${{ github.sha }}\` |"
+            echo "| Commit | \`${{ needs.prepare-nightly.outputs.resolved_sha }}\` |"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/nightly.yml at line 1134, The summary currently prints the
built commit using `${{ github.sha }}` which can be incorrect for manual
dispatch; update the echo that writes the Commit line so it uses the resolved
SHA output from the prepare job i.e. replace `${{ github.sha }}` with `${{
needs.prepare-nightly.outputs.resolved_sha }}` (refer to the echo statement that
prints "| Commit | \`${{ github.sha }}\` |" in the nightly.yml workflow) to
ensure the summary records the actual built commit provenance.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/nightly.yml:
- Around line 368-373: Update the resolve-check step to harden external actions:
replace floating tags with pinned commit SHAs for actions/checkout,
astral-sh/setup-uv, and actions/setup-python (e.g., use actions/checkout@<sha>,
astral-sh/setup-uv@<sha>, actions/setup-python@<sha>) and add
persist-credentials: false to the actions/checkout step so the checkout does not
leave a token in the runner; ensure you update the existing lines referencing
actions/checkout@v6, astral-sh/setup-uv@v7, and actions/setup-python@v6
accordingly.

In @.github/workflows/resolve-check.yml:
- Around line 36-38: Update the actions/checkout step that uses "uses:
actions/checkout@v6" to explicitly disable credential persistence by adding
"persist-credentials: false" to that checkout step (unless a later workflow step
requires authenticated git operations); this limits exposure of the GITHUB_TOKEN
while leaving the other pinned actions (astral-sh/setup-uv@v7 and
actions/setup-python@v6) unchanged.

---

Outside diff comments:
In @.github/workflows/nightly.yml:
- Line 1134: The summary currently prints the built commit using `${{ github.sha
}}` which can be incorrect for manual dispatch; update the echo that writes the
Commit line so it uses the resolved SHA output from the prepare job i.e. replace
`${{ github.sha }}` with `${{ needs.prepare-nightly.outputs.resolved_sha }}`
(refer to the echo statement that prints "| Commit | \`${{ github.sha }}\` |" in
the nightly.yml workflow) to ensure the summary records the actual built commit
provenance.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a6957663-0797-454f-b541-f6315c871c9a

📥 Commits

Reviewing files that changed from the base of the PR and between fdc46e0 and 2456525.

📒 Files selected for processing (4)
  • .github/workflows/nightly.yml
  • .github/workflows/resolve-check.yml
  • README.md
  • pyproject.toml
Comment thread .github/workflows/nightly.yml
Comment thread .github/workflows/resolve-check.yml
Add persist-credentials: false to both new resolve-check checkout
steps (the dedicated PR-trigger workflow and the nightly job).
Neither step does any authenticated git operations after the
checkout, so the default credential persistence just leaves the
GITHUB_TOKEN in the runner's .git/config for the rest of the job
unnecessarily.

CodeRabbit's other suggestion — pin the three actions
(actions/checkout, astral-sh/setup-uv, actions/setup-python) to
commit SHAs instead of major-version tags — is intentionally not
applied here. Every other workflow in .github/workflows/ uses
major-version tags, and CodeRabbit's first comment even noted
"The version-pinned actions follow the repository's established
pattern across all workflows." Pinning just these three lines to
SHAs would be inconsistent and create a one-off maintenance
surface; SHA-pinning policy, if adopted, should be a repo-wide
change in a separate PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
@saturley-hall

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…dant nightly- tag prefix

- prepare-nightly: strip nightly-/nightly-manual- from CONTAINER_TAG (NGC
  repo aiperf-nightly already conveys this; RELEASE_TYPE carries the
  cron-vs-dispatch signal).
- stage-nightly-container: expose the staged multi-arch manifest reference
  (${NGC_STAGING_IMAGE_BASE}/aiperf-nightly:${CONTAINER_TAG}) as a job output.
- trigger-gitlab-security-scan: forward it as variables[CONTAINER_IMAGE]
  alongside the existing CONTAINER_TAG so GitLab can pull without having to
  know the registry/repo layout.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
@saturley-hall saturley-hall enabled auto-merge (squash) May 21, 2026 01:20

@debermudez debermudez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks solid. thanks for the work!

@saturley-hall saturley-hall merged commit 83bdcd9 into main May 21, 2026
40 checks passed
@saturley-hall saturley-hall deleted the harrison/fix-nightly branch May 21, 2026 03:23
saturley-hall added a commit that referenced this pull request May 23, 2026
…+ aarch64 install note (#991)

Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 participants