fix: plot data loader 2#1004
Conversation
Try out this PRQuick install: pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@1dce25b38fce0117a1c2c5c3923a366e5cb03fe1Recommended with virtual environment (using uv): uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@1dce25b38fce0117a1c2c5c3923a366e5cb03fe1Last updated for commit: |
|
Warning Review limit reached
More reviews will be available in 14 minutes and 46 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThis PR adds model name metadata stamping to per-variation aggregates, unifies multi-format configuration reading in the data loader with aggregate-only fallbacks, and replaces vague availability checks in PNG exporters with detailed diagnostic reporting of missing data columns and sources. ChangesConfig Metadata Resolution and Diagnostic Reporting
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/aiperf/cli_runner/_sweep_aggregate.py`:
- Around line 456-458: The block resolving and setting the model should be
extracted from _export_one_variation_aggregate into a small helper to reduce
that function's length: create a function (e.g.,
_apply_resolved_model_to_aggregate or _set_resolved_model_for_variation) that
accepts (plan, key, aggregate_result), calls
_resolve_model_name_for_variation(plan, key), and if a value is returned sets
aggregate_result.metadata["model"] = resolved_model; then replace the three-line
block in _export_one_variation_aggregate with a single call to this new helper.
In `@tests/unit/test_cli_runner_sweep_helpers.py`:
- Around line 501-547: Rename the three test functions so their names follow the
test_<function>_<scenario>_<expected> convention and include the target function
name _resolve_model_name_for_variation; e.g., change
test_single_config_no_variations_returns_first_model to
test__resolve_model_name_for_variation_single_config_no_variations_returns_first_model,
change test_matches_variation_index_for_multi_config_plan to
test__resolve_model_name_for_variation_multi_config_matches_variation_index_returns_expected,
and change test_falls_back_to_first_config_when_label_unmatched to
test__resolve_model_name_for_variation_unmatched_label_falls_back_to_first_config
so they clearly reference _resolve_model_name_for_variation and the
scenario/expected outcome.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9f060782-3764-4744-a720-9723216501bc
📒 Files selected for processing (7)
src/aiperf/cli_runner/_sweep_aggregate.pysrc/aiperf/plot/core/data_loader.pysrc/aiperf/plot/exporters/png/multi_run.pysrc/aiperf/plot/exporters/png/single_run.pytests/unit/plot/test_data_loader.pytests/unit/plot/test_png_exporter.pytests/unit/test_cli_runner_sweep_helpers.py
The aggregate JSON written per sweep cell (`profile_export_aiperf_aggregate.json`) carries no `input_config` block, so the plot loader resolved `model = None` on those runs and the legend rendered "Unknown" — even after the YAML-v2/legacy fallback fix in #998, which only addresses the per-trial path. Write side (`_sweep_aggregate._export_one_variation_aggregate`): resolve the model from `plan.configs[variation.index].models.items[0]`, matching by label against the variation key. Stamp the result onto `aggregate_result.metadata["model"]` (alongside `variation_label`, `variation_values`, `sweep_mode`) so the aggregate file becomes self-describing. Fall back to `configs[0]` when the label is unmatched. Read side (`DataLoader._extract_metadata`): after the input_config lookup, if `model` is still None, pick it up from `aggregated["metadata"]["model"]`. Per-trial artifacts still win when both shapes are present. Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>
YAML v2 artifacts leave ``input_config.loadgen`` as ``null`` and attach per-phase load characteristics (``concurrency``, ``requests``) to the ``input_config.phases[]`` entries instead. The plot loader still only checked ``loadgen.concurrency`` / ``loadgen.request_count``, so v2 artifacts came back with ``RunMetadata.concurrency = None``. In ``MultiRunPNGExporter._runs_to_dataframe`` the ``or 1`` fallback collapsed every row to ``concurrency = 1``, fusing every cell into a single uncertainty point instead of one per swept value. Same class of shape-mismatch as the model name fix in #998: add a small helper that prefers the v2 ``phases[].{concurrency, requests}`` (read from the ``profiling`` phase when present, else the first phase) and falls back to the legacy ``loadgen.{concurrency, request_count}``. Defensive: ``loadgen`` may be the literal ``null`` (None) in v2 JSON, which would have crashed the prior ``"x" in config["loadgen"]`` form if a future change stops eliding null fields at serialization time. Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>
Previously, when ``_can_generate_plot`` returned False because a required metric column (multi-run) or data source (single-run) was missing, the skip was logged at ``debug`` level with no mention of what was missing. Users running with the default log level saw only ``Saved 0 plots`` and had no signal to diagnose. The TTFT-on-non- streaming-run case in particular has been silently dropping all plots. Refactor ``_can_generate_plot`` into ``_missing_required_columns`` (multi-run) and ``_missing_data_sources`` (single-run) so each exporter can name the missing items in the skip message. The skip log is now ``warning`` with the spec name, the offending columns/sources, and a hint pointing at the most common cause. Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
afb4fff to
0794980
Compare
…vention Per CLAUDE.md's ``test_<function>_<scenario>_<expected>`` rule, rename the three ``TestResolveModelNameForVariation`` methods so each name carries the target function (with leading double underscore to preserve the helper's own leading underscore) plus the scenario and expected outcome. No behavior change. Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>
…1005) Signed-off-by: Elias Bermudez <dbermudez@nvidia.com>
Summary by CodeRabbit
New Features
Improvements
Tests