feat: Add parameter sweeping functionality#1
Conversation
5052196 to
052264b
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds parameter-sweep support across CLI, configs, orchestrator, aggregation (Pareto/trends), exporters (JSON/CSV), models, and extensive tests; introduces ParameterSweepStrategy, sweep-aware validators, SweepAggregation, exporters, and multi-run orchestration changes. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (5)
docs/tutorials/parameter-sweeping.md (1)
43-43: Consider adding language specifiers to fenced code blocks.Several fenced code blocks for output examples lack language specifiers. While these are plain text outputs rather than code, adding
textor leaving them as-is is acceptable for documentation. This is a minor linting suggestion.Also applies to: 74-74, 101-101, 146-146, 164-164, 188-188, 389-389
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/tutorials/parameter-sweeping.md` at line 43, Several fenced code blocks that show output examples are missing language specifiers; locate the bare triple-backtick blocks (``` ) used for output examples in parameter-sweeping.md and change them to include a language specifier like ```text (or ```output) so the blocks are explicitly marked as plain text; update each instance where a fenced block is used for output (the bare ``` occurrences) to use ```text for consistency.src/aiperf/common/config/user_config.py (1)
337-365: Useget_sweep_parameter()for sweep detection.
Keeps incompatibility checks aligned with parameter‑agnostic sweep detection as additional sweepable parameters are introduced.♻️ Suggested change
- is_sweep = isinstance(self.loadgen.concurrency, list) + is_sweep = self.loadgen.get_sweep_parameter() is not None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiperf/common/config/user_config.py` around lines 337 - 365, The validate_sweep_incompatibilities method currently detects sweeps by checking isinstance(self.loadgen.concurrency, list); replace that check with a call to get_sweep_parameter() so sweep detection is parameter‑agnostic as more sweepable parameters are added. Update validate_sweep_incompatibilities to call get_sweep_parameter() (or store its result) and use its truthiness to drive the existing incompatibility checks (including the fixed-schedule check and the _should_use_fixed_schedule_for_mooncake_trace() branch) so all sweep-related logic relies on the unified sweep detector.tests/unit/orchestrator/aggregation/test_sweep.py (1)
124-139: Consider consolidating function imports.Throughout this file,
identify_pareto_optimal,analyze_trends, andSweepAggregationare imported inside each test method. While this keeps tests self-contained, it adds verbosity. Consider adding these to the top-level imports alongside the existing ones.♻️ Suggested import consolidation
from aiperf.orchestrator.aggregation import ( DEFAULT_PARETO_OBJECTIVES, Objective, OptimizationDirection, + SweepAggregation, + analyze_trends, + identify_pareto_optimal, )Then remove the per-method imports like
from aiperf.orchestrator.aggregation import identify_pareto_optimal.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/orchestrator/aggregation/test_sweep.py` around lines 124 - 139, Tests import identify_pareto_optimal, analyze_trends, and SweepAggregation inside individual test functions which is verbose; consolidate by adding a single top-level import statement for these symbols (identify_pareto_optimal, analyze_trends, SweepAggregation) alongside the existing module imports at the top of the test file and remove the per-test inline imports so each test simply uses the functions/classes directly.src/aiperf/orchestrator/strategies.py (2)
502-515: Duplicate_sanitize_labelmethod.This method is identical to
FixedTrialsStrategy._sanitize_label(lines 240-253). Consider extracting to a module-level utility function or moving to theExecutionStrategybase class to avoid duplication.♻️ Suggested refactor to base class
class ExecutionStrategy(ABC): """Base class for execution strategies. Strategies decide: 1. What config to run next (based on results so far) 2. Whether to continue or stop 3. How to label runs for artifact organization 4. Where to store artifacts (path structure) 5. Cooldown duration between runs """ + def _sanitize_label(self, label: str) -> str: + """Sanitize label to prevent path traversal attacks. + + Args: + label: Raw label string + + Returns: + Sanitized label safe for filesystem paths + """ + # Remove any path separators and parent directory references + sanitized = re.sub(r"[/\\]|\.\.", "", label) + # Remove any other potentially dangerous characters + sanitized = re.sub(r'[<>:"|?*]', "", sanitized) + return sanitized + def validate_config(self, config: UserConfig) -> None: # noqa: B027Then remove the duplicate
_sanitize_labelmethods from bothFixedTrialsStrategyandParameterSweepStrategy.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiperf/orchestrator/strategies.py` around lines 502 - 515, The _sanitize_label implementation is duplicated in FixedTrialsStrategy and ParameterSweepStrategy; extract it into a single reusable location (either a module-level utility function like sanitize_label(...) or as a protected method on the ExecutionStrategy base class) and update both classes to call that centralized function/method; then remove the duplicate _sanitize_label methods from FixedTrialsStrategy and ParameterSweepStrategy so only the shared implementation remains referenced by both.
458-467: Instance state mutation on first call.The
base_seedis lazily initialized on the first call toget_next_config. This is fine for single-threaded orchestration, but worth noting that this method is not thread-safe if the same strategy instance were used concurrently.Consider documenting this assumption in the class docstring, or initializing
base_seedin__init__if the seed value is available at construction time.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiperf/orchestrator/strategies.py` around lines 458 - 467, The lazy initialization of self.base_seed inside get_next_config is not thread-safe; either move the initialization logic into the class constructor (__init__) so base_seed is set from config.input.random_seed or DEFAULT_SEED (honoring auto_set_seed) at object creation, or explicitly document the concurrency assumption in the class docstring of the strategy explaining that get_next_config mutates instance state (base_seed) and is not safe for concurrent calls; update references to DEFAULT_SEED, auto_set_seed, and config.input.random_seed as needed to ensure the same behavior when moving the logic to __init__.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/api/sweep-aggregates.md`:
- Around line 22-28: In docs/api/sweep-aggregates.md there are multiple
unlabeled fenced code blocks (e.g., the artifact tree starting with "artifacts/"
and the execution pattern blocks); update each triple-backtick fence to include
a language identifier (use "text" for artifact/command listings) so they become
```text; search for unlabeled ``` blocks in the file (including the other ranges
noted in the review) and replace them with ```text to satisfy markdownlint MD040
and keep formatting consistent.
- Around line 34-63: Update the documented JSON shape to match the exporter by
moving aggregation_type out of "metadata" to top-level and adding the exporter
fields aggregation_type, num_profile_runs, num_successful_runs, and failed_runs
to the Top-Level Structure and examples; keep the existing "metadata" object
(with parameter_name, parameter_values, num_values, num_trials_per_value,
sweep_mode, confidence_level, benchmark_config) but remove aggregation_type from
it so the example blocks (the Top-Level Structure and Metadata Section) reflect
the actual exporter output.
In `@docs/tutorials/parameter-sweeping.md`:
- Line 89: Update the tutorial snippet to use the valid CLI endpoint type:
replace the incorrect flag value "--endpoint-type openai_chat" with
"--endpoint-type chat" in the parameter-sweeping example (look for the CLI
invocation containing the "--endpoint-type" argument).
In `@src/aiperf/cli_runner.py`:
- Around line 578-629: The code currently calls SweepAggregation.compute with
sweep_values that may include values missing per-value aggregates
(per_value_stats), causing KeyError; filter sweep_values to only those present
in per_value_stats before calling SweepAggregation.compute and when computing
num_trials. Specifically, build a filtered list (e.g., sweep_values_filtered =
[v for v in sweep_values if v in per_value_stats]), pass that to
SweepAggregation.compute(per_value_stats, sweep_values_filtered,
sweep_param_name), and use results_by_value[sweep_values_filtered[0]] (or handle
empty case) to set metadata["num_trials_per_value"] so you don't index into a
missing value.
In `@src/aiperf/common/config/loadgen_config.py`:
- Around line 49-52: The validator currently returns a list as-is (if
isinstance(v, list): return v) and later performs numeric comparisons that
assume integers; modify this to validate list element types before numeric
checks by iterating v and ensuring every element is an int (raise ValueError if
any element is not an int), and likewise for the single-value branch add an
explicit isinstance(v, int) check before doing comparisons like v < 1 (raise
ValueError on wrong type); apply the same change to the other validator block
noted around the 666-678 region so both paths enforce integer-only lists and
clear ValueError messages.
In `@src/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.py`:
- Around line 59-80: The code currently uses sorted(per_value_metrics.keys(),
key=int) which assumes integer sweep values and will fail for floats; update the
CSV row ordering in the block that builds rows (where per_value_metrics and
metric_names are used and writer.writerow is called) to use the canonical order
from the sweep metadata (parameter_values) if available, otherwise perform a
numeric sort that parses keys to float (not int) with a safe fallback to string
ordering for non-numeric keys; ensure you replace both places where sorted(...,
key=int) is used (first_value selection and the loop over
per_value_metrics.keys()) so ordering works for float request-rate sweeps and
mixed/non-numeric values.
In `@src/aiperf/orchestrator/orchestrator.py`:
- Around line 252-263: The code assumes the sweep parameter is named
"concurrency" which breaks other sweep types; make sweep handling
parameter-agnostic by detecting the actual sweep key from result metadata (e.g.,
inspect r.metadata keys or a 'parameter_name' field) instead of hardcoding
"concurrency", then use that detected key when building composed-run metadata
and when calling _collect_failed_sweep_values; update the composition path that
populates sweep metadata (the independent-mode composition block and the other
occurrences noted around the _collect_failed_sweep_values usage) so both
composed runs and sweep-only logging use the same detected sweep parameter name
rather than "concurrency".
In `@tests/unit/exporters/aggregate/test_sweep_exporters.py`:
- Around line 245-255: The test constructs an AggregateResult with the full
sweep_data in the metrics field which hides exporter regressions; update the
AggregateResult instantiation (the AggregateResult class and its metrics
parameter) to pass only the per_value_metrics subset (use per_value_metrics
instead of sweep_data) so the test mirrors the production shape expected by the
exporters (locate the AggregateResult(...) where metrics=sweep_data and replace
with metrics=per_value_metrics).
In `@tests/unit/orchestrator/test_parameter_sweep_properties.py`:
- Around line 54-59: The test constructs a RunResult with a hard-coded
artifacts_path using Path(f"/tmp/{label}") which triggers Ruff S108 and fails on
non-Unix systems; update the test to use a portable temp path by replacing
Path(f"/tmp/{label}") with a path built from tempfile.gettempdir() (e.g.
Path(tempfile.gettempdir()) / label) or use the pytest tmp_path fixture to
construct artifacts_path, keeping the RunResult instantiation (RunResult(...,
artifacts_path=...)) otherwise unchanged.
In `@tests/unit/test_cli_runner_aggregation.py`:
- Around line 21-26: Replace the Unicode multiplication sign '×' with the ASCII
'x' in the docstrings that contain "Structure: 2 trials × 3 concurrency values =
6 runs" (and the similar occurrence around the later docstring block) so the
text reads "2 trials x 3 concurrency values = 6 runs" to satisfy Ruff RUF002;
update each docstring occurrence accordingly.
---
Nitpick comments:
In `@docs/tutorials/parameter-sweeping.md`:
- Line 43: Several fenced code blocks that show output examples are missing
language specifiers; locate the bare triple-backtick blocks (``` ) used for
output examples in parameter-sweeping.md and change them to include a language
specifier like ```text (or ```output) so the blocks are explicitly marked as
plain text; update each instance where a fenced block is used for output (the
bare ``` occurrences) to use ```text for consistency.
In `@src/aiperf/common/config/user_config.py`:
- Around line 337-365: The validate_sweep_incompatibilities method currently
detects sweeps by checking isinstance(self.loadgen.concurrency, list); replace
that check with a call to get_sweep_parameter() so sweep detection is
parameter‑agnostic as more sweepable parameters are added. Update
validate_sweep_incompatibilities to call get_sweep_parameter() (or store its
result) and use its truthiness to drive the existing incompatibility checks
(including the fixed-schedule check and the
_should_use_fixed_schedule_for_mooncake_trace() branch) so all sweep-related
logic relies on the unified sweep detector.
In `@src/aiperf/orchestrator/strategies.py`:
- Around line 502-515: The _sanitize_label implementation is duplicated in
FixedTrialsStrategy and ParameterSweepStrategy; extract it into a single
reusable location (either a module-level utility function like
sanitize_label(...) or as a protected method on the ExecutionStrategy base
class) and update both classes to call that centralized function/method; then
remove the duplicate _sanitize_label methods from FixedTrialsStrategy and
ParameterSweepStrategy so only the shared implementation remains referenced by
both.
- Around line 458-467: The lazy initialization of self.base_seed inside
get_next_config is not thread-safe; either move the initialization logic into
the class constructor (__init__) so base_seed is set from
config.input.random_seed or DEFAULT_SEED (honoring auto_set_seed) at object
creation, or explicitly document the concurrency assumption in the class
docstring of the strategy explaining that get_next_config mutates instance state
(base_seed) and is not safe for concurrent calls; update references to
DEFAULT_SEED, auto_set_seed, and config.input.random_seed as needed to ensure
the same behavior when moving the logic to __init__.
In `@tests/unit/orchestrator/aggregation/test_sweep.py`:
- Around line 124-139: Tests import identify_pareto_optimal, analyze_trends, and
SweepAggregation inside individual test functions which is verbose; consolidate
by adding a single top-level import statement for these symbols
(identify_pareto_optimal, analyze_trends, SweepAggregation) alongside the
existing module imports at the top of the test file and remove the per-test
inline imports so each test simply uses the functions/classes directly.
| aggregate = AggregateResult( | ||
| aggregation_type="sweep", | ||
| num_runs=10, | ||
| num_successful_runs=5, | ||
| failed_runs=[ | ||
| {"run_index": 5, "error": "Connection timeout"}, | ||
| {"run_index": 6, "error": "Connection timeout"}, | ||
| ], | ||
| metadata={}, | ||
| metrics=sweep_data, | ||
| ) |
There was a problem hiding this comment.
Use per_value_metrics for AggregateResult.metrics in this test.
Passing the full sweep_data dict can mask exporter regressions because the production path expects only per_value_metrics.
✅ Proposed fix
- metrics=sweep_data,
+ metrics=sweep_data["per_value_metrics"],🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/exporters/aggregate/test_sweep_exporters.py` around lines 245 -
255, The test constructs an AggregateResult with the full sweep_data in the
metrics field which hides exporter regressions; update the AggregateResult
instantiation (the AggregateResult class and its metrics parameter) to pass only
the per_value_metrics subset (use per_value_metrics instead of sweep_data) so
the test mirrors the production shape expected by the exporters (locate the
AggregateResult(...) where metrics=sweep_data and replace with
metrics=per_value_metrics).
There was a problem hiding this comment.
🧹 Nitpick comments (5)
tests/unit/common/config/test_loadgen_config_validators.py (4)
389-399: Misleading test method name.Same issue:
test_parameter_sweep_same_seed_with_single_concurrency_succeedsactually expects aValidationError.✏️ Suggested rename
- def test_parameter_sweep_same_seed_with_single_concurrency_succeeds(self): - """Test that parameter_sweep_same_seed with single concurrency now raises error (requires list).""" + def test_parameter_sweep_same_seed_with_single_concurrency_raises_error(self): + """Test that parameter_sweep_same_seed with single concurrency raises error (requires list)."""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/common/config/test_loadgen_config_validators.py` around lines 389 - 399, Rename the test function test_parameter_sweep_same_seed_with_single_concurrency_succeeds to a name that reflects it expects a ValidationError (e.g., test_parameter_sweep_same_seed_with_single_concurrency_raises), and update any references if used elsewhere; ensure the body still asserts that constructing LoadGeneratorConfig(concurrency=10, parameter_sweep_same_seed=True) raises ValidationError and that the error message contains the expected text.
326-336: Misleading test method name.Similar to the previous case:
test_parameter_sweep_cooldown_with_single_concurrency_succeedsactually expects aValidationError.✏️ Suggested rename
- def test_parameter_sweep_cooldown_with_single_concurrency_succeeds(self): - """Test that parameter_sweep_cooldown_seconds with single concurrency now raises error (requires list).""" + def test_parameter_sweep_cooldown_with_single_concurrency_raises_error(self): + """Test that parameter_sweep_cooldown_seconds with single concurrency raises error (requires list)."""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/common/config/test_loadgen_config_validators.py` around lines 326 - 336, Rename the misnamed test method test_parameter_sweep_cooldown_with_single_concurrency_succeeds to reflect that it expects a ValidationError (e.g., test_parameter_sweep_cooldown_with_single_concurrency_raises or _fails) so the name matches the assertion; update the test function name in the test file and any references to it (the function is the test method in tests/unit/common/config/test_loadgen_config_validators.py that instantiates LoadGeneratorConfig(concurrency=10, parameter_sweep_cooldown_seconds=3.0) and asserts pytest.raises(ValidationError)).
427-445: Test contains unreachable code and incomplete implementation.The code after the
with pytest.raises(ValidationError):block (lines 438-441) is unreachable because the exception will exit the context. The subsequent lines 444-445 then test a different scenario without clear separation. The comment "For now, test that default value works without error" suggests this is intentionally incomplete.Consider either completing the test or splitting it into two separate test methods for clarity.
✏️ Suggested fix: split into two clear tests
def test_parameter_sweep_mode_with_single_concurrency_raises_error(self): - """Test that explicitly setting parameter_sweep_mode with single concurrency raises ValueError.""" - # Note: We need to simulate explicit setting via model_fields_set - # In actual CLI usage, cyclopts would set this field - config_dict = { - "concurrency": 10, - "parameter_sweep_mode": "repeated", - } - # Manually mark the field as set + """Test that explicitly setting parameter_sweep_mode with single concurrency raises ValueError.""" with pytest.raises(ValidationError): - config = LoadGeneratorConfig(**config_dict) - # Simulate explicit setting - config.model_fields_set.add("parameter_sweep_mode") - # Trigger validation - config.model_validate(config.model_dump()) - - # For now, test that default value works without error + LoadGeneratorConfig(concurrency=10, parameter_sweep_mode="repeated") + + def test_parameter_sweep_mode_default_with_single_concurrency_succeeds(self): + """Test that default parameter_sweep_mode with single concurrency succeeds.""" config = LoadGeneratorConfig(concurrency=10) assert config.parameter_sweep_mode == "repeated" # default value is fine🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/common/config/test_loadgen_config_validators.py` around lines 427 - 445, The test test_parameter_sweep_mode_with_single_concurrency_raises_error mixes two scenarios and contains unreachable code after the with pytest.raises(ValidationError) block; split it into two focused tests: one (e.g., test_parameter_sweep_mode_explicit_with_single_concurrency_raises_error) that constructs LoadGeneratorConfig(concurrency=10, parameter_sweep_mode="repeated"), adds "parameter_sweep_mode" to config.model_fields_set, and asserts a ValidationError is raised when calling config.model_validate(config.model_dump()), and a second test (e.g., test_parameter_sweep_mode_default_with_single_concurrency_no_error) that constructs LoadGeneratorConfig(concurrency=10) and asserts config.parameter_sweep_mode == "repeated"; ensure no code remains after the pytest.raises context in the first test so there is no unreachable code.
264-273: Misleading test method name.The method is named
test_parameter_sweep_mode_with_single_concurrency_succeedsbut it actually tests that the operation raises aValidationError. This can confuse future maintainers.✏️ Suggested rename
- def test_parameter_sweep_mode_with_single_concurrency_succeeds(self): - """Test that parameter_sweep_mode with single concurrency now raises error (requires list).""" + def test_parameter_sweep_mode_with_single_concurrency_raises_error(self): + """Test that parameter_sweep_mode with single concurrency raises error (requires list)."""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/common/config/test_loadgen_config_validators.py` around lines 264 - 273, Rename the misleading test method test_parameter_sweep_mode_with_single_concurrency_succeeds to reflect that it expects a ValidationError (for example test_parameter_sweep_mode_with_single_concurrency_raises_validation_error), and update the docstring to indicate it asserts a failure; keep the body intact (the with pytest.raises(ValidationError) block and the assertion on error_msg) and ensure any references to the old name (e.g., in test discovery or comments) are updated accordingly; the relevant symbols are the test method name and the LoadGeneratorConfig(concurrency=10, parameter_sweep_mode="independent") call/assertion.src/aiperf/common/config/loadgen_config.py (1)
652-690: Validation assumes list elements are already integers.The validator performs
v < 1comparisons on list elements without first confirming they are integers. This pairs with the issue inparse_concurrency_list- if non-integer values slip through, this comparison could behave unexpectedly (floats would pass, non-numeric types would raiseTypeError).This is mitigated if the type validation suggested for
parse_concurrency_listis implemented.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiperf/common/config/loadgen_config.py` around lines 652 - 690, The validator validate_concurrency_list currently assumes list elements are integers and does comparisons like v < 1 which can raise TypeError for non-numeric types; update validate_concurrency_list (and ensure parity with parse_concurrency_list) to first verify each item in self.concurrency is an int (or safely coercible to int if you want to accept numeric strings/floats), collect positions of any non-integer items and raise a clear ValueError pointing to those positions, or coerce valid numeric values to int before running the existing < 1 checks; reference self.concurrency, validate_concurrency_list, and parse_concurrency_list when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/aiperf/common/config/loadgen_config.py`:
- Around line 49-51: The current branch in loadgen_config.py returns a list
early without validating element types; update the logic in the function
validate_concurrency_list (or the function that handles variable v) to iterate
the list when isinstance(v, list) and validate each element is an integer >= 1
(or coerce numeric strings/floats to int if intended) and otherwise raise a
clear ValueError; ensure the subsequent numeric comparison (v < 1) is applied
per element after validation/coercion so non-integer types (e.g., floats,
strings) are caught or converted before returning the list.
---
Nitpick comments:
In `@src/aiperf/common/config/loadgen_config.py`:
- Around line 652-690: The validator validate_concurrency_list currently assumes
list elements are integers and does comparisons like v < 1 which can raise
TypeError for non-numeric types; update validate_concurrency_list (and ensure
parity with parse_concurrency_list) to first verify each item in
self.concurrency is an int (or safely coercible to int if you want to accept
numeric strings/floats), collect positions of any non-integer items and raise a
clear ValueError pointing to those positions, or coerce valid numeric values to
int before running the existing < 1 checks; reference self.concurrency,
validate_concurrency_list, and parse_concurrency_list when making the change.
In `@tests/unit/common/config/test_loadgen_config_validators.py`:
- Around line 389-399: Rename the test function
test_parameter_sweep_same_seed_with_single_concurrency_succeeds to a name that
reflects it expects a ValidationError (e.g.,
test_parameter_sweep_same_seed_with_single_concurrency_raises), and update any
references if used elsewhere; ensure the body still asserts that constructing
LoadGeneratorConfig(concurrency=10, parameter_sweep_same_seed=True) raises
ValidationError and that the error message contains the expected text.
- Around line 326-336: Rename the misnamed test method
test_parameter_sweep_cooldown_with_single_concurrency_succeeds to reflect that
it expects a ValidationError (e.g.,
test_parameter_sweep_cooldown_with_single_concurrency_raises or _fails) so the
name matches the assertion; update the test function name in the test file and
any references to it (the function is the test method in
tests/unit/common/config/test_loadgen_config_validators.py that instantiates
LoadGeneratorConfig(concurrency=10, parameter_sweep_cooldown_seconds=3.0) and
asserts pytest.raises(ValidationError)).
- Around line 427-445: The test
test_parameter_sweep_mode_with_single_concurrency_raises_error mixes two
scenarios and contains unreachable code after the with
pytest.raises(ValidationError) block; split it into two focused tests: one
(e.g., test_parameter_sweep_mode_explicit_with_single_concurrency_raises_error)
that constructs LoadGeneratorConfig(concurrency=10,
parameter_sweep_mode="repeated"), adds "parameter_sweep_mode" to
config.model_fields_set, and asserts a ValidationError is raised when calling
config.model_validate(config.model_dump()), and a second test (e.g.,
test_parameter_sweep_mode_default_with_single_concurrency_no_error) that
constructs LoadGeneratorConfig(concurrency=10) and asserts
config.parameter_sweep_mode == "repeated"; ensure no code remains after the
pytest.raises context in the first test so there is no unreachable code.
- Around line 264-273: Rename the misleading test method
test_parameter_sweep_mode_with_single_concurrency_succeeds to reflect that it
expects a ValidationError (for example
test_parameter_sweep_mode_with_single_concurrency_raises_validation_error), and
update the docstring to indicate it asserts a failure; keep the body intact (the
with pytest.raises(ValidationError) block and the assertion on error_msg) and
ensure any references to the old name (e.g., in test discovery or comments) are
updated accordingly; the relevant symbols are the test method name and the
LoadGeneratorConfig(concurrency=10, parameter_sweep_mode="independent")
call/assertion.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
src/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.py (1)
76-82: Consider unioning metric names across all values.Using only the first value’s metrics can omit metrics present in other values. A simple union avoids silent data loss if per‑value metric sets drift.
♻️ Optional refactor
- if value_keys: - first_value = value_keys[0] - metric_names = sorted(per_value_metrics[first_value].keys()) - else: - metric_names = [] + if value_keys: + metric_names = sorted( + { + metric_name + for value in value_keys + for metric_name in per_value_metrics.get(value, {}).keys() + } + ) + else: + metric_names = []🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.py` around lines 76 - 82, The current logic only uses the first entry in value_keys to build metric_names, which can miss metrics present in other entries; update the code that computes metric_names (currently referencing value_keys, first_value and per_value_metrics) to iterate all keys in per_value_metrics, take the union of all metric keys across every per_value_metrics[value] entry, then sort that union into metric_names (fall back to empty list if none). Ensure any downstream logic that expects metric_names remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/troubleshooting/parameter-sweeping-errors.md`:
- Around line 9-13: Update all fenced code blocks in the document that show
plain error/notice messages to include a language identifier (e.g., use ```text
instead of ```). Specifically modify the block containing "Invalid concurrency
value: 'abc'. Must be a positive integer (>= 1)." and the other similar message
blocks at the ranges noted (30-35, 52-58, 79-85, 110-115, 141-147, 169-175,
195-200, 220-225, 235-240) so each opening fence includes a language token like
text to satisfy markdownlint MD040.
In `@docs/tutorials/parameter-sweeping.md`:
- Around line 43-45: The markdown contains several non‑code‑fenced ASCII/output
blocks like the snippet "Parameter sweep mode: UI automatically set to 'simple'
(use '--ui none' to disable UI output)"; update each triple‑backtick fence to
include a language identifier (preferably text) so they read ```text ... ```,
and apply the same change to all similar output blocks in this document (the
other blocks referenced in the review). Ensure every non-code ASCII/output block
uses ```text to satisfy markdownlint MD040.
In `@src/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.py`:
- Around line 59-75: The current logic builds value_keys from
self._result.metadata["parameter_values"] but can yield an empty list if
stringified metadata entries don't match per_value_metrics keys; update the
block around parameter_values/per_value_metrics to normalize and compare keys
robustly (e.g., try matching both str and numeric forms) and if the filtered
value_keys is empty fall back to the numeric_sort_key-based
sorted(per_value_metrics.keys()) approach; ensure you keep numeric_sort_key and
the existing fallback behavior while only using the metadata canonical order
when a non-empty normalized match list is produced.
In `@src/aiperf/orchestrator/strategies.py`:
- Around line 520-527: Docstring examples for the composed sweep paths still use
"run_0001" but FixedTrialsStrategy now emits "trial_0001"; update the examples
in the sweep-related docstring in strategies.py to replace "run_0001" with
"trial_0001" for both repeated and independent mode examples so they reflect the
new FixedTrialsStrategy naming (search for the block containing "Composed with
FixedTrialsStrategy" and adjust the path examples accordingly).
---
Nitpick comments:
In `@src/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.py`:
- Around line 76-82: The current logic only uses the first entry in value_keys
to build metric_names, which can miss metrics present in other entries; update
the code that computes metric_names (currently referencing value_keys,
first_value and per_value_metrics) to iterate all keys in per_value_metrics,
take the union of all metric keys across every per_value_metrics[value] entry,
then sort that union into metric_names (fall back to empty list if none). Ensure
any downstream logic that expects metric_names remains unchanged.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/aiperf/orchestrator/strategies.py (2)
489-501: Potential IndexError ifrun_indexexceedsparameter_valueslength.
get_run_labelis a public method that could be called with anyrun_index. If called with an index >=len(self.parameter_values), line 498 will raiseIndexError. Consider adding bounds checking or documenting the precondition.💡 Optional defensive fix
def get_run_label(self, run_index: int) -> str: """Generate label: concurrency_10, concurrency_20, etc. Args: run_index: Zero-based index of run Returns: Sanitized label safe for filesystem paths + + Raises: + IndexError: If run_index >= len(parameter_values) """ + if run_index >= len(self.parameter_values): + raise IndexError( + f"run_index {run_index} out of bounds for {len(self.parameter_values)} parameter values" + ) value = self.parameter_values[run_index] label = f"{self.parameter_name}_{value}" # Sanitize label to prevent path traversal return _sanitize_label(label)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiperf/orchestrator/strategies.py` around lines 489 - 501, get_run_label currently indexes self.parameter_values without validation which can raise IndexError; add explicit bounds checking at the start of get_run_label to ensure 0 <= run_index < len(self.parameter_values) and raise a clear exception (e.g., ValueError or IndexError with a descriptive message) if out of range, or alternatively clamp/handle invalid indices as desired; reference the method get_run_label, the attribute self.parameter_values and self.parameter_name, and still call _sanitize_label(label) for the returned label.
22-35: Consider handling edge case where sanitization produces empty string.The sanitization logic is sound for preventing path traversal. However, if a malformed label somehow results in an empty string after sanitization, it could cause issues downstream (e.g., creating a directory named "").
💡 Optional defensive fix
def _sanitize_label(label: str) -> str: """Sanitize label to prevent path traversal attacks. Args: label: Raw label string Returns: Sanitized label safe for filesystem paths + + Raises: + ValueError: If sanitization produces empty string """ # Remove any path separators and parent directory references sanitized = re.sub(r"[/\\]|\.\.", "", label) # Remove any other potentially dangerous characters sanitized = re.sub(r'[<>:"|?*]', "", sanitized) + if not sanitized: + raise ValueError(f"Label '{label}' sanitizes to empty string") return sanitized🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiperf/orchestrator/strategies.py` around lines 22 - 35, The _sanitize_label function can return an empty string for some inputs; update _sanitize_label to detect when sanitized == "" and handle it defensively — either raise a clear ValueError (including the original label in the message) or return a safe fallback like "unnamed_label"; ensure callers expect/handle this by documenting the behavior in the function docstring. Keep the sanitization steps (re.sub calls) intact, then add the empty-check and raise/return behavior immediately after them in the _sanitize_label function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/tutorials/parameter-sweeping.md`:
- Around line 74-77: Several fenced code blocks in
docs/tutorials/parameter-sweeping.md are missing a language identifier; update
each output/structure block (including the one shown and the blocks at lines
101-119, 146-152, 164-168, 188-223, 389-403) to use the text identifier by
changing the opening fence from ``` to ```text so the blocks render as plain
text; scan the file for other similar output blocks and apply the same change
consistently.
In `@src/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.py`:
- Around line 62-68: The code currently always appends str(v) to value_keys
causing KeyError when per_value_metrics uses numeric keys; change the logic in
the loop over parameter_values to append the actual matching key (either v if v
in per_value_metrics or str(v) if str(v) in per_value_metrics) so later lookups
use the same type as stored in per_value_metrics, and when writing CSV convert
that key to a string for output (i.e., preserve the matched_key in value_keys
and use str(matched_key) only for headers/printing); apply the same fix to the
second block referenced around the 89-115 region so all lookups use the
preserved actual key.
---
Nitpick comments:
In `@src/aiperf/orchestrator/strategies.py`:
- Around line 489-501: get_run_label currently indexes self.parameter_values
without validation which can raise IndexError; add explicit bounds checking at
the start of get_run_label to ensure 0 <= run_index < len(self.parameter_values)
and raise a clear exception (e.g., ValueError or IndexError with a descriptive
message) if out of range, or alternatively clamp/handle invalid indices as
desired; reference the method get_run_label, the attribute self.parameter_values
and self.parameter_name, and still call _sanitize_label(label) for the returned
label.
- Around line 22-35: The _sanitize_label function can return an empty string for
some inputs; update _sanitize_label to detect when sanitized == "" and handle it
defensively — either raise a clear ValueError (including the original label in
the message) or return a safe fallback like "unnamed_label"; ensure callers
expect/handle this by documenting the behavior in the function docstring. Keep
the sanitization steps (re.sub calls) intact, then add the empty-check and
raise/return behavior immediately after them in the _sanitize_label function.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (4)
docs/tutorials/parameter-sweeping.md (2)
74-77:⚠️ Potential issue | 🟡 MinorAdd language identifiers to remaining fenced blocks.
Several output blocks still use bare
fences (e.g., Lines 74‑77). markdownlint MD040 will continue to warn. Please change them totext (and apply consistently to the other unlabeled output blocks in this file).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/tutorials/parameter-sweeping.md` around lines 74 - 77, The other fenced output blocks are unlabeled (e.g., the block containing "ValueError: Dashboard UI is not supported with parameter sweeps..."); update every bare ``` output fence in this document to use the text language identifier by replacing ``` with ```text so markdownlint MD040 warnings are resolved and all unlabeled output blocks (including that error block) are consistently labeled.
236-334:⚠️ Potential issue | 🟡 MinorAlign sweep aggregate example with current per‑combination schema.
The example still documents the older
per_value_metricsmap and scalarpareto_optimalvalues. Current sweep outputs use aper_combination_metricslist, andbest_configurations/pareto_optimalcarry parameter dictionaries. This will mislead users when they parse real output.✅ Suggested doc alignment (excerpt)
{ "metadata": { - "parameter_name": "concurrency", - "parameter_values": [10, 20, 30, 40], - "num_values": 4, + "sweep_parameters": [{ "name": "concurrency", "values": [10, 20, 30, 40] }], + "num_combinations": 4, "num_trials_per_value": 5, "sweep_mode": "repeated", "confidence_level": 0.95 }, - "per_value_metrics": { - "10": { ... } - }, + "per_combination_metrics": [ + { "parameters": { "concurrency": 10 }, "metrics": { ... } } + ], "best_configurations": { "best_throughput": { - "value": 40, + "parameters": { "concurrency": 40 }, "metric": 255.1, "unit": "requests/sec" } }, - "pareto_optimal": [10, 30, 40] + "pareto_optimal": [{ "concurrency": 10 }, { "concurrency": 30 }, { "concurrency": 40 }] }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/tutorials/parameter-sweeping.md` around lines 236 - 334, The example JSON uses the old per_value_metrics map and scalar pareto_optimal/best_configuration values; update the example to use the current schema: replace "per_value_metrics" with a "per_combination_metrics" array where each element includes a "parameters" object (e.g., {"concurrency": 10}) plus the same metric objects (request_throughput_avg, ttft_p99_ms), change "pareto_optimal" from [10,30,40] to a list of parameter dictionaries (e.g., [{"concurrency":10}, {"concurrency":30}, {"concurrency":40}]), and change entries under "best_configurations" (best_throughput, best_latency_p99) so their "value" fields are parameter dictionaries instead of scalars; ensure metric fields (mean, std, ci_low/ci_high) remain intact and example units stay the same.tests/unit/orchestrator/test_parameter_sweep_properties.py (1)
9-58:⚠️ Potential issue | 🟡 MinorAvoid hard-coded
/tmpin tests (portable temp dir).This is non-portable and still trips Ruff S108. Prefer
tempfile.gettempdir()ortmp_path.🛠️ Suggested fix
+import tempfile @@ - artifacts_path=Path(f"/tmp/{label}"), + artifacts_path=Path(tempfile.gettempdir()) / label,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/orchestrator/test_parameter_sweep_properties.py` around lines 9 - 58, The test helper make_run_result currently hardcodes artifacts_path to Path(f"/tmp/{label}"), which is non-portable and triggers Ruff S108; update make_run_result to accept an optional artifacts_path parameter (or a tmp_path fixture) and use that when provided, otherwise derive a portable temp directory via tempfile.gettempdir() (or Path(tmp_path) when caller supplies tmp_path), then set RunResult(..., artifacts_path=that_path) so tests use a portable temp directory instead of '/tmp'.src/aiperf/orchestrator/orchestrator.py (1)
474-475:⚠️ Potential issue | 🟠 MajorSweep metadata is still hardcoded to
concurrency, breaking request_rate sweeps.Sweep-only and composed paths set/inspect only
concurrency, so request‑rate sweeps won’t be detected or aggregated correctly (and failure reporting won’t trigger). Use the actual sweep parameter name/value everywhere sweep metadata is written/read.🛠️ Suggested fix (parameter‑agnostic metadata)
@@ - result = self._execute_single_run(run_config, label, artifact_path) + result = self._execute_single_run(run_config, label, artifact_path) + if isinstance(strategy, ParameterSweepStrategy): + param_name = strategy.parameter_name + param_value = getattr(run_config.loadgen, param_name) + result.metadata.update( + { + "parameter_name": param_name, + "parameter_value": param_value, + param_name: param_value, + } + ) @@ - if any("concurrency" in r.metadata for r in results): + if any("parameter_name" in r.metadata for r in results): failed_values = self._collect_failed_sweep_values(results) @@ - result.metadata.update( - { - "trial_index": trial_index, - "value_index": value_index, - "concurrency": run_config.loadgen.concurrency, - "sweep_mode": "repeated", - } - ) + param_name = sweep_strategy.parameter_name + param_value = getattr(run_config.loadgen, param_name) + result.metadata.update( + { + "trial_index": trial_index, + "value_index": value_index, + "parameter_name": param_name, + "parameter_value": param_value, + param_name: param_value, + "sweep_mode": "repeated", + } + ) @@ - result.metadata.update( - { - "trial_index": trial_index, - "value_index": value_index, - "concurrency": run_config.loadgen.concurrency, - "sweep_mode": "independent", - } - ) + param_name = sweep_strategy.parameter_name + param_value = getattr(run_config.loadgen, param_name) + result.metadata.update( + { + "trial_index": trial_index, + "value_index": value_index, + "parameter_name": param_name, + "parameter_value": param_value, + param_name: param_value, + "sweep_mode": "independent", + } + ) @@ - if not result.success and "concurrency" in result.metadata: - param_value = result.metadata.get("concurrency") - param_name = "concurrency" # Currently only concurrency is supported + if not result.success and "parameter_name" in result.metadata: + param_name = result.metadata.get("parameter_name") + param_value = result.metadata.get("parameter_value")Also applies to: 497-503, 585-593, 711-718, 973-977
🧹 Nitpick comments (1)
src/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.py (1)
51-56: Remove duplicate assignment ofper_combination_metrics.This is harmless but adds noise in a critical formatter.
♻️ Suggested cleanup
- # Section 1: Per-combination metrics table - per_combination_metrics = self._result.metrics - # Section 1: Per-combination metrics table per_combination_metrics = self._result.metrics🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.py` around lines 51 - 56, There are two identical assignments to per_combination_metrics in aggregate_sweep_csv_exporter.py; remove the duplicate so only a single assignment remains (keep the line per_combination_metrics = self._result.metrics and delete the repeated one) to avoid redundant code in the method that builds the CSV export.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/api/sweep-aggregates.md`:
- Around line 66-90: The metadata example in docs incorrectly includes
best_configurations and pareto_optimal inside the metadata object; remove the
best_configurations and pareto_optimal entries from the metadata JSON example
and from the Fields table (leave them documented as top-level fields if needed),
so the example matches the exporter which filters these keys out of metadata;
update references to metadata, sweep_parameters, and num_combinations
accordingly (ensure only sweep_parameters and num_combinations remain in the
metadata example and Fields table).
In `@src/aiperf/exporters/aggregate/aggregate_sweep_json_exporter.py`:
- Around line 68-85: The _generate_content() currently filters out "trends" from
self._result.metadata and never writes it back, causing trend data to be
dropped; update the logic in aggregate_sweep_json_exporter (inside
_generate_content) to preserve trends by either not removing "trends" when
building metadata or by explicitly adding output["trends"] =
self._result.metadata.get("trends", {}) (reference symbols: _generate_content,
self._result.metadata, output, "trends") so the serialized JSON includes trend
analysis.
In `@tests/unit/exporters/aggregate/test_sweep_exporters.py`:
- Around line 108-112: The comment inside the AggregateResult construction uses
the Unicode multiplication sign (`×`) which raises RUF003; update the comment in
test_sweep_exporters.py (near the AggregateResult instance in
test_sweep_exporters.py where result is defined) to use an ASCII character such
as "x" or an expression like "3 * 5" (e.g., change "3 values × 5 trials" to "3
values x 5 trials" or "3 * 5 trials") so the comment contains only ASCII
characters.
---
Duplicate comments:
In `@docs/tutorials/parameter-sweeping.md`:
- Around line 74-77: The other fenced output blocks are unlabeled (e.g., the
block containing "ValueError: Dashboard UI is not supported with parameter
sweeps..."); update every bare ``` output fence in this document to use the text
language identifier by replacing ``` with ```text so markdownlint MD040 warnings
are resolved and all unlabeled output blocks (including that error block) are
consistently labeled.
- Around line 236-334: The example JSON uses the old per_value_metrics map and
scalar pareto_optimal/best_configuration values; update the example to use the
current schema: replace "per_value_metrics" with a "per_combination_metrics"
array where each element includes a "parameters" object (e.g., {"concurrency":
10}) plus the same metric objects (request_throughput_avg, ttft_p99_ms), change
"pareto_optimal" from [10,30,40] to a list of parameter dictionaries (e.g.,
[{"concurrency":10}, {"concurrency":30}, {"concurrency":40}]), and change
entries under "best_configurations" (best_throughput, best_latency_p99) so their
"value" fields are parameter dictionaries instead of scalars; ensure metric
fields (mean, std, ci_low/ci_high) remain intact and example units stay the
same.
In `@tests/unit/orchestrator/test_parameter_sweep_properties.py`:
- Around line 9-58: The test helper make_run_result currently hardcodes
artifacts_path to Path(f"/tmp/{label}"), which is non-portable and triggers Ruff
S108; update make_run_result to accept an optional artifacts_path parameter (or
a tmp_path fixture) and use that when provided, otherwise derive a portable temp
directory via tempfile.gettempdir() (or Path(tmp_path) when caller supplies
tmp_path), then set RunResult(..., artifacts_path=that_path) so tests use a
portable temp directory instead of '/tmp'.
---
Nitpick comments:
In `@src/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.py`:
- Around line 51-56: There are two identical assignments to
per_combination_metrics in aggregate_sweep_csv_exporter.py; remove the duplicate
so only a single assignment remains (keep the line per_combination_metrics =
self._result.metrics and delete the repeated one) to avoid redundant code in the
method that builds the CSV export.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
docs/api/sweep-aggregates.mddocs/tutorials/parameter-sweeping.mdsrc/aiperf/cli_runner.pysrc/aiperf/exporters/aggregate/aggregate_sweep_csv_exporter.pysrc/aiperf/exporters/aggregate/aggregate_sweep_json_exporter.pysrc/aiperf/orchestrator/aggregation/__init__.pysrc/aiperf/orchestrator/aggregation/sweep.pysrc/aiperf/orchestrator/orchestrator.pytests/integration/test_parameter_sweep.pytests/unit/exporters/aggregate/test_sweep_exporters.pytests/unit/orchestrator/aggregation/test_sweep.pytests/unit/orchestrator/test_parameter_sweep_properties.pytests/unit/plot/test_mode_detector.pytests/unit/test_cli_runner_aggregation.py
| # Extract metadata (excluding the sweep-specific sections) | ||
| metadata = {} | ||
| for key, value in self._result.metadata.items(): | ||
| if key not in ["best_configurations", "pareto_optimal", "trends"]: | ||
| metadata[key] = value | ||
|
|
||
| output["metadata"] = metadata | ||
|
|
||
| # Add per_combination_metrics (stored in result.metrics) | ||
| output["per_combination_metrics"] = self._result.metrics | ||
|
|
||
| # Add sweep-specific sections from metadata | ||
| output["best_configurations"] = self._result.metadata.get( | ||
| "best_configurations", {} | ||
| ) | ||
| output["pareto_optimal"] = self._result.metadata.get("pareto_optimal", []) | ||
|
|
||
| # Serialize to JSON with indentation |
There was a problem hiding this comment.
Preserve trends in JSON output instead of dropping it.
_generate_content() filters trends out of metadata but never writes it back to the payload, which silently discards trend analysis when present.
🔧 Suggested fix
output["best_configurations"] = self._result.metadata.get(
"best_configurations", {}
)
output["pareto_optimal"] = self._result.metadata.get("pareto_optimal", [])
+ output["trends"] = self._result.metadata.get("trends", {})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Extract metadata (excluding the sweep-specific sections) | |
| metadata = {} | |
| for key, value in self._result.metadata.items(): | |
| if key not in ["best_configurations", "pareto_optimal", "trends"]: | |
| metadata[key] = value | |
| output["metadata"] = metadata | |
| # Add per_combination_metrics (stored in result.metrics) | |
| output["per_combination_metrics"] = self._result.metrics | |
| # Add sweep-specific sections from metadata | |
| output["best_configurations"] = self._result.metadata.get( | |
| "best_configurations", {} | |
| ) | |
| output["pareto_optimal"] = self._result.metadata.get("pareto_optimal", []) | |
| # Serialize to JSON with indentation | |
| # Extract metadata (excluding the sweep-specific sections) | |
| metadata = {} | |
| for key, value in self._result.metadata.items(): | |
| if key not in ["best_configurations", "pareto_optimal", "trends"]: | |
| metadata[key] = value | |
| output["metadata"] = metadata | |
| # Add per_combination_metrics (stored in result.metrics) | |
| output["per_combination_metrics"] = self._result.metrics | |
| # Add sweep-specific sections from metadata | |
| output["best_configurations"] = self._result.metadata.get( | |
| "best_configurations", {} | |
| ) | |
| output["pareto_optimal"] = self._result.metadata.get("pareto_optimal", []) | |
| output["trends"] = self._result.metadata.get("trends", {}) | |
| # Serialize to JSON with indentation |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/aiperf/exporters/aggregate/aggregate_sweep_json_exporter.py` around lines
68 - 85, The _generate_content() currently filters out "trends" from
self._result.metadata and never writes it back, causing trend data to be
dropped; update the logic in aggregate_sweep_json_exporter (inside
_generate_content) to preserve trends by either not removing "trends" when
building metadata or by explicitly adding output["trends"] =
self._result.metadata.get("trends", {}) (reference symbols: _generate_content,
self._result.metadata, output, "trends") so the serialized JSON includes trend
analysis.
7b4fa22 to
518e4a8
Compare
Some inference servers (e.g. vLLM with ignore_eos=True) emit raw newline bytes inside JSON string values in SSE data fields, violating both the SSE and JSON specs. This splits a single data field into a truncated data packet and a garbage packet, causing orjson parse failures.
Detect incomplete JSON objects in SSE data fields (starts with '{' but doesn't end with '}') and treat the next non-data line as a continuation, escaping the embedded newline as \n to produce valid JSON.
…dynamo#704) Signed-off-by: Nicolas 'Pixel' Noble <nicolas@nobis-crew.org>
…ynamo#706) Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
… hack (ai-dynamo#709) Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…ynamo#698) Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…marking (ai-dynamo#714) Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Signed-off-by: Pavithra Vijayakrishnan <160681768+pvijayakrish@users.noreply.github.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
…-dynamo#723) Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Signed-off-by: Lucas Vogel <lvogel@nvidia.com>
…namo#717) Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…rt (ai-dynamo#718) Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
…dynamo#721) Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…mo#727) Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…ector Replaced hardcoded __pycache__ check with proper _is_run_directory() validation in test_parent_and_child_paths_mixed test. This makes the test more robust by using the same validation logic as the production code, rather than relying on directory name patterns. All 33 tests in test_mode_detector.py verified passing. Signed-off-by: Loki Ravi <lokravi@amazon.com>
Added comprehensive unit tests for MultiRunOrchestrator aggregation and export functionality: - Test aggregation routing (_aggregate_and_export) - Test confidence-only export (_export_confidence_aggregate) - Test sweep export (_export_sweep_aggregates) - Test aggregate_results for confidence and sweep modes - Test _collect_failed_sweep_values with various scenarios Coverage improvement: - orchestrator.py: 49% → 64% (+15%) - 14 new tests, all passing Part of Phase 1 coverage improvement effort. Signed-off-by: Loki Ravi <lokravi@amazon.com>
Renamed test functions to follow test_<function>_<scenario>_<expected> convention: - test_aggregate_and_export_confidence_only_mode ��� test_aggregate_and_export_confidence_only_mode_calls_confidence_export - test_aggregate_and_export_sweep_mode → test_aggregate_and_export_sweep_mode_calls_sweep_export - test_aggregate_results_confidence_only_mode → test_aggregate_results_confidence_only_mode_returns_aggregate - test_aggregate_results_sweep_mode → test_aggregate_results_sweep_mode_returns_per_value_and_sweep_aggregates All tests passing: 6972 passed, 18 skipped Signed-off-by: Loki Ravi <lokravi@amazon.com>
…truncation Previously, passing [1.5, 2.5] would silently truncate to [1, 2] without any error. This could lead to unexpected behavior where users think they're testing concurrency 1.5 but actually testing concurrency 1. Now explicitly checks if a float value is not an integer (using is_integer()) and raises a clear ValueError before attempting int() conversion. Integer floats like 1.0, 2.0 still work correctly and are converted to ints. Fixes code review finding: float truncation bug in loadgen_config.py line 59 Signed-off-by: Loki Ravi <lokravi@amazon.com>
…t behavior The three tests in TestSweepParamsValidation were named '_raises_error' but actually test that default values work correctly. Updated docstrings to clarify that: 1. The validators DO work correctly (they check model_fields_set) 2. Testing explicit user setting is complex (requires simulating CLI behavior) 3. These tests verify default values don't cause issues The validators in loadgen_config.py correctly check model_fields_set and raise appropriate errors when users explicitly set sweep parameters with single concurrency values. The test names remain unchanged to preserve test history, but docstrings now accurately describe what's being tested. Addresses code review finding about unreachable test code - the suggested approach (model_construct + manual model_fields_set manipulation) doesn't work well because other validators interfere during model_validate(). Signed-off-by: Loki Ravi <lokravi@amazon.com>
…g tutorial Corrected the Pareto optimality analysis for concurrency values. Previously stated that concurrency 30 dominates concurrency 20, which is mathematically incorrect. For Pareto dominance, a configuration must be better or equal on ALL objectives. Concurrency 30 has: - Better throughput: 245.3 vs 175.8 req/s ✓ - WORSE latency: 180.5ms vs 145.2ms ✗ Since latency is worse, concurrency 30 does NOT dominate concurrency 20. Therefore, concurrency 20 IS Pareto optimal. Updated: - Explanation text to correctly state concurrency 20 is Pareto optimal - JSON example to include concurrency 20 in pareto_optimal array - Added description of why concurrency 20 is Pareto optimal Fixes code review finding: incorrect Pareto dominance in parameter-sweeping.md Signed-off-by: Loki Ravi <lokravi@amazon.com>
Updated sweep-aggregates.md to document both directory structures: 1. Independent mode (sweep-only): sweep_aggregate/ at top level 2. Repeated mode (sweep + trials): aggregate/sweep_aggregate/ Previously only showed independent mode structure, which was confusing for users running sweeps with --num-profile-runs > 1. Now clearly shows: - When each structure is used - Example directory trees for both modes - Location of per-value confidence aggregates in repeated mode Fixes code review finding: directory structure clarity in sweep-aggregates.md Signed-off-by: Loki Ravi <lokravi@amazon.com>
Removed two inline comments that simply restated what the code does: - '# Execute runs' before self.execute() - '# Aggregate and export results' before self._aggregate_and_export() These comments don't add value since the method names are self-explanatory. The method docstring already documents the overall flow. Addresses code review nitpick: redundant comments in orchestrator.py lines 79-83 Signed-off-by: Loki Ravi <lokravi@amazon.com>
- Add comprehensive test coverage for sweep aggregation computation - Test AggregateResult creation and metadata handling - Test parameter detection and metric conversion - Test failed run counting and error handling - Test best configurations and Pareto optimal storage - Test trials per value calculation - All 9 tests passing Signed-off-by: Loki Ravi <lokravi@amazon.com>
- orchestrator.py coverage improved from 62% to 63% (+1%) - Overall improvement from baseline: 49% → 63% (+14%) - Phase 1 item 4 completed: test_orchestrator_sweep_aggregation.py - All 9 sweep aggregation tests passing Signed-off-by: Loki Ravi <lokravi@amazon.com>
- test_orchestrator_aggregation.py already has comprehensive tests - All 14 tests passing for export methods - Covers _aggregate_and_export, _export_confidence_aggregate, _export_sweep_aggregates - Per-value aggregate export tested within sweep export tests Signed-off-by: Loki Ravi <lokravi@amazon.com>
- Removed 50+ lines of commented-out test code - Replaced with concise NOTE comment explaining removal - analyze_trends() function was removed from design - All 27 tests passing, 10 skipped Signed-off-by: Loki Ravi <lokravi@amazon.com>
- Remove Property 8 (Repeated Mode Execution) test stubs - Remove Property 9 (Independent Mode Execution) test stubs - Remove Property 16 (Backward Compatibility) test stubs - Remove Property 17 (UI Mode Validation) test stubs - Add notes indicating tests moved to integration tests or deferred - Keeps unit test file focused on property-based unit tests Signed-off-by: Loki Ravi <lokravi@amazon.com>
Update sweep aggregate metadata example to match actual implementation: - Change 'num_values' to 'num_combinations' - Change 'num_successful_values' to 'num_successful_runs' - Change 'failed_values' to 'failed_runs' - Update failed_runs structure to use 'run_index' instead of 'value' - Remove 'timestamp' field (not in actual output) This aligns documentation with the actual sweep aggregate JSON output. Signed-off-by: Loki Ravi <lokravi@amazon.com>
Fix remaining inconsistencies in Pareto frontier documentation: - Add concurrency 20 to pareto_optimal array in JSON example - Update ASCII chart to show concurrency 20 as ● (optimal) not ○ (dominated) - Update chart caption to reflect all points are Pareto optimal - Add concurrency 20 to best practices list This completes the fix started in commit 4b1459a, ensuring all documentation consistently shows concurrency 20 as Pareto optimal. Signed-off-by: Loki Ravi <lokravi@amazon.com>
Rename three tests to accurately reflect their behavior: - test_parameter_sweep_mode_with_single_concurrency_raises_error → test_parameter_sweep_mode_with_single_concurrency_defaults_to_repeated - test_parameter_sweep_cooldown_with_single_concurrency_raises_error → test_parameter_sweep_cooldown_with_single_concurrency_defaults_to_zero - test_parameter_sweep_same_seed_with_single_concurrency_raises_error → test_parameter_sweep_same_seed_with_single_concurrency_defaults_to_false These tests verify that default values work correctly with single concurrency, not that errors are raised. The previous names were misleading and introduced in commit 4ddfc6f. Signed-off-by: Loki Ravi <lokravi@amazon.com>
Fix _create_confidence_strategy to respect user configuration instead of hardcoding parameters: - Use config.loadgen.set_consistent_seed instead of hardcoded True - Use config.loadgen.profile_run_disable_warmup_after_first instead of hardcoded True This ensures user settings are honored when creating confidence/fixed trials strategies, allowing users to control seed consistency and warmup behavior across multiple profile runs. Signed-off-by: Loki Ravi <lokravi@amazon.com>
Add None check before including sweep_aggregate in results dict to prevent AttributeError when sweep aggregation fails. When _compute_sweep_aggregates returns None (e.g., when parameter name cannot be determined from results), only include per_value_aggregates in the returned dict. This prevents downstream code from attempting to export or process a None sweep_aggregate, which would cause crashes. The function already logs a warning when returning None, so users are informed when sweep aggregation fails. Signed-off-by: Loki Ravi <lokravi@amazon.com>
Changed metadata updates in _compose_trials_then_sweep and _compose_sweep_then_trials to use dynamic parameter name from sweep_strategy.parameter_name instead of hardcoded 'concurrency'. This fixes issue where request_rate and other sweep parameters would fail or produce incorrect results due to hardcoded metadata keys. Fixes: ai-dynamo#35 Signed-off-by: Loki Ravi <lokravi@amazon.com>
Renamed three test functions to follow test_<function>_<scenario>_<expected> convention for better clarity: - test_single_value_remains_integer -> test_loadgen_concurrency_single_int_assignment_preserves_int_type - test_both_cooldowns_independent -> test_get_cooldown_seconds_independent_strategies_return_respective_values - test_zero_cooldown_default -> test_get_cooldown_seconds_no_cooldown_specified_returns_zero Addresses: ai-dynamo#49, ai-dynamo#50 Signed-off-by: Loki Ravi <lokravi@amazon.com>
Added 'text' and 'bash' language identifiers to fenced code blocks that were missing them, improving markdown linting compliance: - Directory structure blocks now use 'text' - Execution pattern diagrams now use 'text' - Pareto frontier ASCII chart now uses 'text' - Command examples now use 'bash' Addresses: ai-dynamo#38, ai-dynamo#41, ai-dynamo#51 Signed-off-by: Loki Ravi <lokravi@amazon.com>
Removed dead code - empty test class that had no test methods. The class was a placeholder for trend analysis tests that were removed when the analyze_trends() function was removed from design. Addresses: ai-dynamo#48 Signed-off-by: Loki Ravi <lokravi@amazon.com>
Replaced random.randint with Hypothesis data.draw() to ensure deterministic shrinking and reproducibility in property-based tests. The test now uses st.data() strategy to draw the insertion position, allowing Hypothesis to control all randomness and provide better shrinking when failures occur. Addresses: ai-dynamo#25, ai-dynamo#37, ai-dynamo#46 Signed-off-by: Loki Ravi <lokravi@amazon.com>
Removed outdated notes about needing CLI/cyclopts to populate model_fields_set. Updated docstrings to correctly explain that Pydantic populates model_fields_set automatically when passing fields to the LoadGeneratorConfig constructor. Addresses: ai-dynamo#42 Signed-off-by: Loki Ravi <lokravi@amazon.com>
Removed comments that simply restated what the code does (e.g., '# Aggregate results', '# Export aggregates', '# No aggregation needed', '# Confidence-only mode', '# Sweep + confidence mode'). The code structure and method names are self-documenting, making these comments unnecessary. The docstring already explains the method's purpose. Addresses: ai-dynamo#44 Signed-off-by: Loki Ravi <lokravi@amazon.com>
- Test auto-detection of execution modes (single, sweep, confidence, combined) - Test strategy composition in repeated and independent modes - Test execution order verification for composed strategies - Test cooldown application between runs - Test failed sweep value logging - Test strategy creation and validation - 15 tests covering orchestrator execution flow Signed-off-by: Loki Ravi <lokravi@amazon.com>
Migrate aggregation, export, tagging, and failure collection from orchestrator methods to strategy abstract methods. Each strategy (FixedTrials, ParameterSweep, SweepConfidence) now owns its full lifecycle. SweepConfidenceStrategy uses Option B (strategy owns its execution loop) for nested sweep+confidence iteration. Key changes: - Extract export_confidence/export_sweep to export_helpers module - Add abstract methods: aggregate, export_aggregates, tag_result, collect_failed_values, execute to ExecutionStrategy base class - Add SweepConfidenceStrategy with nested loop execution (Option B) - Add SweepMode enum for repeated/independent execution order - Reduce orchestrator from 1296 to 453 lines (thin coordinator) - Consolidate UI validation into single _validate_ui_compatibility that checks both is_multi_run and is_sweep - Remove duplicate dashboard UI check from _run_multi_benchmark - Add else guard in _run_multi_benchmark for impossible code path - Delete test_orchestrator_sweep_aggregation.py (covered by test_cli_runner_aggregation.py) - Remove 4 dead tests for deleted _collect_failed_sweep_values Signed-off-by: Loki Ravi <lokravi@amazon.com>
1. Strip parameter_sweep_cooldown_seconds and parameter_sweep_same_seed from inner subprocess config (ship-blocker: these fields caused every sweep run to fail with ValidationError when either flag was set) 2. Reject duplicate values in --concurrency list to prevent silent artifact overwrites (e.g., --concurrency 2,2,2 would write all runs to the same concurrency_2/ directory). Error message guides users to --num-profile-runs for variance at a single concurrency. 3. Remove duplicated banner separator in _run_multi_benchmark (rebase artifact that doubled the = separator line) 4. Rename SweepAggregation to SweepAnalyzer to avoid implying it conforms to the AggregationStrategy ABC (it uses a different protocol: static compute() vs instance aggregate()) 5. Fix artifact directory path containing Python list literal (concurrency[2, 4, 8]) by generating concurrency_sweep_2_4_8 for list inputs, avoiding shell-hostile brackets and spaces Signed-off-by: Loki Ravi <lokravi@amazon.com>
Implement comprehensive parameter sweeping feature that allows users to benchmark across multiple parameter values (concurrency, request-rate, or both) with automatic aggregation and comparison.
Key features:
Changes:
Tests:
Documentation:
Summary by CodeRabbit
New Features
CLI
Behavior Changes
Documentation
Tests
Chores