fix(ci): export MALLOC_ARENA_MAX=2 before pytest for component_integration by ajcasagrande · Pull Request #950 · ai-dynamo/aiperf

ajcasagrande · 2026-05-16T00:38:15Z

Summary

Ubuntu CI (3.11/3.12/3.13) for unit + component_integration tests has been failing since feat: YAML-native v2 config + adaptive sweep orchestrator with BO & search recipes #912; macOS is fine. Symptom: ~66/83 systematic FAILEDs the moment any test calls app(...) in-process, ending in an xdist INTERNALERROR: list.remove(x): x not in list (the signature of a worker dying mid-run).
Root cause: os.environ.setdefault("MALLOC_ARENA_MAX", "2") at tests/component_integration/conftest.py:33 is a no-op for the running pytest worker — glibc reads MALLOC_ARENA_MAX once at process startup, before Python imports run. Component_integration runs aiperf in-process (no subprocesses to inherit the var), so setting it from inside the conftest never actually capped arenas.
feat: YAML-native v2 config + adaptive sweep orchestrator with BO & search recipes #912 pulled in BoTorch / Optuna / scipy / torch, which pushed the working set past what glibc's default 8×NCPU arenas can fit on the 2-CPU GitHub runners → xdist workers crashed.

Fix

Prepend MALLOC_ARENA_MAX=2 to the four pytest invocations in Makefile that target tests/component_integration/ (test-ci, test-component-integration, test-component-integration-ci, test-component-integration-verbose), so the var is in the shell env before pytest forks workers. Rewrite the misleading conftest comment to point at the Makefile.

tests/integration/conftest.py is left alone — that suite spawns aiperf as a subprocess, and the conftest setdefault correctly propagates the var to children.

Test plan

MALLOC_ARENA_MAX=2 uv run pytest tests/component_integration/cli/ -m component_integration -n auto → 15/15 passing locally
make -n test-ci and make -n test-component-integration-ci show MALLOC_ARENA_MAX=2 in the expanded pytest command
CI on this PR shows Ubuntu jobs green

Summary by CodeRabbit

Chores
- Adjusted test-run environment to set a memory-arena limit and removed automatic parallel test-worker flag for CI component tests.
- Reduced dev extras by removing one optional sub-dependency.
Tests
- Reclassified multiple component/integration tests with slow/stress markers to refine test selection and prioritization.
- Consolidated subprocess-handling into a shared test harness helper for consistent test process behavior.

…integration glibc reads MALLOC_ARENA_MAX at process startup, so setting it from inside tests/component_integration/conftest.py via os.environ.setdefault was a no-op for the running pytest worker — by then glibc had already initialized its arenas. Component_integration runs aiperf in-process (no subprocesses to inherit the var), so the only effective place to set it is the shell env before pytest starts. Prepend MALLOC_ARENA_MAX=2 to the four pytest invocations in the Makefile that target tests/component_integration/ (test-ci, test-component-integration, test-component-integration-ci, test-component-integration-verbose), and rewrite the conftest comment to reflect reality. This unbroke Ubuntu CI (3.11/3.12/3.13). macOS was unaffected because it uses a different allocator. The latent issue surfaced when #912 pulled in BoTorch/Optuna/scipy/torch and pushed the working set past what glibc's default 8×NCPU arenas could fit on 2-CPU runners, crashing xdist workers (visible as 66/83 systematic FAILED + final xdist INTERNALERROR "list.remove(x): x not in list"). Note: tests/integration/conftest.py keeps its setdefault — that suite spawns aiperf as subprocesses, so the var correctly propagates to children. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

github-actions · 2026-05-16T00:38:23Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@88f0bac66c01f3e02734f9c054a64f9715b223c5

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@88f0bac66c01f3e02734f9c054a64f9715b223c5

Last updated for commit: 88f0bac • Browse code

coderabbitai · 2026-05-16T00:39:32Z

Warning

Rate limit exceeded

@ajcasagrande has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 52 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c71bf665-052a-451d-8f29-a73fb061f5d2

📥 Commits

Reviewing files that changed from the base of the PR and between ae0358e and 88f0bac.

📒 Files selected for processing (19)

.github/workflows/run-unit-tests.yml
Makefile
docs/environment-variables.md
pyproject.toml
src/aiperf/common/environment.py
src/aiperf/common/tokenizer_validator.py
src/aiperf/config/sweep/expand_qmc.py
tests/component_integration/conftest.py
tests/unit/common/test_tokenizer_validator.py
tests/unit/config/test_sweep_qmc.py
tests/unit/config/test_sweep_qmc_properties.py
tests/unit/config/test_sweep_round3_fixes.py
tests/unit/dataset/agentic_code_gen/test_roundtrip.py
tests/unit/orchestrator/search_planner/test_optuna_dsp_kernel.py
tests/unit/orchestrator/test_execute_adaptive_search.py
tests/unit/search_recipes/test_recipes_round3.py
tests/unit/timing/test_branch_orchestrator.py
tests/unit/timing/test_branch_orchestrator_adversarial.py
tests/unit/timing/test_branch_orchestrator_adversarial_full.py

Walkthrough

This PR sets MALLOC_ARENA_MAX=2 for component-integration pytest runs, removes a -n auto xdist flag from test-ci, clarifies the conftest comment about Makefile precedence, moves a subprocess process-group helper into tests.harness.subprocess (and imports it), updates one dev extra in pyproject, and reclassifies several tests with slow/stress/integration pytest markers.

Changes

Memory Arena Configuration for Component Integration Tests

Layer / File(s)	Summary
Makefile pytest environment variable `Makefile`	`test-ci`, `test-component-integration`, `test-component-integration-ci`, and `test-component-integration-verbose` now prefix component-integration pytest invocations with `MALLOC_ARENA_MAX=2` (and `test-ci` drops `-n auto` where previously present).
Configuration documentation `tests/component_integration/conftest.py`	Clarified comment: glibc reads `MALLOC_ARENA_MAX` at process startup and the Makefile export is authoritative; the in-file `os.environ.setdefault` is a fallback.

Dev dependency extras

Layer / File(s)	Summary
pyproject dev extras update `pyproject.toml`	`project.optional-dependencies.dev` now references `aiperf[mlflow,otel]` (removed `botorch` extra).

Subprocess process-group helper consolidation

Layer / File(s)	Summary
Add harness helper `tests/harness/subprocess.py`	Adds `_new_process_group_kwargs()` to detect `subprocess.Popen` `process_group` support and return either `{"process_group": 0}` or `{"start_new_session": True}`.
Integration conftest import `tests/integration/conftest.py`	Stops defining `_new_process_group_kwargs` locally, imports it from `tests.harness.subprocess`, and removes the local `inspect` dependency.
Consumer test import updates `tests/component_integration/test_process_title.py`	Updated import to obtain `_new_process_group_kwargs` from `tests.harness.subprocess` instead of `tests.integration.conftest`.

Pytest marker recategorization

Layer / File(s)	Summary
Add slow/stress markers and reclassify tests `tests/component_integration/dataset/test_dataset_sampling.py`, `tests/component_integration/test_multi_objective_e2e.py`, `tests/component_integration/test_search_e2e.py`, `tests/component_integration/timing/test_advanced_scenarios.py`, `tests/component_integration/timing/test_dag_timing_pathology.py`, `tests/integration/plot/test_plot_mlflow_upload.py`, `tests/integration/post_processors/test_mlflow_live_correctness.py`, `tests/integration/post_processors/test_otel_live_export.py`, `tests/integration/post_processors/test_otel_payload_content.py`	Added `@pytest.mark.slow` or `@pytest.mark.stress` decorators and replaced several `@pytest.mark.component_integration` markers with `@pytest.mark.integration` plus `@pytest.mark.slow` where shown; no test logic or assertions changed.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nudge the Makefile, set arenas to two,
A harness helper hops into view,
Tests get re-marked, slow and steady pace,
Dev extras pruned with quiet grace,
A rabbit's cheer for a tidier test-roo!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 68.75% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and concisely summarizes the main fix: exporting MALLOC_ARENA_MAX=2 before pytest for component_integration tests. It directly addresses the primary change across the Makefile modifications.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/component_integration/conftest.py`:
- Line 31: Replace the ambiguous multiplication character in the comment that
reads "8×NCPU" with a plain ASCII 'x' so it reads "8xNCPU" to avoid Ruff RUF003;
update the comment text in the same location (the comment containing "default
8×NCPU arenas blows out RAM in 2-CPU CI runners. glibc reads this") accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e1ca39c3-7961-4879-9f90-32b364cabb94

📥 Commits

Reviewing files that changed from the base of the PR and between bac953d and 2cf5b4c.

📒 Files selected for processing (2)

Makefile
tests/component_integration/conftest.py

Duplicate approval (review lock race condition — now fixed)

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

…tall GitHub-hosted ubuntu-latest is the lowest-spec runner in our pool (4 vCPU / 16 GB) and was rate-limiting unit-test wall time. Move the Linux row of run-unit-tests onto the self-hosted prod-aiperf-builder pool, add an installed-deps cache, and trim the install step to only what CI actually needs. - run-unit-tests.yml: - Linux row -> prod-aiperf-builder-amd-v1 (macOS stays on macos-latest -- no macOS in the self-hosted pool). - Add astral-sh/setup-uv@v5 with enable-cache + uv.lock as the cache key. Cold pod install drops from ~8 min to ~2-3 min on warm-cache runs. - Switch install step from make first-time-setup to make ci-install (skips pre-commit hooks and redundant install-mock-server). - Set PYTEST_XDIST_AUTO_NUM_WORKERS=4 on the test step. The builder pod exposes 48 vCPUs but is RAM-capped at 6 GiB; -n auto without the cap would OOM-kill the runner agent. - timeout-minutes: 30 at job level (defensive cap). - Update the Codecov-upload condition to match the new matrix key. - Makefile: add a ci-install target -- venv + project install + plugin enum/overload generation. Skips pre-commit install --install-hooks and the duplicate install-mock-server that first-time-setup carries. Component-integration test failures already on main since 94a9102 are a separate fix. Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

PR #912 (commit 94a9102) rewrote tokenizer_validator.py and introduced a ProcessPoolExecutor-based HF cache prefetch in validate_tokenizer_early. The skip-prefetch gate ANDed HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE, but the component_integration conftest only set HF_HUB_OFFLINE, so the gate never engaged and the prefetch subprocesses ran. Those subprocesses bypass the in-process Tokenizer.from_pretrained patch (subprocess re-imports HF cleanly) and try to write to the real HF cache. In restricted environments (Linux CI containers, sandboxes) or under concurrent test execution that races on the cache directory, the write fails with EPERM and aiperf aborts with "Configuration resolution failed: [Errno 1] Operation not permitted" -- which surfaced as `request_count == 0` in every component_integration test that runs `aiperf profile ...` on Linux CI (ubuntu-latest and the new builder pool). This has been the cause of run-unit-tests failures on main since 2026-05-15. - src/aiperf/common/tokenizer_validator.py: change the skip-prefetch gate from AND to OR. Either env var being set is enough -- both mean "I have a warm cache, do not touch the network/disk." Requiring both was overly conservative and is what masked the test-harness bug. - tests/component_integration/conftest.py: hf_offline_mode fixture now sets both HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE so the prefetch skip-gate engages even on older aiperf builds where the gate is still ANDed. Restoring prior values for both on teardown. Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

cleanup-leak tests in tests/unit/timing/test_branch_orchestrator{,_adversarial,_adversarial_full}.py called caplog.at_level(WARNING, logger="aiperf.timing.branch_orchestrator"), but the warnings they're asserting on are emitted from the sibling module aiperf.timing._branch_orchestrator_logging (BranchOrchestratorLoggingMixin was split into its own file with logger=logging.getLogger(__name__)). On most setups this still happened to work because pytest's caplog catches at root and propagation from the sibling logger covered for the wrong target -- but it's fragile: any environment or earlier test that bumps the sibling logger's effective level above WARNING silently drops the record before caplog sees it. That's what was hitting Linux/Python 3.13 on the builder pool runs of PR #955 (build (linux, 3.13)): FAILED test_cleanup_emits_leak_warning_when_state_nonempty - assert 0 == 1 FAILED test_cleanup_with_pending_joins_logs_leak_warning - assert 0 == 1 FAILED test_cleanup_clears_pre_dispatched_and_logs_leak - assert 0 == 1 Point caplog at the actual emitting logger (aiperf.timing._branch_orchestrator_logging) so the level gate engages where it matters. Fixes all 5 occurrences across the 3 files. Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

- actions/checkout v4 -> v6 - actions/setup-python v5 -> v6 - astral-sh/setup-uv v5 -> v8 Latest stable major versions as of 2026-05-19. Other workflows (nightly.yml, fern-docs.yml, etc.) still pin older majors; updating those is out of scope for this PR. Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

Previous bump to @v8 broke the job: astral-sh/setup-uv publishes v8.0.0 and v8.1.0 as exact tags but has NOT yet published a floating v8 major tag (v1..v7 exist; v8 does not). @v8 resolves to nothing. Pin to v7 to match the major-only style of the other actions in this workflow. Switch to v8 once astral-sh publishes the floating v8 tag. Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

Move shared subprocess process-group setup into the test harness and mark expensive tests so CI can run the intended suite split reliably. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

Skip tokenizer preloading in component integration tests and keep forked process-pool children from inheriting the pytest os._exit shim, while preserving CI xdist coverage and Sobol warning assertions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

Component integration tests now skip tokenizer preload, so the process-pool reproduction is no longer needed; keep the os._exit harness isolation itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

…ner' into ajc/fix-tests

Move measured expensive unit tests behind slow/stress markers and register the existing network marker so default unit runs stay focused. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

github-actions · 2026-05-19T04:32:40Z

Fern Docs Preview: https://nvidia-preview-a6988ffb-b02b-4f94-9dfa-695d505210fd.docs.buildwithfern.com/aiperf/dev

codecov · 2026-05-19T04:41:00Z

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/aiperf/common/tokenizer_validator.py	80.00%	0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

…ation (#950) Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot added the fix label May 16, 2026

ajcasagrande enabled auto-merge (squash) May 16, 2026 00:39

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

Comment thread tests/component_integration/conftest.py

dynamo-ops approved these changes May 16, 2026

View reviewed changes

dynamo-ops previously approved these changes May 16, 2026

View reviewed changes

ajcasagrande added 2 commits May 18, 2026 09:54

remove botorch

4aaa757

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

Merge branch 'main' into ajc/fix-tests

4406499

debermudez approved these changes May 18, 2026

View reviewed changes

saturley-hall and others added 11 commits May 18, 2026 21:33

Merge remote-tracking branch 'origin/main' into ajc/fix-tests

f8852d3

Merge remote-tracking branch 'origin/harrison/switch-failing-test-run…

5a81996

…ner' into ajc/fix-tests

ajcasagrande merged commit 93c7f1e into main May 19, 2026
29 checks passed

ajcasagrande deleted the ajc/fix-tests branch May 19, 2026 04:33

ajcasagrande mentioned this pull request May 19, 2026

ci: builder pool, uv cache, ci-install target, and tokenizer prefetch fix #955

Closed

5 tasks

ajcasagrande mentioned this pull request May 19, 2026

feat: Cherry-pick YAML config, CI malloc, and test selection fixes to release/0.9.0 #958

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(ci): export MALLOC_ARENA_MAX=2 before pytest for component_integration#950

fix(ci): export MALLOC_ARENA_MAX=2 before pytest for component_integration#950
ajcasagrande merged 14 commits into
mainfrom
ajc/fix-tests

ajcasagrande commented May 16, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented May 16, 2026 •

edited

Loading

coderabbitai Bot commented May 16, 2026 •

edited

Loading

Rate limit exceeded

❌ Failed checks (1 warning)

coderabbitai Bot left a comment

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

codecov Bot commented May 19, 2026

Labels

4 participants

Uh oh!

Conversation

ajcasagrande commented May 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Test plan

Summary by CodeRabbit

github-actions Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Try out this PR

coderabbitai Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

codecov Bot commented May 19, 2026

Codecov Report

Labels

4 participants

ajcasagrande commented May 16, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented May 16, 2026 •

edited

Loading

coderabbitai Bot commented May 16, 2026 •

edited

Loading