
[dashboard] Add py-spy --idle and --subprocesses flags to profiling endpoints#63852

Merged: edoakes merged 4 commits into master from marwan/py-spy-improvements on Jun 5, 2026
@marwan116 marwan116 commented Jun 4, 2026

Description

The dashboard profiling endpoints run py-spy: py-spy record for the CPU Flame Graph (/task/cpu_profile, /worker/cpu_profile) and py-spy dump for the Stack Trace (/task/traceback, /worker/traceback). Until now they exposed only the native flag. This PR plumbs two more py-spy flags through the same paths (query param → request proto field → reporter agent handler → CpuProfilingManager):

  • idle=1 → py-spy record --idle (CPU Flame Graph). py-spy record samples only on-CPU (runnable) threads by default, so a worker blocked on a lock, I/O, or a CUDA sync shows up as near-idle, and the flame graph is empty or misleading exactly when you need it most. --idle additionally captures off-CPU / sleeping threads, surfacing where a stalled worker is actually parked.
  • subprocesses=1 → py-spy --subprocesses (both CPU Flame Graph and Stack Trace). A Ray worker is normally a single process, but many workloads fork children that do the real work: PyTorch DataLoader(num_workers>0), multiprocessing pools, or multiproc inference backends (e.g. vLLM tensor-parallel workers). --subprocesses follows the worker's process tree so activity in those children appears in the profile / stack dump instead of being invisible.

Why idle is record-only but subprocesses covers both paths: py-spy dump already snapshots all threads (including off-CPU / blocked ones), so --idle is only meaningful for record. --subprocesses, however, is a real capability gap on the dump path — without it, child stacks can't be captured via the Stack Trace button — so it is plumbed through trace_dump as well.

All flags default off and, unlike --native (which Ray restricts to Linux), work on all platforms, so none are platform-gated.
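
The flag rules described above can be sketched as a small command builder. This is a minimal illustration, not the code in CpuProfilingManager: the function name build_py_spy_command and its platform parameter are hypothetical, and a real py-spy record invocation would also carry output, duration, and format options that are omitted here.

```python
from typing import List


def build_py_spy_command(
    subcommand: str,
    pid: int,
    *,
    native: bool = False,
    idle: bool = False,
    subprocesses: bool = False,
    platform: str = "linux",
) -> List[str]:
    """Hypothetical helper: assemble a py-spy argv list from boolean options."""
    if subcommand not in ("record", "dump"):
        raise ValueError(f"unsupported py-spy subcommand: {subcommand}")

    cmd = ["py-spy", subcommand, "--pid", str(pid)]

    # --native is the only platform-gated flag (restricted to Linux, per the PR).
    if native and platform.startswith("linux"):
        cmd.append("--native")

    # --idle only matters for `record`; `dump` already snapshots blocked threads.
    if idle and subcommand == "record":
        cmd.append("--idle")

    # --subprocesses applies to both `record` and `dump`.
    if subprocesses:
        cmd.append("--subprocesses")

    return cmd


if __name__ == "__main__":
    print(build_py_spy_command("record", 1234, idle=True, subprocesses=True))
    print(build_py_spy_command("dump", 1234, subprocesses=True))
```

All options default to off, so a call with no keyword arguments yields the same command shape as before the change.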

Related issues

N/A

Additional information

  • Commits (stacked): (1) --idle on cpu_profile, (2) --subprocesses on cpu_profile, (3) --subprocesses on the traceback endpoints.
  • Docs: all flags are documented in the Dashboard profiling guide (optimize-performance.rst) — append &idle=1 and/or &subprocesses=1 to the relevant request URL.
  • Tests: test_profile_manager.py asserts each flag is appended to the constructed py-spy command iff requested (and absent by default), for both cpu_profile (record) and trace_dump (dump).
  • Caveat (subprocesses): py-spy discovers child processes by periodically scanning the process tree, so very short-lived subprocesses may be missed; persistent workers (DataLoader, vLLM) are captured reliably.
  • The generated reporter_pb2 bindings are gitignored and regenerated in CI, where the agent/head code paths and these tests run.
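
As a rough illustration of the Docs bullet above, the following sketch appends idle=1 and subprocesses=1 to a profiling request URL. The helper name profiling_url, the base address http://127.0.0.1:8265, and the pid parameter are assumptions made for the example; the real endpoints may require other identifying parameters.

```python
from urllib.parse import urlencode


def profiling_url(base_url, endpoint, params, *, idle=False, subprocesses=False):
    """Hypothetical helper: build a dashboard profiling URL with optional flags."""
    query = dict(params)  # copy so the caller's mapping is not mutated
    if idle:
        query["idle"] = "1"
    if subprocesses:
        query["subprocesses"] = "1"
    url = f"{base_url.rstrip('/')}{endpoint}"
    return f"{url}?{urlencode(query)}" if query else url


if __name__ == "__main__":
    # Assumed dashboard address and worker pid, for illustration only.
    print(
        profiling_url(
            "http://127.0.0.1:8265",
            "/worker/cpu_profile",
            {"pid": 1234},
            idle=True,
            subprocesses=True,
        )
    )
```

Because both flags default to off, omitting them leaves the URL unchanged.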

🤖 Generated with Claude Code

marwan116 and others added 2 commits June 4, 2026 07:05
The dashboard CPU profiling endpoints (/task/cpu_profile and
/worker/cpu_profile) run `py-spy record`, which by default only samples
on-CPU (runnable) threads. py-spy's `--idle` flag additionally includes
off-CPU / sleeping threads (blocked on locks, I/O, CUDA syncs), which is
essential for diagnosing a stalled server.

This plumbs an `idle` query parameter through the same path the existing
`native` flag uses: the `idle=1` query param on the endpoint -> the
`idle` field on the CpuProfilingRequest proto -> the reporter agent
handler -> CpuProfilingManager.cpu_profile, which appends `--idle` to the
py-spy command. Unlike `--native` (linux-only), `--idle` is supported on
all platforms and is therefore not platform-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
py-spy can profile a process together with its child processes via the
`--subprocesses` flag. This is useful for Ray workers that fork children
that do the real work -- e.g. PyTorch DataLoader workers, multiprocessing
pools, or multiproc inference backends (vLLM tensor-parallel workers) --
where the parent worker is mostly idle and the interesting CPU activity
lives in subprocesses the flamegraph would otherwise miss.

This stacks on the `idle` change and plumbs a `subprocesses` query
parameter through the same path as `native`/`idle`: `subprocesses=1` on
the endpoint -> the `subprocesses` field on the CpuProfilingRequest proto
-> the reporter agent handler -> CpuProfilingManager.cpu_profile, which
appends `--subprocesses` to the py-spy command. Like `--idle`, it is
supported on all platforms and is not platform-gated.

Because py-spy discovers child processes by periodically scanning the
process tree, very short-lived subprocesses may be missed; persistent
workers (DataLoader, vLLM) are captured reliably.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
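
The first hop of the path described in these commits (query param to request field) might look like the sketch below. It is not the dashboard's handler code: ProfilingOptions stands in for the proto request fields, parse_profiling_options is a hypothetical name, and treating only the literal value "1" as enabled is an assumption based on the idle=1 / subprocesses=1 examples.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs


@dataclass
class ProfilingOptions:
    """Stand-in for the boolean fields carried on the profiling request."""

    native: bool = False
    idle: bool = False
    subprocesses: bool = False


def parse_profiling_options(query_string: str) -> ProfilingOptions:
    """Map `native=1`, `idle=1`, `subprocesses=1` query params to booleans."""
    params = parse_qs(query_string)

    def enabled(name: str) -> bool:
        # Assumption: only the literal "1" turns a flag on; absent means off.
        return params.get(name, ["0"])[-1] == "1"

    return ProfilingOptions(
        native=enabled("native"),
        idle=enabled("idle"),
        subprocesses=enabled("subprocesses"),
    )


if __name__ == "__main__":
    print(parse_profiling_options("pid=1234&idle=1&subprocesses=1"))
```

Keeping every field defaulted to False mirrors the PR's statement that all flags default off.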
@marwan116 marwan116 requested review from a team as code owners June 4, 2026 14:19

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds support for the --idle (to include off-CPU/sleeping threads) and --subprocesses (to profile child processes) flags in the CPU profiling feature using py-spy. It updates the protobuf definitions, the dashboard reporter agent and head, the profiling manager, the documentation, and adds corresponding unit tests. There are no review comments, so I have no feedback to provide.

@dstrodtman dstrodtman left a comment

stamp for docs

py-spy's `dump` subcommand supports `--subprocesses` just like `record`,
but the dashboard's "Stack Trace" path (GetTraceback -> trace_dump) only
ever passed `--native`, so child-process stacks could not be captured
there. This mirrors the cpu_profile change onto the traceback path so a
worker that forks children (PyTorch DataLoader, multiprocessing, or vLLM
tensor-parallel workers) can have those child stacks dumped too.

A `subprocesses` query parameter on the `/task/traceback` and
`/worker/traceback` endpoints is plumbed through the `subprocesses` field
on the GetTracebackRequest proto and the GetTraceback agent handler into
CpuProfilingManager.trace_dump, which appends `--subprocesses` to the
`py-spy dump` command. Like the other flags, it is not platform-gated.

Note the asymmetry with `--idle`: idle is only meaningful for `record`
(py-spy `dump` already snapshots all threads, including off-CPU ones),
whereas `--subprocesses` is a genuine capability gap on the dump path --
hence this follow-up extends the traceback path too.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
@marwan116 marwan116 changed the title [dashboard] Add --idle and --subprocesses flags to cpu_profile endpoint Jun 4, 2026

@kshanmol kshanmol left a comment

LGTM, thanks for this improvement!

@sampan-s-nayak sampan-s-nayak added the go add ONLY when ready to merge, run all tests label Jun 4, 2026
@sampan-s-nayak sampan-s-nayak enabled auto-merge (squash) June 4, 2026 19:21
@ray-gardener ray-gardener Bot added docs An issue or change related to documentation core Issues that should be addressed in Ray Core observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling labels Jun 4, 2026
@github-actions github-actions Bot disabled auto-merge June 5, 2026 13:05
@edoakes edoakes merged commit c5f985e into master Jun 5, 2026
5 of 6 checks passed
@edoakes edoakes deleted the marwan/py-spy-improvements branch June 5, 2026 16:25
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Jun 30, 2026
…ndpoints (ray-project#63852)
