Skip to content

[Core] Fix ray stop failing to terminate dashboard and runtime_env agents on Windows#62428

Merged
edoakes merged 4 commits into
ray-project:masterfrom
Miso-A1:fix/issue-61452
Apr 11, 2026
Merged

[Core] Fix ray stop failing to terminate dashboard and runtime_env agents on Windows#62428
edoakes merged 4 commits into
ray-project:masterfrom
Miso-A1:fix/issue-61452

Conversation

@Miso-A1

@Miso-A1 Miso-A1 commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

Description

Fix ray stop on Windows failing to terminate dashboard agent (ray/dashboard/agent.py) and runtime_env agent (ray/_private/runtime_env/agent/main.py) processes.

On Windows, setproctitle does not change the process name or command line visible to psutil — it only creates a named Windows event object (see spt_status.c). This means the existing RAY_PROCESSES entries using "ray::DashboardAgent" and "ray::RuntimeEnvAgent" keywords never match any process during ray stop, leaving these agents orphaned. The command exits with code 0 and reports "Stopped all N Ray processes" while agents continue running in the background, holding file handles and blocking cleanup operations.

This PR adds Windows-only fallback entries to RAY_PROCESSES that match the actual script paths ("dashboard/agent.py" and "runtime_env/agent/main.py") in the process command line. These entries are guarded by sys.platform == "win32" to avoid double-matching on Linux/macOS where setproctitle works correctly. A unit test is included that verifies the matching logic works for simulated Windows process command lines.

Changes

  • python/ray/autoscaler/_private/constants.py: Add two conditional entries to RAY_PROCESSES for Windows that match agents by script path instead of proctitle.
  • python/ray/tests/test_basic_4.py: Add test_ray_processes_can_match_agents_on_windows to verify the fallback entries are present on Windows and correctly match simulated agent command lines.

Related issues

Fixes #61452

Additional information

Root cause

The process-matching logic in scripts.py kill_procs() searches for agent processes by checking if keywords like "ray::DashboardAgent" appear in subprocess.list2cmdline(proc.cmdline()). On Linux/macOS, setproctitle() changes the process cmdline so this works. On Windows, setproctitle() only creates a named event object (CreateEvent) — psutil.Process.cmdline() still returns the original invocation (e.g., python.exe .../dashboard/agent.py --port ...), so the keyword never matches.

Why not change the existing entries?

The existing "ray::DashboardAgent" / "ray::RuntimeEnvAgent" entries work correctly on Linux and macOS. Replacing them with script-path matching would change behavior on those platforms. Adding platform-conditional fallback entries is the minimal, safe fix.

Testing

  • Unit test patches sys.platform to "win32", reloads the constants module, and verifies that simulated Windows agent command lines are matched.
  • Verified that RAY_PROCESSES invariants are preserved (raylet first, gcs_server last).
  • Verified no false positives on unrelated processes.
  • Linux/macOS behavior is unchanged (fallback entries are not included).
@Miso-A1 Miso-A1 requested a review from a team as a code owner April 8, 2026 15:16

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses an issue on Windows where setproctitle fails to update process names, preventing the autoscaler from correctly identifying dashboard and runtime environment agents. The changes introduce fallback entries in RAY_PROCESSES that match the actual script paths on Windows and add a corresponding unit test. The reviewer suggested that substring matching might be fragile due to path separator inconsistencies and recommended using a try...finally block or a pytest fixture for the module reloading logic in the test to ensure global state restoration.

Comment thread python/ray/autoscaler/_private/constants.py
Comment thread python/ray/tests/test_basic_4.py Outdated
Comment on lines +272 to +282
import importlib
import ray.autoscaler._private.constants as constants_mod

# Reload with sys.platform patched to "win32" so that the conditional
# fallback entries are included in RAY_PROCESSES.
with mock.patch("sys.platform", "win32"):
importlib.reload(constants_mod)
win32_ray_processes = list(constants_mod.RAY_PROCESSES)

# Restore the module to its original state.
importlib.reload(constants_mod)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of importlib.reload on a central module like constants can have side effects if other modules have already imported RAY_PROCESSES (as they will retain a reference to the old list object). While this is acceptable for an isolated unit test, it's worth noting that the second reload (line 282) is crucial to restore the global state for any subsequent tests in the same process. To make this more robust against test failures, consider wrapping the reload logic in a try...finally block or using a pytest fixture with a teardown.

@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Apr 8, 2026
Miso-A1 added 2 commits April 8, 2026 19:21
…agents on Windows (ray-project#61452)

On Windows, `setproctitle` does not change the process name or command
line visible to psutil — it only creates a named Windows event object.
This means the existing `RAY_PROCESSES` entries using
"ray::DashboardAgent" and "ray::RuntimeEnvAgent" keywords never match
any process, leaving these agents orphaned after `ray stop`.

Add Windows-only fallback entries that match the actual script paths
("dashboard/agent.py" and "runtime_env/agent/main.py") in the command
line. These are guarded by `sys.platform == "win32"` to avoid
double-matching on Linux/macOS where setproctitle works correctly.

Signed-off-by: jamesrayammons <jamesrayammons@outlook.com>
Signed-off-by: jamesrayammons <jamesrayammons@outlook.com>
@ray-gardener ray-gardener Bot added windows core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Apr 8, 2026
Comment thread python/ray/tests/test_basic_4.py Outdated
assert len(ready) == 1000, len(ready)


def test_ray_processes_can_match_agents_on_windows():

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is not really that useful. instead, could we add an integration test that runs on windows that actually runs ray start and ray stop and ensures that these processes exit? it could live in test_cli.py and you can mark it to only run on windows using pytest.mark.skipif(sys.platform != "win32") (see other examples in the repo)

Move the agent-matching test from test_basic_4.py to test_cli.py as a
proper integration test that runs `ray start` / `ray stop` and verifies
the dashboard and runtime_env agent processes actually exit. The test is
marked with skipif(sys.platform != "win32") so it only runs on Windows.

Signed-off-by: jamesrayammons <jamesrayammons@outlook.com>
@Miso-A1

Miso-A1 commented Apr 10, 2026

Copy link
Copy Markdown
Contributor Author

Hi @edoakes thanks for your feedback. I've solved, please check it.

@edoakes edoakes left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Triggered the windows CI tests, will merge if/when those pass

@edoakes edoakes merged commit 7b6dd9c into ray-project:master Apr 11, 2026
6 checks passed
@edoakes

edoakes commented Apr 11, 2026

Copy link
Copy Markdown
Collaborator

Thanks for the contribution @jamesrayammons

HLDKNotFound pushed a commit to chichic21039/ray that referenced this pull request Apr 22, 2026
…agents on Windows (ray-project#62428)

Fix `ray stop` on Windows failing to terminate dashboard agent
(`ray/dashboard/agent.py`) and runtime_env agent
(`ray/_private/runtime_env/agent/main.py`) processes.

On Windows, `setproctitle` does not change the process name or command
line visible to `psutil` — it only creates a named Windows event object
(see `spt_status.c`). This means the existing `RAY_PROCESSES` entries
using `"ray::DashboardAgent"` and `"ray::RuntimeEnvAgent"` keywords
never match any process during `ray stop`, leaving these agents
orphaned. The command exits with code 0 and reports "Stopped all N Ray
processes" while agents continue running in the background, holding file
handles and blocking cleanup operations.

This PR adds Windows-only fallback entries to `RAY_PROCESSES` that match
the actual script paths (`"dashboard/agent.py"` and
`"runtime_env/agent/main.py"`) in the process command line. These
entries are guarded by `sys.platform == "win32"` to avoid
double-matching on Linux/macOS where `setproctitle` works correctly. A
unit test is included that verifies the matching logic works for
simulated Windows process command lines.

### Changes
- **`python/ray/autoscaler/_private/constants.py`**: Add two conditional
entries to `RAY_PROCESSES` for Windows that match agents by script path
instead of proctitle.
- **`python/ray/tests/test_basic_4.py`**: Add
`test_ray_processes_can_match_agents_on_windows` to verify the fallback
entries are present on Windows and correctly match simulated agent
command lines.

## Related issues

Fixes ray-project#61452

## Additional information

### Root cause

The process-matching logic in `scripts.py` `kill_procs()` searches for
agent processes by checking if keywords like `"ray::DashboardAgent"`
appear in `subprocess.list2cmdline(proc.cmdline())`. On Linux/macOS,
`setproctitle()` changes the process cmdline so this works. On Windows,
`setproctitle()` only creates a named event object (`CreateEvent`) —
`psutil.Process.cmdline()` still returns the original invocation (e.g.,
`python.exe .../dashboard/agent.py --port ...`), so the keyword never
matches.

### Why not change the existing entries?

The existing `"ray::DashboardAgent"` / `"ray::RuntimeEnvAgent"` entries
work correctly on Linux and macOS. Replacing them with script-path
matching would change behavior on those platforms. Adding
platform-conditional fallback entries is the minimal, safe fix.

### Testing

- Unit test patches `sys.platform` to `"win32"`, reloads the constants
module, and verifies that simulated Windows agent command lines are
matched.
- Verified that `RAY_PROCESSES` invariants are preserved (raylet first,
gcs_server last).
- Verified no false positives on unrelated processes.
- Linux/macOS behavior is unchanged (fallback entries are not included).

---------

Signed-off-by: jamesrayammons <jamesrayammons@outlook.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
…agents on Windows (ray-project#62428)

Fix `ray stop` on Windows failing to terminate dashboard agent
(`ray/dashboard/agent.py`) and runtime_env agent
(`ray/_private/runtime_env/agent/main.py`) processes.

On Windows, `setproctitle` does not change the process name or command
line visible to `psutil` — it only creates a named Windows event object
(see `spt_status.c`). This means the existing `RAY_PROCESSES` entries
using `"ray::DashboardAgent"` and `"ray::RuntimeEnvAgent"` keywords
never match any process during `ray stop`, leaving these agents
orphaned. The command exits with code 0 and reports "Stopped all N Ray
processes" while agents continue running in the background, holding file
handles and blocking cleanup operations.

This PR adds Windows-only fallback entries to `RAY_PROCESSES` that match
the actual script paths (`"dashboard/agent.py"` and
`"runtime_env/agent/main.py"`) in the process command line. These
entries are guarded by `sys.platform == "win32"` to avoid
double-matching on Linux/macOS where `setproctitle` works correctly. A
unit test is included that verifies the matching logic works for
simulated Windows process command lines.

### Changes
- **`python/ray/autoscaler/_private/constants.py`**: Add two conditional
entries to `RAY_PROCESSES` for Windows that match agents by script path
instead of proctitle.
- **`python/ray/tests/test_basic_4.py`**: Add
`test_ray_processes_can_match_agents_on_windows` to verify the fallback
entries are present on Windows and correctly match simulated agent
command lines.

## Related issues

Fixes ray-project#61452

## Additional information

### Root cause

The process-matching logic in `scripts.py` `kill_procs()` searches for
agent processes by checking if keywords like `"ray::DashboardAgent"`
appear in `subprocess.list2cmdline(proc.cmdline())`. On Linux/macOS,
`setproctitle()` changes the process cmdline so this works. On Windows,
`setproctitle()` only creates a named event object (`CreateEvent`) —
`psutil.Process.cmdline()` still returns the original invocation (e.g.,
`python.exe .../dashboard/agent.py --port ...`), so the keyword never
matches.

### Why not change the existing entries?

The existing `"ray::DashboardAgent"` / `"ray::RuntimeEnvAgent"` entries
work correctly on Linux and macOS. Replacing them with script-path
matching would change behavior on those platforms. Adding
platform-conditional fallback entries is the minimal, safe fix.

### Testing

- Unit test patches `sys.platform` to `"win32"`, reloads the constants
module, and verifies that simulated Windows agent command lines are
matched.
- Verified that `RAY_PROCESSES` invariants are preserved (raylet first,
gcs_server last).
- Verified no false positives on unrelated processes.
- Linux/macOS behavior is unchanged (fallback entries are not included).

---------

Signed-off-by: jamesrayammons <jamesrayammons@outlook.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests windows

2 participants