[Serve][3/n] Deployment-scoped actor lifecycle and deferred replica creation by abrarsheikh · Pull Request #61664 · ray-project/ray

abrarsheikh · 2026-03-11T22:38:52Z

Introduces lifecycle management for deployment-scoped actors and defers replica creation until those actors are ready.

Changes

Deployment-scoped actor lifecycle: Adds DeploymentActorWrapper and DeploymentActorContainer to manage deployment actors (start, readiness checks, stop) with STARTING / RUNNING states.
Deferred replica creation: When deployment_actors is configured, replicas are created only after all deployment actors are ready. This avoids starting replicas before shared actors (e.g., model caches, state stores) are available.
Recovery: On controller restart, existing deployment actors are recovered via _recover_deployment_actors() instead of recreating them.
Status handling: Deployment actor startup failures are surfaced via DeploymentStatus.DEPLOY_FAILED with DEPLOYMENT_ACTOR_FAILED as the trigger.
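
The gating described in the list above can be pictured with a toy model in plain Python. This is a minimal sketch, not the PR's implementation: only the STARTING / RUNNING states come from the description, while `ToyActorWrapper`, `ToyActorContainer`, `reconcile`, and the `is_ready` probe are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class ActorState(Enum):
    STARTING = "STARTING"
    RUNNING = "RUNNING"


@dataclass
class ToyActorWrapper:
    """Hypothetical stand-in for one deployment-scoped actor."""
    name: str
    is_ready: Callable[[], bool]  # assumed readiness probe
    state: ActorState = ActorState.STARTING

    def check_ready(self) -> bool:
        # Promote STARTING -> RUNNING once the probe succeeds.
        if self.state is ActorState.STARTING and self.is_ready():
            self.state = ActorState.RUNNING
        return self.state is ActorState.RUNNING


@dataclass
class ToyActorContainer:
    """Hypothetical container tracking all actors of one deployment."""
    actors: Dict[str, ToyActorWrapper] = field(default_factory=dict)

    def add(self, wrapper: ToyActorWrapper) -> None:
        self.actors[wrapper.name] = wrapper

    def all_ready(self) -> bool:
        # Poll every actor (no short-circuit) so each one can advance.
        results = [w.check_ready() for w in self.actors.values()]
        return all(results)


def reconcile(container: ToyActorContainer, target_replicas: int,
              replicas: List[str]) -> List[str]:
    """One control-loop tick: create replicas only after actors are ready."""
    if not container.all_ready():
        return replicas  # defer replica creation
    while len(replicas) < target_replicas:
        replicas.append(f"replica-{len(replicas)}")
    return replicas
```

With an empty container, `all_ready()` is true, so in this sketch a deployment without deployment actors creates replicas on the first tick.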

Signed-off-by: abrar <abrar@anyscale.com>

gemini-code-assist

Code Review

This pull request introduces a significant new feature: deployment-scoped actors with lifecycle management and deferred replica creation. The implementation is extensive, covering actor creation, recovery, and failure handling. The changes are well-structured and include comprehensive tests. I've identified two main areas for improvement. First, the failure handling for deployment actors could be more efficient; currently, a single actor failure causes all actors for that version to be recreated. Second, the automatic restart policy for these actors could be risky for stateful use cases, as it might lead to silent state loss on crashes. Addressing these points would enhance the robustness and performance of this new feature.

gemini-code-assist · 2026-03-11T22:41:40Z

+            )
+            if merged_runtime_env:
+                actor_options["runtime_env"] = merged_runtime_env
+            actor_options["max_restarts"] = -1


Setting max_restarts to -1 for deployment-scoped actors is risky, especially for stateful actors like caches or state stores as described in the pull request. If an actor crashes after it has become ready, Ray will restart it, but its internal state will be lost. The Serve controller currently does not seem to monitor the health of ready deployment actors, so this state loss can happen silently, leading to inconsistent application behavior.

Consider setting max_restarts to 0 and implementing a mechanism in the controller to detect actor failure and recreate it. Alternatively, this behavior and its implications for stateful actors should be clearly documented.
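
The alternative suggested here can be sketched as a toy simulation. It is not Ray's or Serve's API: `FakeActor`, `HealthMonitor`, and the `alive` flag are hypothetical, and a real controller would need an actual liveness signal from Ray. The point is only that an explicit recreate step makes the state loss visible instead of silent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeActor:
    """Hypothetical actor stand-in; `alive` models a liveness probe."""
    name: str
    state: Dict[str, str] = field(default_factory=dict)
    alive: bool = True


@dataclass
class HealthMonitor:
    """Hypothetical controller component that recreates dead actors."""
    factory: Callable[[str], FakeActor]
    actors: Dict[str, FakeActor] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)

    def start(self, name: str) -> None:
        self.actors[name] = self.factory(name)

    def check(self) -> None:
        for name, actor in list(self.actors.items()):
            if not actor.alive:
                # Recreate explicitly and record it, so the loss is visible.
                self.events.append(f"{name}: died, recreated with empty state")
                self.actors[name] = self.factory(name)
```

In this sketch the recorded event is where a controller could surface a status change or trigger a re-warm of the cache.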

Will revisit this later, after adding integration tests.

could we create an issue so that this doesn't slip through?

jeffreywang88

implementation makes a lot of sense to me! i think we're just missing some tests:

DeploymentActorContainer unit tests (add / get / pop / count / get_wrapper)
ActorReplicaWrapper unit tests
any integration tests (running in a ray cluster) that we plan to add?

jeffreywang88 · 2026-03-12T03:34:28Z

could we create an issue so that this doesn't slip through?

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

jeffreywang88

lgtm, leaving a nit

[Serve] Deployment-scoped actor lifecycle and deferred replica creation

fdadea3

Signed-off-by: abrar <abrar@anyscale.com>

abrarsheikh requested a review from a team as a code owner March 11, 2026 22:38

abrarsheikh added the "go" label (add ONLY when ready to merge, run all tests) Mar 11, 2026

gemini-code-assist Bot reviewed Mar 11, 2026

cursor Bot reviewed Mar 11, 2026

Comment thread python/ray/serve/_private/deployment_state.py

Comment thread python/ray/serve/_private/deployment_state.py Outdated

edge cases

7a6ad59

Signed-off-by: abrar <abrar@anyscale.com>

abrarsheikh requested a review from jeffreywang88 March 12, 2026 01:23

cursor Bot reviewed Mar 12, 2026

Comment thread python/ray/serve/_private/deployment_state.py Outdated

ray-gardener Bot added the "serve" label (Ray Serve Related Issue) Mar 12, 2026

Merge branch 'master' of github.com:ray-project/ray into pr-2a-activation-v2

20c5514

cursor Bot reviewed Mar 12, 2026

Comment thread python/ray/serve/_private/deployment_state.py

jeffreywang88 reviewed Mar 12, 2026

add more test

a8e2ffd

Signed-off-by: abrar <abrar@anyscale.com>

cursor Bot reviewed Mar 12, 2026

Comment thread python/ray/serve/_private/common.py

jeffreywang88 approved these changes Mar 12, 2026

Comment thread python/ray/serve/_private/test_utils.py Outdated

handle transition

8510bef

Signed-off-by: abrar <abrar@anyscale.com>

abrarsheikh requested a review from akyang-anyscale March 16, 2026 19:13

abrarsheikh changed the title from "[Serve] Deployment-scoped actor lifecycle and deferred replica creation" to "[Serve][3/n] Deployment-scoped actor lifecycle and deferred replica creation" Mar 18, 2026

akyang-anyscale approved these changes Mar 19, 2026

abrarsheikh merged commit 20eae5b into master Mar 19, 2026
6 checks passed

abrarsheikh deleted the pr-2a-activation-v2 branch March 19, 2026 06:44

abrarsheikh merged 5 commits into master from pr-2a-activation-v2
