[serve] Ensure deployment converges to healthy state #61818

Merged: abrarsheikh merged 2 commits into master from gang-flaky-test-1 on Mar 18, 2026

@jeffreywang88

Contributor

Description

A deployment can still report DEPLOYING status while it is already serving traffic, which breaks the existing assertion that every deployment must be healthy at that point. Instead, we now check inside wait_for_condition whether each deployment reaches a healthy state, which avoids this race condition.
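
The race described above can be illustrated with a minimal, self-contained sketch. It does not import Ray: `DeploymentStatus`, `FakeController`, `all_healthy`, and the local `wait_for_condition` are hypothetical stand-ins that only mimic the shape of the Ray Serve test code, and the poll counts and timeouts are made up for illustration.

```python
import time
from enum import Enum


class DeploymentStatus(Enum):
    # Stand-in for Ray Serve's enum; only the two states discussed here.
    DEPLOYING = "DEPLOYING"
    HEALTHY = "HEALTHY"


class FakeController:
    """Hypothetical controller whose reported status lags behind readiness."""

    def __init__(self, polls_until_healthy: int):
        self._remaining = polls_until_healthy

    def get_statuses(self) -> dict:
        # Report DEPLOYING for the first few polls, then HEALTHY.
        if self._remaining > 0:
            self._remaining -= 1
            return {"d1": DeploymentStatus.DEPLOYING}
        return {"d1": DeploymentStatus.HEALTHY}


def wait_for_condition(predicate, timeout: float = 5.0, retry_interval_s: float = 0.01) -> None:
    """Simplified stand-in: poll until predicate() is truthy or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(retry_interval_s)
    raise RuntimeError("condition not met before timeout")


def all_healthy(controller: FakeController) -> bool:
    # Same shape as the check in this PR: fail fast on any non-HEALTHY status.
    for status in controller.get_statuses().values():
        if status != DeploymentStatus.HEALTHY:
            return False
    return True


controller = FakeController(polls_until_healthy=3)

# A one-shot check at this point sees DEPLOYING, so a bare assert would fail.
first_poll_healthy = all_healthy(controller)

# Polling tolerates the lag and returns once the status converges.
wait_for_condition(lambda: all_healthy(controller))
converged = all_healthy(controller)
```

The point of the sketch is only that a single read of the status can observe a transient state, whereas a polling predicate succeeds as soon as the status settles.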

Related issues

Fixed postmerge flaky test: https://buildkite.com/ray-project/postmerge/builds/16498#019cfcac-67e1-4235-91ef-e1e3eb8450b3/L1058.


Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang88 requested a review from a team as a code owner on March 18, 2026 00:55
@abrarsheikh added the "go" label (add ONLY when ready to merge, run all tests) on Mar 18, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request addresses a flaky test by ensuring deployments reach a healthy state within a wait_for_condition loop, which is a robust way to handle race conditions in tests. The implementation correctly moves the status check into the waiting logic. I have one minor suggestion to improve the conciseness of the new code.

Comment on lines +366 to +368
for dep_status in app_status.deployments.values():
    if dep_status.status != DeploymentStatus.HEALTHY:
        return False

medium

For conciseness, you can use any() with a generator expression to check for any unhealthy deployments. This can make the intent slightly clearer.

Suggested change
-for dep_status in app_status.deployments.values():
-    if dep_status.status != DeploymentStatus.HEALTHY:
-        return False
+if any(
+    dep_status.status != DeploymentStatus.HEALTHY
+    for dep_status in app_status.deployments.values()
+):
+    return False
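
To make the equivalence concrete, here is a small standalone sketch comparing the explicit loop with the `any()` form, plus an `all()` variant that is not part of the review but expresses the same check positively. `DeploymentStatus` and the `SimpleNamespace` objects are hypothetical stand-ins for the Ray Serve status types, and the helper names are invented for illustration.

```python
from enum import Enum
from types import SimpleNamespace


class DeploymentStatus(Enum):
    # Stand-in enum; not imported from Ray Serve.
    DEPLOYING = "DEPLOYING"
    HEALTHY = "HEALTHY"


def healthy_loop(deployments: dict) -> bool:
    # Original form: return False on the first non-HEALTHY deployment.
    for dep_status in deployments.values():
        if dep_status.status != DeploymentStatus.HEALTHY:
            return False
    return True


def healthy_any(deployments: dict) -> bool:
    # Suggested form: any() short-circuits on the first unhealthy deployment.
    if any(
        dep_status.status != DeploymentStatus.HEALTHY
        for dep_status in deployments.values()
    ):
        return False
    return True


def healthy_all(deployments: dict) -> bool:
    # Positive phrasing of the same predicate.
    return all(
        dep_status.status == DeploymentStatus.HEALTHY
        for dep_status in deployments.values()
    )


mixed = {
    "a": SimpleNamespace(status=DeploymentStatus.HEALTHY),
    "b": SimpleNamespace(status=DeploymentStatus.DEPLOYING),
}
healthy = {"a": SimpleNamespace(status=DeploymentStatus.HEALTHY)}
```

All three helpers agree on every input, including an empty mapping, for which each returns `True`.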
@ray-gardener (bot) added the "serve" label (Ray Serve Related Issue) on Mar 18, 2026
@abrarsheikh merged commit 6fe7402 into master on Mar 18, 2026 (6 checks passed)
@abrarsheikh deleted the gang-flaky-test-1 branch on March 18, 2026 16:37
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Mar 25, 2026
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026