[Serve] feat: scale down non matching primary labels replica first#61488
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a change to the deployment scale-down logic to prioritize stopping replicas on nodes that do not match the deployment's primary label_selector. This is achieved by modifying the sorting key in _get_replicas_to_stop. While the overall approach is correct, I found a critical issue in the implementation of the sorting logic that inverts the intended priority. My review includes a specific comment with a suggested fix for this issue.
abrarsheikh
left a comment
There was a problem hiding this comment.
like the approach overall, it's reasonable. Please add e2e tests.
There was a problem hiding this comment.
function has grown a bit complicated, please update the docstring explain the algorithm
| match_primary_labels = int( | ||
| node_labels_match_selector(node_labels, primary_labels) | ||
| ) |
There was a problem hiding this comment.
maybe this should also take into account placement group label selector.
There was a problem hiding this comment.
+1 I think we can check node_labels against self._deployments[deployment_id].placement_group_bundle_label_selector if it's set, label_selector is only for the deployment Actor while placement_group_bundle_label_selector is applied to each of the bundles that dictate where the replicas are placed. If the node labels don't match any bundles/bundle_label_selector in placement_group_bundle_label_selector, then it's infeasible for scheduling and should be scaled down first.
|
please also fix DCO |
| @@ -1089,6 +1090,8 @@ def _get_replicas_to_stop( | |||
| if len(replicas_to_stop) == max_num_to_stop: | |||
| return replicas_to_stop | |||
|
|
|||
| actor_options = self._deployments[deployment_id].actor_options or {} | |||
| primary_labels: Dict[str, str] = actor_options.get("label_selector", {}) | |||
There was a problem hiding this comment.
Can we do primary_label_selector or similar, I think an Actor can also have labels so it might get confusing.
722d7af to
aae4e32
Compare
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
aae4e32 to
62cab53
Compare
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
…caling_down_fallback_labels_replica
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
|
@abrarsheikh The PR is ready for review. Could you take a look? |
| assert dep_id in deployment_to_replicas_to_stop | ||
| return deployment_to_replicas_to_stop[dep_id] | ||
|
|
||
| def test_downscale_prioritizes_non_running_replicas(self, ray_cluster): |
There was a problem hiding this comment.
the current tests are wriiten as unit tests, if you see the rest of the tests in the file they are all written as integration tests. Let's maintain consistency with other tests.
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
…caling_down_fallback_labels_replica
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
jeffreywang88
left a comment
There was a problem hiding this comment.
great contribution! let's add some more tests:
- no tests for the
bundle_label_selectorpath. both integ tests uselabel_selector. - test for head node interaction: verify that the head node replicas are still deprioritized even when they're on a fallback node
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
@jeffreywang-anyscale We should prioritize fallback node condition over head node shouldn't we? I think it makes more sense to always scale down node which doesn't match the primary label selector regradless it's the head node or not |
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
…caling_down_fallback_labels_replica
The main goal of this PR is to free up spot instance resources. Head node hosts critical components like GCS, autoscaler, and Serve controller, so even though we evict replicas, we can't delete the underlying spot instance. Because of this, I think replicas on the head node should be deprioritized for eviction during scale-down. They are last in line, since removing them provides no benefit in terms of draining nodes. While Ray offers GCS fault tolerance via external Redis, allowing workers to continue serving traffic during head node recovery, the control plane is unavailable during this period (e.g. no deployments can be updated). Avoiding head node loss remains important for cluster stability. We can avoid scheduling head node on spot instances by configuring Let's get a second opinion, @abrarsheikh WDYT? |
|
Could you please fix premerge test failures? |
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
I agree with @jeffreywang-anyscale. The head node should be the last to go, behavior is consistent with what currently exists. Also, no serious user will run the head node of a spot instance, so this decision is moot. |
Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com>
|
@abrarsheikh @jeffreywang-anyscale Could you check again? I have updated code to make replicas on head node go down last |
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
jeffreywang88
left a comment
There was a problem hiding this comment.
thanks for the work! pushed a minor fix for the tests to ensure we're testing the intended priorities.
…ay-project#61488) > 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete. ## Description > Briefly describe what this PR accomplishes and why it's needed. - Prioritze sclaing down replica which doesn't fit label_selector (fallback_strategy enabled) ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". - Related to ray-project#58959 ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Mạnh Lê Đức <naruto12308@gmail.com> Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>

Fixes #58959