[Serve] Make rolling update percentage configurable (#54509)#62160
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable rolling_update_percentage for Ray Serve deployments, allowing users to control the number of replicas updated simultaneously during a rollout. The changes include updates to the deployment configuration, schema, and protobuf definitions, along with logic in the deployment state manager to respect this new setting. A suggestion was made to parameterize the new unit test to more robustly cover various scenarios, including default values, rounding behavior, and edge cases like 100% updates.
| @@ -219,6 +219,13 @@ class DeploymentConfig(BaseModel): | |||
| update_type=DeploymentOptionUpdateType.HeavyWeight, | |||
| ) | |||
|
|
|||
| rolling_update_percentage: float = Field( | |||
| default=0.2, | |||
There was a problem hiding this comment.
let's move the default value as a env var in constants. That will be a nice touch if any user wishes to control this globally.
There was a problem hiding this comment.
wouldn't it be hard to track if user have multiple clusters? For example lets say in one cluster if the DEFAULT env variable is 0.5 and then in another it is 0.2... I am open to discussion on this but as a service provider we should give the default which will be 0.2 but it is a tweakable feature via certain variables which we are providing in this PR [ Users will change this].
There was a problem hiding this comment.
One argument we can use against my above comment would be not hardcoding default value in code.
There was a problem hiding this comment.
env vars are scoped to a single cluster because they are generally part of the Docker image.
|
@abrarsheikh - feel free to merge once you get a chance 🙏 |
|
@jeffreywang-anyscale @harshit-anyscale @kouroshHakha - Can you guys please review this PR please |
kouroshHakha
left a comment
There was a problem hiding this comment.
The feature is well-scoped and the core change is correct. The plumbing across all four config paths (decorator, REST, YAML, proto) is complete.
A few non-blocking suggestions:
Missing DEFAULT_ROLLING_UPDATE_PERCENTAGE constant
All other tunable Serve defaults (DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S, DEFAULT_HEALTH_CHECK_PERIOD_S, etc.) live as named constants in constants.py. The 0.2 is currently inlined in config.py, the api.py docstring, and the schema.py description — three places that drift independently if the default changes. (See inline comment.)
Use optional double in proto
The serve proto already uses optional double for fields that need "not set" semantics (upscale_smoothing_factor, downscaling_factor, etc.). _proto_to_dict already handles these correctly — optional fields skip the zero-default-filling loop via the containing_oneof check, so an unset field simply never appears in data and DeploymentConfig naturally falls back to 0.2. This would let you remove the special-case in from_proto entirely. (See inline comment.)
Proto round-trip test missing (Important)
The from_proto path has the most upgrade-brittleness risk (old controller → new worker during a rolling cluster upgrade). A small unit test verifying that rolling_update_percentage=0.5 survives a to_proto() → from_proto() round-trip, and that an older proto (field absent) falls back to 0.2, would close this gap.
DeploymentConfig class docstring not updated
The class-level docstring (line 113–146 in config.py) lists all constructor args but rolling_update_percentage is missing.
Note
This review was co-written with AI assistance (Claude Code).
|
@suppagoddo any updates on this pr? |
|
@kouroshHakha - I was on vacation the last 1.5 weeks and will be reaching home today evening, will address all your comments and will try to get it merged. Thanks again for the review 🙏 |
|
Looking into this today, will submit the requested change by eod. |
Replace the hardcoded 20% rolling update percentage with a configurable `rolling_update_percentage` field on DeploymentConfig (default 0.2). Users can now control what fraction of replicas are updated at a time during rolling updates via @serve.deployment(), REST API, or config YAML. - Add `rolling_update_percentage` to DeploymentConfig (gt=0.0, le=1.0) - Add to DeploymentSchema for REST/YAML/decorator support - Add to serve.proto for Python API round-trip - Handle proto3 zero-default in from_proto - Wire through _deployment_info_to_schema for status API - Add test_rolling_update_percentage_configurable unit test - Include 20 in expected worker thread count range (test_basic.py) Signed-off-by: uditagrawal <uagrawal@apple.com>
…nd deployment_to_schema The field was added to DeploymentSchema but not passed through the explicit conversion functions in deployment.py, causing values set via REST API or config YAML to be silently dropped. Signed-off-by: uditagrawal <uagrawal@apple.com>
Cover default percentage, rounding, minimum-of-1, non-clean divisors, and 100% update scenarios. Signed-off-by: uditagrawal <uagrawal@apple.com>
Add the parameter to the decorator signature, docstring, and DeploymentConfig.from_default() call so users can configure it via @serve.deployment(rolling_update_percentage=0.5). Signed-off-by: uditagrawal <uagrawal@apple.com>
…ings Replace the hardcoded 0.2 that was duplicated across config.py, schema.py, and the api.py decorator docstring with a single named constant, DEFAULT_ROLLING_UPDATE_PERCENTAGE, in _private/constants.py. This matches the pattern used by DEFAULT_GRACEFUL_SHUTDOWN_TIMEOUT_S, DEFAULT_HEALTH_CHECK_PERIOD_S, etc., and prevents the three call sites from drifting independently. Also: - Add rolling_update_percentage to DeploymentConfig's class-level docstring (it was listed as a field but missing from the Args section). - Insert the missing blank line before the Returns: section of the @serve.deployment() docstring to match the Google-style convention used throughout api.py. Addresses review feedback on ray-project#62160. Signed-off-by: uditagrawal <uagrawal@apple.com>
Declare rolling_update_percentage as `optional double` in serve.proto so that "unset" (e.g. an older controller during a rolling upgrade that predates this field) is distinguishable from an explicit value, matching the pattern already used for upscale_smoothing_factor, downscaling_factor, etc. This lets us remove the special-case in DeploymentConfig.from_proto that treated proto3's zero-value default as "use the Python default". _proto_to_dict already skips synthetic-oneof-backed optional fields (via the `containing_oneof` check), so an unset field never enters `data` and Pydantic falls back to DEFAULT_ROLLING_UPDATE_PERCENTAGE on its own. Also add test_rolling_update_percentage_proto_roundtrip covering: - An explicit value (0.5) survives to_proto() -> from_proto() losslessly. - A proto without the field set (simulating an older controller) deserializes to DEFAULT_ROLLING_UPDATE_PERCENTAGE rather than 0.0. Addresses review feedback on ray-project#62160. Signed-off-by: uditagrawal <uagrawal@apple.com>
79b31df to
fe8c2d4
Compare
The @serve.deployment() docstring referenced
DEFAULT_ROLLING_UPDATE_PERCENTAGE via
:data:`~ray.serve._private.constants.DEFAULT_ROLLING_UPDATE_PERCENTAGE`
but `_private` modules are not indexed in Ray's published Sphinx API docs
and the docs build runs with nitpicky + warnings-as-errors, producing:
WARNING: py:data reference target not found:
ray.serve._private.constants.DEFAULT_ROLLING_UPDATE_PERCENTAGE
Replace the cross-reference with the literal default ("0.2 (20%)") so the
docstring still surfaces the value without creating a build-time warning.
The named constant is still referenced from code (config.py, schema.py)
so the "one place to change the default" guarantee is preserved.
Signed-off-by: uditagrawal <uagrawal@apple.com>
…e_proto_roundtrip The multi-line assert fits within black's line limit on a single line; black reformatted it as part of pre-commit. Apply the reformat here so the lint-roller pre-commit job passes. Signed-off-by: uditagrawal <uagrawal@apple.com>
…ng invalid proto The test previously built `DeploymentConfigProto(num_replicas=1)` directly to simulate "an older controller without this field". But that proto has every other field at its proto3 zero default (max_ongoing_requests=0, health_check_period_s=0.0, max_constructor_retry_count=0, ...), which then fails Pydantic validation when `from_proto` constructs the `DeploymentConfig` (fields require gt=0 / positive int / similar). Start from a valid `DeploymentConfig().to_proto()` (so all non-optional fields are populated with real defaults), then `ClearField` only the optional `rolling_update_percentage` to simulate the older-controller case. Clearing makes `HasField()` return False, which is exactly what a proto serialized before this field existed would present. Signed-off-by: uditagrawal <uagrawal@apple.com>
…cstring Match the reST inline-literal formatting already used for the interval ``(0.0, 1.0]`` earlier in the same docstring. Purely cosmetic; also triggers a fresh premerge run to retry a flaky docs build caused by an upstream HuggingFace intersphinx 504. Signed-off-by: uditagrawal <uagrawal@apple.com>
|
@kouroshHakha - Can you please review this again? Addressed all your recommended comments. |
… (ray-project#62160) Signed-off-by: uditagrawal <uagrawal@apple.com>

Replace the hardcoded 20% rolling update percentage with a configurable
rolling_update_percentagefield on DeploymentConfig (default 0.2).Users can now control what fraction of replicas are updated at a time during rolling updates via @serve.deployment(), REST API, or config YAML.
rolling_update_percentageto DeploymentConfig (gt=0.0, le=1.0)