[serve] Add ControllerOptions for configurable controller runtime_env#63352
Merged
kouroshHakha merged 4 commits on May 15, 2026
Today the only way to influence the Serve controller actor's environment
is to set Ray cluster env vars at start time and hope they're on the
Anyscale runtime-env hook's propagation allowlist. Knobs like
RAY_SERVE_HAPROXY_NBTHREAD and RAY_SERVE_HAPROXY_TCP_NODELAY were
silently dropped, blocking experiments and operator overrides.
Add ControllerOptions, a public config object symmetric with HTTPOptions
and gRPCOptions, that carries a strictly-validated runtime_env for the
controller actor. v0 scope is intentionally narrow: only the env_vars
key under runtime_env is accepted. Other keys (pip, working_dir,
py_modules, container, ...) would mutate the detached, long-lived
controller's dependencies and are rejected with a message pointing
operators at deployment-level runtime_env.
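A minimal stdlib-only sketch of the v0 `runtime_env` rule described above. The function name `validate_controller_runtime_env` and its error messages are hypothetical. Per the test plan, the real check lives on the `ControllerOptions` model and surfaces as a `ValidationError` rather than the plain `ValueError` used here.

```python
from typing import Any, Dict, Optional

# v0 scope: env_vars is the only runtime_env key the controller accepts.
ALLOWED_RUNTIME_ENV_KEYS = frozenset({"env_vars"})


def validate_controller_runtime_env(runtime_env: Optional[Any]) -> Optional[Dict[str, Any]]:
    """Hypothetical re-implementation of the env_vars-only runtime_env check."""
    if runtime_env is None:
        return None  # Default: the controller's runtime_env is left untouched.
    if not isinstance(runtime_env, dict):
        raise ValueError(f"runtime_env must be a dict, got {type(runtime_env).__name__}")

    # Reject pip, working_dir, py_modules, container, ... on the controller.
    disallowed = [key for key in runtime_env if key not in ALLOWED_RUNTIME_ENV_KEYS]
    if disallowed:
        raise ValueError(
            f"Unsupported controller runtime_env keys: {disallowed}. Only 'env_vars' "
            "is allowed; use a deployment-level runtime_env for pip, working_dir, "
            "py_modules, container, etc."
        )

    if "env_vars" in runtime_env:
        env_vars = runtime_env["env_vars"]
        if not isinstance(env_vars, dict):  # Also rejects an explicit None.
            raise ValueError("runtime_env['env_vars'] must be a dict of str -> str")
        for name, value in env_vars.items():
            if not isinstance(name, str) or not name:
                raise ValueError(f"env var names must be non-empty strings, got {name!r}")
            if not isinstance(value, str):
                raise ValueError(
                    f"env var {name!r} must have a str value, got {type(value).__name__}"
                )
    return runtime_env
```

For example, `{"env_vars": {"RAY_SERVE_HAPROXY_TCP_NODELAY": "1"}}` passes through unchanged, while `{"pip": ["requests"]}` fails immediately and names the offending key.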
Plumbed through:
- serve.start(controller_options=...)
- serve.run(..., controller_options=...) (and _run / _run_many / run_many)
- serve run foo.yaml via ServeDeploySchema.controller_options
- get_controller_impl() applies it to the controller actor's runtime_env
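The last bullet amounts to attaching `runtime_env` to the controller actor's options only when one was requested, which is what `TestGetControllerImpl` checks (threaded into `_default_options`, or omitted). A hypothetical sketch: the helper `build_controller_actor_options` and the placeholder base options are illustrative only, not the arguments `get_controller_impl()` actually uses.

```python
from typing import Any, Dict, Optional


def build_controller_actor_options(
    base_options: Dict[str, Any],
    runtime_env: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: add runtime_env to actor options only if requested."""
    options = dict(base_options)  # Copy so the caller's dict is not mutated.
    if runtime_env is not None:
        options["runtime_env"] = runtime_env
    return options


# Placeholder base options for illustration; the real controller options differ.
BASE = {"num_cpus": 0, "lifetime": "detached"}
```

Omitting the key entirely (rather than setting `runtime_env=None`) keeps the no-options path identical to the behavior before this change.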
Reuses Anyscale's env_hook merge semantics: explicit runtime_env.env_vars
land additively on top of the hook's auto-injected set.
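A sketch of the additive merge. It assumes that explicitly requested values win when a key collides with a hook-injected one, which is one reading of "on top of"; the hook's exact precedence rules are not spelled out here. `merge_controller_env_vars` is a hypothetical name, and `hook_env_vars` stands in for whatever the env_hook injects.

```python
from typing import Dict


def merge_controller_env_vars(
    hook_env_vars: Dict[str, str],
    explicit_env_vars: Dict[str, str],
) -> Dict[str, str]:
    """Hypothetical: layer explicit env_vars on top of the hook's set.

    Keys only in the hook set are kept (additive). On a collision the
    explicit value wins, which is an assumption of this sketch.
    """
    merged = dict(hook_env_vars)
    merged.update(explicit_env_vars)
    return merged
```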
Same lifecycle as HTTPOptions: only applied on first controller creation;
ignored with a log warning if a controller is already running.
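A toy model of the first-creation-only lifecycle. `ToyServeRuntime` and its methods are hypothetical stand-ins for Serve's controller bookkeeping. Only the behavior (apply on first start, log a warning and ignore afterwards) comes from the description above.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("toy_serve")


class ToyServeRuntime:
    """Hypothetical stand-in for Serve's 'is a controller running?' logic."""

    def __init__(self) -> None:
        self.controller_runtime_env: Optional[Dict[str, Any]] = None
        self.controller_running = False

    def start(self, runtime_env: Optional[Dict[str, Any]] = None) -> None:
        if self.controller_running:
            if runtime_env is not None:
                # Like HTTPOptions: a live controller's options are not changed.
                logger.warning(
                    "A Serve controller is already running; ignoring controller_options."
                )
            return
        # First creation: the requested runtime_env is baked into the actor.
        self.controller_runtime_env = runtime_env
        self.controller_running = True
```

In practice, changing controller env vars therefore means shutting Serve down and starting it again.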
Tests:
- TestControllerOptions in tests/unit/test_config.py (12 methods,
parametrized -- 24 cases total) for the validator
- TestGetControllerImpl in the same file (3 cases) for white-box
wiring into the actor class
- 4 cases on TestServeDeploySchema in tests/unit/test_schema.py for
YAML-schema integration
- test_serve_start_controller_options (parametrized over model and
dict input) and test_serve_start_controller_options_rejects_disallowed_runtime_env
in tests/test_standalone.py for live env-propagation end-to-end
Verified end-to-end on a Ray Serve LLM + HAProxy stack: serve.start with
ControllerOptions(runtime_env={"env_vars": {"RAY_SERVE_HAPROXY_TCP_NODELAY": "1"}})
lands the env var on the controller's /proc/<pid>/environ and the
rendered haproxy.cfg picks up `option http-no-delay`. That flip cut
c=64 streaming TTFT mean from 201 ms to 98 ms, matching vllm-router
(104 ms) on the same workload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review
This pull request introduces ControllerOptions to allow passing configuration, specifically runtime_env.env_vars, to the Ray Serve controller actor during initialization. This new configuration is integrated into serve.start, serve.run, the YAML deployment schema, and the CLI. Feedback includes a suggestion to use the built-in dict type for consistency in type hints and a recommendation to improve the validation logic for env_vars to correctly handle cases where the key might be explicitly set to null.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 010a8c6.
- api.py: use ``dict`` instead of ``Dict`` in ``_coerce_controller_options`` signature to match the lowercase-``dict`` style used by neighboring Union annotations.
- config.py: reject explicit ``env_vars: None`` (e.g., from YAML ``null``) by checking key presence with ``in`` instead of ``dict.get``, so a bad config fails locally with a ValidationError rather than crashing later in the Ray runtime_env layer. Added a regression test.
- serve_head.py: forward ``config.controller_options`` from ``put_all_applications`` to ``serve_start_async`` -- previously the REST API path silently dropped controller_options from the request body even though schema validation accepted them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
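Why the `in` check matters: with `dict.get`, a missing key and an explicit `null` both come back as `None`, so a "skip if None" guard lets `env_vars: None` through. A simplified before/after sketch; both helper names are hypothetical and this is not the code in config.py.

```python
from typing import Any, Dict


def check_env_vars_with_get(runtime_env: Dict[str, Any]) -> None:
    """Hypothetical buggy pattern: explicit None looks the same as 'absent'."""
    env_vars = runtime_env.get("env_vars")
    if env_vars is not None and not isinstance(env_vars, dict):
        raise ValueError("env_vars must be a dict")


def check_env_vars_with_in(runtime_env: Dict[str, Any]) -> None:
    """Hypothetical fixed pattern: validate whenever the key is present."""
    if "env_vars" in runtime_env and not isinstance(runtime_env["env_vars"], dict):
        raise ValueError("env_vars must be a dict")
```

`{"env_vars": None}` is what YAML produces for `env_vars: null` or a bare `env_vars:` line, so the first helper silently accepts it and the second rejects it.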
Fixes ci/lint/lint.sh api_policy_check: every @publicapi in ray.serve must appear in doc/source/serve/api/index.md. ControllerOptions was added alongside HTTPOptions/gRPCOptions earlier in this PR but was missing from the docs index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
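The policy can be pictured as a set difference between the public API names and the names mentioned in the docs index. This is a hypothetical, simplified illustration: the function `find_undocumented` and its whole-word match are assumptions, and it is not how `api_policy_check` is actually implemented.

```python
import re
from typing import Iterable, List


def find_undocumented(public_api_names: Iterable[str], index_md: str) -> List[str]:
    """Hypothetical: return public names that never appear in the docs index."""
    return [
        name
        for name in public_api_names
        # Whole-word match so e.g. "Options" does not match "HTTPOptions".
        if not re.search(rf"\b{re.escape(name)}\b", index_md)
    ]
```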
abrarsheikh reviewed May 15, 2026
Addresses review feedback: move the ControllerOptions import from the TYPE_CHECKING block at the bottom of the file to the regular imports at the top. ray.serve.config has no import dependency on ray.serve._private.default_impl, so a runtime import cannot create a circular import; it also lets us drop the string annotation on ``get_controller_impl``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
abrarsheikh approved these changes May 15, 2026
TruongQuangPhat pushed a commit to cyhapun/ray-fix-issue that referenced this pull request May 27, 2026
…ray-project#63352)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: phattruong <23120318@student.hcmus.edu.vn>

Summary
- Adds a `ControllerOptions` config object (alpha), symmetric with `HTTPOptions`/`gRPCOptions`, that lets `serve.start()` / `serve.run()` / `serve run foo.yaml` pass a strictly-validated `runtime_env` into the Serve controller actor.
- v0 scope: `runtime_env` only, and within `runtime_env` only the `env_vars` key is accepted. Other keys (`pip`, `working_dir`, `py_modules`, `container`, ...) would mutate Serve's own dependencies on a detached, long-lived controller actor and are rejected with a message pointing operators at deployment-level `runtime_env`.

Motivation
Today, env vars that need to reach the HAProxy layer (e.g. `RAY_SERVE_HAPROXY_TCP_NODELAY`) have to be set at the cluster level. For tuning, it would be much more convenient to run something like `RAY_SERVE_HAPROXY_TCP_NODELAY=1 serve run foo.yaml`, or to set the env var in the YAML's `runtime_env`.

API
Validator catches typos and disallowed fields at parse time.
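A stdlib sketch of the parse-time behavior listed in the test plan: a plain dict (e.g. parsed YAML) is coerced into the options object, and an unknown top-level field such as the typo `runtime_envs` is rejected instead of silently ignored. `ToyControllerOptions` and `coerce_controller_options` are hypothetical stand-ins. The real `ControllerOptions` appears to do this through pydantic (`model_validate`), raises `ValidationError`, and also applies the `env_vars`-only check to `runtime_env`, which is omitted here.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Union


@dataclass(frozen=True)
class ToyControllerOptions:
    """Hypothetical stand-in for ControllerOptions (runtime_env only)."""

    runtime_env: Optional[Dict[str, Any]] = None


def coerce_controller_options(
    value: Union[ToyControllerOptions, dict, None],
) -> Optional[ToyControllerOptions]:
    """Hypothetical: accept a model instance, a plain dict, or None."""
    if value is None or isinstance(value, ToyControllerOptions):
        return value
    if not isinstance(value, dict):
        raise TypeError(f"expected ToyControllerOptions or dict, got {type(value).__name__}")
    known = {f.name for f in fields(ToyControllerOptions)}
    unknown = [key for key in value if key not in known]
    if unknown:
        # Strict parsing: a typo like 'runtime_envs' fails here, not later.
        raise ValueError(f"unknown controller_options fields: {unknown}")
    return ToyControllerOptions(**value)
```

Accepting both a model and a dict is what lets the same value flow in from Python callers and from YAML.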
Test plan
- `python/ray/serve/tests/unit/test_config.py::TestControllerOptions`: 12 methods (parametrized) covering accept/reject paths: default `None`, dict coercion via `model_validate`, valid `env_vars`, empty `env_vars`, every non-`env_vars` `runtime_env` key (`pip`, `working_dir`, `py_modules`, `conda`, `container`, `nsight`), non-dict `runtime_env`, non-dict `env_vars`, non-`str` values across types, empty/non-string env-var keys, extra top-level fields, and mixed allowed-and-disallowed keys.
- `python/ray/serve/tests/unit/test_config.py::TestGetControllerImpl`: 3 cases asserting that `runtime_env` is correctly threaded into the actor class's `_default_options` (or omitted when not requested).
- `python/ray/serve/tests/unit/test_schema.py::TestServeDeploySchema`: 4 cases on the YAML schema: default `None`, valid passthrough, rejection of disallowed `runtime_env` keys, and rejection of non-`str` env values.
- `python/ray/serve/tests/test_standalone.py::test_serve_start_controller_options`: parametrized e2e test (model and dict input) asserting that the requested env vars actually land in the live controller actor's `os.environ`.
- `python/ray/serve/tests/test_standalone.py::test_serve_start_controller_options_rejects_disallowed_runtime_env`: verifies that a bad `runtime_env` raises `ValidationError` at the caller, not from a Ray task.

🤖 Generated with Claude Code