The following changes are planned for the next several Ray Serve releases. Please review and prepare your applications accordingly.
- Pydantic v1 support will be removed. Ray Serve will require Pydantic v2. If you are still on Pydantic v1, upgrade now with pip install -U pydantic. See #58876 for migration details.
- Sync deployment methods will run in a threadpool by default. Synchronous user code will no longer block the event loop, which improves concurrency for sync handlers. If your deployment relies on the current single-threaded behavior, set max_ongoing_requests=1 on the deployment.
- We are replacing the current Ray Serve HTTP Proxy with a more optimized ingress layer. This change means that the ingress deployment will no longer support model multiplexing or custom request routing. While this new ingress layer will be opt-in for several Ray versions, we strongly advise moving any multiplexing logic or custom request routing policy to downstream deployments.
- RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD will default to 0. User code will run in the same thread as the replica event loop by default. If your deployment depends on user code running in a separate thread, explicitly set RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD=1.
- RAY_SERVE_RUN_ROUTER_IN_SEPARATE_LOOP will default to 0. The request router will run in the same event loop as the proxy/replica by default. If your setup requires a separate router loop, explicitly set RAY_SERVE_RUN_ROUTER_IN_SEPARATE_LOOP=1.
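For the Pydantic v2 requirement, a quick way to see whether an environment still needs the upgrade is to read the installed version with the standard library. This is a minimal sketch, not a Ray tool: the helper names are hypothetical, and it assumes the version string starts with a numeric major component (true for Pydantic releases). Run it in the same environment your Serve application uses.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def needs_pydantic_upgrade(version_string: str) -> bool:
    """Return True if a Pydantic version string is older than v2 (hypothetical helper)."""
    major = int(version_string.split(".")[0])
    return major < 2


def installed_pydantic_version() -> Optional[str]:
    """Return the installed Pydantic version, or None if it is not installed."""
    try:
        return version("pydantic")
    except PackageNotFoundError:
        return None


current = installed_pydantic_version()
if current is None:
    print("Pydantic is not installed in this environment.")
elif needs_pydantic_upgrade(current):
    print(f"Pydantic {current} found; run: pip install -U pydantic")
else:
    print(f"Pydantic {current} found; already on v2 or later.")
```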
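To illustrate the threadpool change for sync methods, the sketch below uses only asyncio and is not Ray Serve's implementation. It runs a blocking sync handler either inline on the event loop (the old behavior) or through asyncio.to_thread (a threadpool), and records how many handler calls overlap. An asyncio.Semaphore of size 1 loosely stands in for a per-replica cap; in Ray Serve itself the setting goes on the deployment, for example @serve.deployment(max_ongoing_requests=1). The class and function names are hypothetical.

```python
import asyncio
import threading
import time
from typing import Optional


class PeakTracker:
    """Tracks how many calls to a sync handler are running at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def sync_handler(self, delay: float = 0.1) -> None:
        # Stand-in for a synchronous deployment method doing blocking work.
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(delay)
        with self._lock:
            self._active -= 1


async def simulate(n: int, offload: bool, limit: Optional[int] = None) -> int:
    """Serve n concurrent requests and return the peak handler concurrency."""
    tracker = PeakTracker()
    # The semaphore loosely mimics a per-replica cap such as max_ongoing_requests.
    sem = asyncio.Semaphore(limit) if limit is not None else None

    async def call() -> None:
        if offload:
            # Threadpool: the event loop stays free while the sync code runs.
            await asyncio.to_thread(tracker.sync_handler)
        else:
            # Inline: the sync code blocks the event loop until it returns.
            tracker.sync_handler()

    async def request() -> None:
        if sem is None:
            await call()
        else:
            async with sem:
                await call()

    await asyncio.gather(*(request() for _ in range(n)))
    return tracker.peak


print(asyncio.run(simulate(4, offload=False)))          # 1: requests run one after another
print(asyncio.run(simulate(4, offload=True)))           # more than 1: handlers overlap
print(asyncio.run(simulate(4, offload=True, limit=1)))  # 1: the cap restores one-at-a-time
```

The last case mirrors the advice above: once sync code runs in a threadpool, a concurrency cap of 1 is what keeps handlers from overlapping.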
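A likely consequence of RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD defaulting to 0 (an inference, not a statement from these notes) is that a blocking call inside an async deployment method stalls the same event loop the replica uses for other work. The asyncio sketch below is not Ray's implementation: a hypothetical heartbeat coroutine stands in for that other work, and the function counts how many 10 ms heartbeat ticks complete while the user method runs, either on the same loop or on a separate loop in its own thread.

```python
import asyncio
import threading
import time


async def user_method() -> str:
    # An async deployment method that (incorrectly) makes a blocking call.
    time.sleep(0.2)
    return "done"


async def loop_ticks_during_request(separate_thread: bool) -> int:
    """Count 10 ms ticks the main loop completes while user_method runs."""
    ticks = 0

    async def heartbeat() -> None:
        # Hypothetical stand-in for other work scheduled on the replica's loop.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat task start
    start = ticks
    if separate_thread:
        # Run the user coroutine on its own loop in a dedicated thread.
        user_loop = asyncio.new_event_loop()
        worker = threading.Thread(target=user_loop.run_forever, daemon=True)
        worker.start()
        fut = asyncio.run_coroutine_threadsafe(user_method(), user_loop)
        await asyncio.wrap_future(fut)
        user_loop.call_soon_threadsafe(user_loop.stop)
        worker.join()
        user_loop.close()
    else:
        # Run the user coroutine directly on the main (replica) loop.
        await user_method()
    observed = ticks - start
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return observed


print(asyncio.run(loop_ticks_during_request(separate_thread=False)))  # 0: the loop was blocked
print(asyncio.run(loop_ticks_during_request(separate_thread=True)))   # well above 0: the loop kept running
```

Under the new default, a deployment whose async methods make blocking calls is the kind of workload that may want RAY_SERVE_RUN_USER_CODE_IN_SEPARATE_THREAD=1.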