[Serve][LLM] Wait for request router init in LLMRouter constructor#63206
Conversation
LLMRouter.__init__ called handle._init() then immediately handle._get_request_router(), but the underlying RequestRouter is lazy-created on the first DEPLOYMENT_CONFIG long-poll, so the constructor was racing the controller and reliably failing replica startup with `_get_request_router() returned None after _init()`. Poll for `_get_request_router()` to become non-None (60s timeout) before completing __init__. Mirrors the existing pattern in `utils/broadcast.py`, which calls out the same race for the same reason: callers reading the request router directly bypass `assign_request`, which is where the `_request_router_initialized` event is normally awaited. Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request implements a lazy initialization for the RequestRouter using an asynchronous polling mechanism to wait for controller configuration. The review feedback recommends initializing self._request_router to None at the start of the constructor to avoid a potential AttributeError if check_health is called while the initialization is suspended.
| @@ -60,21 +69,36 @@ async def __init__(self, server: DeploymentHandle): | |||
| self._cached_endpoints: List[Tuple[str, int, str]] = [] | |||
| self._handle: DeploymentHandle = server | |||
There was a problem hiding this comment.
Initialize self._request_router to None before the await call. Since __init__ is an async method, it will be suspended at the await point on line 82. If check_health is invoked by the Serve replica controller while the constructor is suspended, it would raise an AttributeError because the attribute has not yet been defined on the instance. Explicitly initializing it to None ensures that check_health (line 101) can safely access the attribute and raise the intended RuntimeError instead.
| self._handle: DeploymentHandle = server | |
| self._handle: DeploymentHandle = server | |
| self._request_router = None |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 2eb18cd. Configure here.
|
|
||
| async def check_health(self): | ||
| if self._handle._get_request_router() is None: | ||
| if self._request_router is None: |
There was a problem hiding this comment.
Health check is dead code after successful initialization
Low Severity
check_health checks if self._request_router is None, but self._request_router is only ever assigned the return value of _wait_for_request_router, which guarantees a non-None result (it raises on timeout). Since Serve only calls check_health on successfully initialized replicas, this condition can never be True, making the method a no-op. The previous code dynamically queried self._handle._get_request_router(), which could theoretically detect if the router became unavailable post-init. The cached check provides no actual health signal.
Reviewed by Cursor Bugbot for commit 2eb18cd. Configure here.
this is an application bug that slipped through from #63167 |
|
Fixes LGTM. |
…uctor (ray-project#63206) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>


Why this is needed
After #63167, LLMRouter replicas started crashing on startup with:
LLMRouter.__init__calledhandle._init()then immediatelyhandle._get_request_router()— but the underlyingRequestRouteris lazy-created on the firstDEPLOYMENT_CONFIGlong-poll from the controller (seeAsyncioRouter.request_routerproperty inserve/_private/router.py). For a brief window after_init()returns,_get_request_router()is stillNone, so the constructor was racing the controller and reliably failing replica startup.The cleanup commits in #63167 dropped the previous
await self._handle.llm_config.remote()call that masked this race, becauseassign_requestawaits_request_router_initializedinternally — but reading_get_request_router()directly bypasses that synchronization point.Summary
LLMRouter.__init__now polls_get_request_router()until it returns non-None(60s timeout, fails the constructor and lets Serve retry on timeout).self._request_routerfor the hot path;check_healthnow consults the cached field.python/ray/llm/_internal/serve/utils/broadcast.py:67-90, which already documents the same race for callers that bypassassign_request.Test plan
pre-commit run --files python/ray/llm/_internal/serve/core/ingress/router.pytest_router.pyinstantiate viaLLMRouter.__new__(LLMRouter)and pre-set_request_router, so they're unaffected by the constructor change.golabel).🤖 Generated with Claude Code