[Serve] Fix 499 misclassification when client disconnects after successful response#61396
Merged
Conversation
…ssful response Signed-off-by: abrar <abrar@anyscale.com>
Contributor
There was a problem hiding this comment.
Code Review
This pull request correctly addresses a race condition where successful requests were misclassified as 499 errors when the client disconnected immediately after receiving the response. The root cause, related to the handling of the optional more_body field in the ASGI spec, is fixed by changing the condition to correctly default to False when the field is omitted. The addition of a check for http.response.body message type makes the logic more robust. A new test case has been added that effectively reproduces the race condition using concurrent requests, ensuring the fix is well-tested and preventing future regressions. The changes are clear, correct, and well-implemented.
kamil-kaczmarek
pushed a commit
that referenced
this pull request
Mar 3, 2026
…ssful response (#61396) When the client closed the connection right after receiving the full response, Serve sometimes logged **499 (Client Closed Request)** even though the response was complete. This showed up in metrics (e.g. `ray_serve_num_http_error_requests_total`) and logs. ### Root cause In direct ingress ASGI handling, `response_finished` was set with: ```python if msg.get("more_body") is False: ``` Per the ASGI spec, `more_body` is optional and defaults to `False` when omitted. Starlette/FastAPI and Serve’s `convert_object_to_asgi_messages` omit it on the final body chunk. In that case `msg.get("more_body")` is `None`, so `None is False` is `False` and `response_finished` was never set. When the client disconnected right after the response, `receive_task` finished first. Because `response_finished` stayed `False`, the code treated it as an abort and logged 499. ## Before ```bash [2026-03-01 20:16:42,874] ip-10-0-77-90/INFO/locust.main: Shutting down (exit code 0) Type Name # reqs # fails | Avg Min Max Med | req/s failures/s --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- GET / 15152 0(0.00%) | 1907 61 2126 2000 | 97.73 0.00 --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- Aggregated 15152 0(0.00%) | 1907 61 2126 2000 | 97.73 0.00 ``` ```bash (ServeReplica:default:Model pid=14567) INFO 2026-03-01 20:16:42,607 default_Model 15macvxy 9b27b091-30aa-492d-b5c1-6cd334608da7 -- GET / 499 2041.4ms (ServeReplica:default:Model pid=14567) INFO 2026-03-01 20:16:42,618 default_Model 15macvxy 3d9c22ca-4569-456f-a79e-b8a14a757911 -- GET / 499 2041.4ms (ServeReplica:default:Model pid=14567) INFO 2026-03-01 20:16:42,628 default_Model 15macvxy 10b3012e-41ef-4cc0-87b2-f169a0622ff7 -- GET / 499 2041.1ms (ServeReplica:default:Model pid=14567) INFO 2026-03-01 20:16:42,638 default_Model 15macvxy a6cbb63e-6f7d-4209-88cd-d0456567b4df -- GET / 200 2041.5ms (ServeReplica:default:Model pid=14567) INFO 2026-03-01 20:16:42,648 default_Model 15macvxy 3c2b9346-c94c-4fc7-942a-723d9ee4c040 -- GET / 499 2041.5ms (ServeReplica:default:Model pid=14567) INFO 2026-03-01 20:16:42,658 default_Model 15macvxy af31dfaf-7d5d-4692-80ca-5e5b18304312 -- GET / 499 2041.4ms ``` <img width="2688" height="302" alt="image" src="https://github.com/user-attachments/assets/58875091-80df-4b98-b0e6-184e878f9a66" /> ## After ```bash [2026-03-01 20:27:22,287] ip-10-0-77-90/INFO/locust.main: Shutting down (exit code 0) Type Name # reqs # fails | Avg Min Max Med | req/s failures/s --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- GET / 33164 0(0.00%) | 1979 16 2070 2000 | 97.88 0.00 --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- Aggregated 33164 0(0.00%) | 1979 16 2070 2000 | 97.88 0.00 ``` ```bash (ServeReplica:default:Model pid=29389) INFO 2026-03-01 20:27:19,875 default_Model 26ok3pcv a1b1fa79-7326-4671-ac47-faa82b50c694 -- GET / 200 2061.5ms (ServeReplica:default:Model pid=29389) INFO 2026-03-01 20:27:19,875 default_Model 26ok3pcv e77b1e4f-bbb9-4cb2-86df-43fd3e077a25 -- GET / 200 2051.3ms (ServeReplica:default:Model pid=29389) INFO 2026-03-01 20:27:19,885 default_Model 26ok3pcv 9be3b7ab-9ad0-4b86-b76d-aa79e6d893a4 -- GET / 200 2051.0ms (ServeReplica:default:Model pid=29389) INFO 2026-03-01 20:27:19,925 default_Model 26ok3pcv 6c024662-f5f0-42ea-9bc7-74602c456917 -- GET / 200 2081.7ms (ServeReplica:default:Model pid=29389) INFO 2026-03-01 20:27:19,926 default_Model 26ok3pcv cd208d85-5293-496b-bf4c-7c8e169f0be3 -- GET / 200 2071.5ms (ServeReplica:default:Model pid=29389) INFO 2026-03-01 20:27:19,926 default_Model 26ok3pcv 3c2335fa-586b-437e-8d1d-cb3223a71c35 -- GET / 200 2061.8ms (ServeReplica:default:Model pid=29389) INFO 2026-03-01 20:27:19,926 default_Model 26ok3pcv e03e34c2-c794-44e8-a9c9-e0e6c437db65 -- GET / 200 2051.6ms (ServeReplica:default:Model pid=29389) INFO 2026-03-01 20:27:19,935 default_Model 26ok3pcv 3815b56b-9603-44e1-8c82-648b444688d7 -- GET / 200 2049.7ms ``` <img width="2691" height="312" alt="image" src="https://github.com/user-attachments/assets/f3c07d01-db5a-4080-8559-032063ac85d4" /> Signed-off-by: abrar <abrar@anyscale.com> Signed-off-by: Kamil Kaczmarek <kamil@anyscale.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When the client closed the connection right after receiving the full response, Serve sometimes logged 499 (Client Closed Request) even though the response was complete. This showed up in metrics (e.g.
ray_serve_num_http_error_requests_total) and logs.Root cause
In direct ingress ASGI handling,
response_finishedwas set with:Per the ASGI spec,
more_bodyis optional and defaults toFalsewhen omitted. Starlette/FastAPI and Serve’sconvert_object_to_asgi_messagesomit it on the final body chunk. In that casemsg.get("more_body")isNone, soNone is FalseisFalseandresponse_finishedwas never set.When the client disconnected right after the response,
receive_taskfinished first. Becauseresponse_finishedstayedFalse, the code treated it as an abort and logged 499.Before
After