[serve] Enable splice in haproxy by default #63531

Merged: kouroshHakha merged 6 commits into ray-project:master from akyang-anyscale:alexyang/hap-splice on May 20, 2026
akyang-anyscale (Contributor) commented May 19, 2026

HAProxy splice moves bytes from the receiving socket to the sending socket directly inside the kernel, without copying them into user-space memory. This helps improve performance for large payloads. HAProxy does not splice when the request body needs to be inspected or modified (e.g., when `wait-for-body` is in effect).

This PR also adds an environment variable to tune HAProxy's buffer size (`tune.bufsize`) and reduces the number of logging sinks from two to one.
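A minimal sketch of what these two changes look like in a rendered HAProxy `global` section: a single `log` line pointing at one configurable target, plus a `tune.bufsize` line. `render_global_section` is a hypothetical helper and `"stdout format raw local0"` is only an example target; 16384 bytes is HAProxy's documented default for `tune.bufsize`.

```python
# HAProxy's documented default for tune.bufsize, in bytes.
HAPROXY_DEFAULT_BUFSIZE = 16384


def render_global_section(log_target: str,
                          bufsize: int = HAPROXY_DEFAULT_BUFSIZE) -> str:
    """Render a minimal HAProxy `global` section (hypothetical helper).

    One `log` line replaces the previous pair of logging sinks, and
    `tune.bufsize` sets the size of each HAProxy buffer in bytes.
    """
    lines = [
        "global",
        f"    log {log_target}",
        f"    tune.bufsize {bufsize}",
    ]
    return "\n".join(lines) + "\n"


print(render_global_section("stdout format raw local0", bufsize=32768))
```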

Perf results:

| metric | baseline | haproxy | delta |
|---|---:|---:|---:|
| http_p50_latency | 0.769 | 0.795 | +3.298% |
| http_p90_latency | 0.805 | 0.839 | +4.266% |
| http_p95_latency | 0.841 | 0.859 | +2.217% |
| http_p99_latency | 0.952 | 0.937 | -1.552% |
| http_1mb_p50_latency | 0.852 | 0.910 | +6.796% |
| http_1mb_p90_latency | 0.880 | 0.966 | +9.728% |
| http_1mb_p95_latency | 0.899 | 0.993 | +10.426% |
| http_1mb_p99_latency | 1.190 | 1.076 | -9.572% |
| http_10mb_p50_latency | 2.837 | 2.938 | +3.569% |
| http_10mb_p90_latency | 2.931 | 3.032 | +3.472% |
| http_10mb_p95_latency | 2.962 | 3.059 | +3.309% |
| http_10mb_p99_latency | 3.043 | 3.222 | +5.868% |
| http_avg_rps | 2859.290 | 2885.820 | +0.928% |
| http_model_comp_avg_rps | 1443.370 | 1435.760 | -0.527% |
| http_100_max_ongoing_requests_avg_rps | 2827.050 | 2876.540 | +1.751% |
| http_model_comp_100_max_ongoing_requests_avg_rps | 1572.530 | 1590.570 | +1.147% |
| http_800_max_ongoing_requests_avg_rps | 2230.380 | 2182.530 | -2.145% |
| http_model_comp_800_max_ongoing_requests_avg_rps | 1314.380 | 1283.730 | -2.332% |
| http_streaming_avg_tps | 47475.070 | 46681.030 | -1.673% |
| http_streaming_p50_latency | 10468.201 | 10623.782 | +1.486% |
| http_streaming_p90_latency | 10559.427 | 10845.366 | +2.708% |
| http_streaming_p95_latency | 10571.892 | 10864.102 | +2.764% |
| http_streaming_p99_latency | 10594.478 | 10909.300 | +2.972% |
| http_intermediate_streaming_avg_tps | 12959.350 | 12913.040 | -0.357% |
| http_intermediate_streaming_p50_latency | 11537.182 | 11558.856 | +0.188% |
| http_intermediate_streaming_p90_latency | 11714.388 | 11707.392 | -0.060% |
| http_intermediate_streaming_p95_latency | 11842.027 | 11715.867 | -1.065% |
| http_intermediate_streaming_p99_latency | 11848.345 | 11776.458 | -0.607% |
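The delta column is the relative change of the haproxy run against the baseline, in percent; whether a given sign is good depends on the metric (lower is better for `_latency`, higher is better for `_rps`/`_tps`). A small sketch of that arithmetic, with hypothetical helper names: recomputing from the rounded values shown reproduces the throughput deltas, while latency deltas can differ in the last digits (e.g. -9.580% vs the reported -9.572% for `http_1mb_p99_latency`), presumably because the reported deltas were computed from unrounded measurements.

```python
def pct_delta(baseline: float, candidate: float) -> float:
    """Relative change of `candidate` versus `baseline`, in percent."""
    return (candidate - baseline) / baseline * 100.0


def is_improvement(metric: str, delta: float) -> bool:
    """Latency metrics improve when they drop; throughput when it rises."""
    if metric.endswith("_latency"):
        return delta < 0
    if metric.endswith(("_rps", "_tps")):
        return delta > 0
    raise ValueError(f"unknown metric kind: {metric}")


rows = {
    "http_avg_rps": (2859.290, 2885.820),
    "http_800_max_ongoing_requests_avg_rps": (2230.380, 2182.530),
    "http_1mb_p99_latency": (1.190, 1.076),
}
for name, (base, hap) in rows.items():
    d = pct_delta(base, hap)
    print(f"{name}: {d:+.3f}% improvement={is_improvement(name, d)}")
```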
Signed-off-by: akyang-anyscale <alexyang@anyscale.com>

gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request updates the HAProxy configuration for Ray Serve by replacing the syslog port with a flexible log target and introducing a configurable buffer size (tune.bufsize). It also enables splice-request and splice-response options in the HAProxy template. Feedback suggests using get_env_str for the log target to maintain consistency and get_env_int_positive for the buffer size to prevent invalid non-positive values.
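The two suggested helpers can be sketched as standalone functions: a string reader for the log target, and an integer reader that rejects zero or negative values so HAProxy never receives an invalid `tune.bufsize`. These are illustrative stand-ins, not Ray's `get_env_str`/`get_env_int_positive`, and the environment variable names below are hypothetical.

```python
import os


def env_str(name: str, default: str) -> str:
    """Read a string setting from the environment (illustrative helper)."""
    return os.environ.get(name, default)


def env_int_positive(name: str, default: int) -> int:
    """Read an integer setting and reject non-positive values."""
    raw = os.environ.get(name)
    if raw is None:
        value = default
    else:
        try:
            value = int(raw)
        except ValueError:
            raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    return value


# Hypothetical variable names, for illustration only.
LOG_TARGET = env_str("RAY_SERVE_HAPROXY_LOG_TARGET", "stdout format raw local0")
BUFSIZE = env_int_positive("RAY_SERVE_HAPROXY_BUFSIZE", 16384)
```

Failing fast at import time surfaces a bad value as a clear Python error instead of an HAProxy startup failure.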

Review comment threads on python/ray/serve/_private/constants.py (one now outdated).
akyang-anyscale added the go label (add ONLY when ready to merge, run all tests) on May 20, 2026.
akyang-anyscale marked this pull request as ready for review on May 20, 2026 03:28.
akyang-anyscale requested a review from a team as a code owner on May 20, 2026 03:28.
ray-gardener (bot) added the serve label (Ray Serve Related Issue) on May 20, 2026.

kouroshHakha (Contributor) left a comment

Do we need truncation now that we use splice to do zero copy?

kouroshHakha merged commit d08e9e7 into ray-project:master on May 20, 2026 (7 checks passed).
akyang-anyscale (Contributor, Author) replied:

> Do we need truncation now that we use splice to do zero copy?

Still needed when body forwarding is used.

harshit-anyscale added a commit to harshit-anyscale/ray that referenced this pull request May 21, 2026
Resolves conflicts from PR ray-project#63531 (splice + log consolidation):
- haproxy_templates.py: collapse to single `log {{ config.log_target }}`
  per master; layer optional stderr access-log mirror on top. Keep both
  retry/redispatch (HEAD) and splice-request/splice-response (master)
  in the defaults block.
- haproxy.py: union of imports (LOG_ACCESS_TO_STDERR + new INGRESS_*
  knobs + LOG_TARGET).
- test_haproxy_api.py: expected_config reflects merged template ordering
  (retry block then splice block).
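The merged `defaults` ordering described above (retry block, then splice block) can be sketched with a hypothetical renderer. It is not the real `haproxy_templates.py` template; the `mode http` line and the `retries` value of 3 are assumptions for illustration.

```python
def render_defaults_section(retries: int = 3, splice: bool = True) -> str:
    """Hypothetical renderer for an HAProxy `defaults` section.

    Emits the retry block first and the splice block second, matching
    the ordering the merged test expects.
    """
    lines = ["defaults", "    mode http"]
    # Retry block: retry failed connection attempts; `option redispatch`
    # lets the final retry go to a different server.
    lines += [f"    retries {retries}", "    option redispatch"]
    if splice:
        # Splice block: let the kernel move request/response bytes between
        # sockets when HAProxy does not need to inspect them.
        lines += ["    option splice-request", "    option splice-response"]
    return "\n".join(lines) + "\n"


print(render_defaults_section())
```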

Labels: go, serve

3 participants