Skip to content

[Serve][Bugfix] Fail loud when ingress request router dispatch fails#63215

Merged
kouroshHakha merged 3 commits into
ray-project:masterfrom
kouroshHakha:ingress-bypass/fail-loud-regression-fix
May 8, 2026
Merged

[Serve][Bugfix] Fail loud when ingress request router dispatch fails#63215
kouroshHakha merged 3 commits into
ray-project:masterfrom
kouroshHakha:ingress-bypass/fail-loud-regression-fix

Conversation

@kouroshHakha

Copy link
Copy Markdown
Contributor

Summary

Two related fixes for HAProxy + ingress-request-router peer deployments (e.g. LLMRouter):

  1. _direct_ingress_asgi TypeError (replica.py) — the route-prefix check did route.startswith(self._route_prefix), but ingress-request-router peer deployments have _route_prefix=None, so this TypeError'd into a uvicorn 500. Treat missing prefix as "/".

  2. Silent bypass of router policy (Lua + HAProxy template) — when Lua dispatch failed (router unreachable / non-200 / unparseable reply / unknown replica), HAProxy fell through to the primary backend, invisibly bypassing the operator's router. Failure paths now arm txn.ingress_request_router_failed = "<reason>" and the frontend returns 503 with X-Serve-Reason: <reason>, ordered before any use_backend so the request cannot reach the primary backend.

Product invariant: a request to a router-bearing app is served via the router or it fails. There is no quiet alternative path. Failure-mode bucketing belongs in observability, not the data plane.

Test plan

  • test_router_failure_503_rule_appears_before_use_backend — template ordering invariant
  • test_router_failure_fails_loud_with_reason — HAProxy against a stub broken router; asserts 503 + X-Serve-Reason, primary backend cumulative stot == 0
  • Manually verified end-to-end: real HAProxy + fake broken router + fake replica → 503 with X-Serve-Reason: router_non_200, both backends stot=0

🤖 Generated with Claude Code

Two related fixes for HAProxy + ingress-request-router peer deployments
(e.g. LLMRouter):

1. ``_direct_ingress_asgi`` was calling ``route.startswith(self._route_prefix)``
   with ``_route_prefix=None`` for ingress-request-router peer deployments
   (their route prefix is None), raising TypeError and surfacing as uvicorn
   500. Treat a missing prefix as ``/``.

2. When the Lua dispatch failed (router unreachable / non-200 / unparseable
   reply / unknown replica), HAProxy was silently falling through to the
   primary backend, invisibly bypassing the operator's router policy. Arm
   ``txn.ingress_request_router_failed = "<reason>"`` in those paths and
   add a frontend ``http-request return 503`` (with ``X-Serve-Reason``)
   ordered before any ``use_backend`` so a failed dispatch cannot reach the
   primary backend.

Tests:
- ``test_router_failure_503_rule_appears_before_use_backend`` pins the
  template-ordering invariant.
- ``test_router_failure_fails_loud_with_reason`` runs HAProxy against a
  stub broken router; asserts 503 + ``X-Serve-Reason``, and primary
  backend cumulative ``stot == 0``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kouroshHakha kouroshHakha requested a review from a team as a code owner May 8, 2026 02:44

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces 'fail-loud' behavior for the ingress request router, ensuring that routing failures result in a 503 error with a descriptive X-Serve-Reason header rather than silently falling back to the primary backend. This is achieved through updates to HAProxy templates, Lua routing logic, and replica path matching. Feedback suggests using an empty string as the default route prefix in replica.py to more reliably match all incoming paths.

Comment thread python/ray/serve/_private/replica.py Outdated
@kouroshHakha kouroshHakha added the go add ONLY when ready to merge, run all tests label May 8, 2026

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 3e7924c. Configure here.

Comment thread python/ray/serve/_private/ingress_request_router.lua.tmpl Outdated
kouroshHakha and others added 2 commits May 7, 2026 19:50
…ing prefix

Address Gemini's review comment: ``route.startswith("")`` is always true,
which correctly handles the empty-path ASGI edge case where ``"".startswith("/")``
would 404 the request. No behavior change for any non-empty path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two issues caught in code review:

1. The empty-body early return in the Lua action was leaving
   ``txn.ingress_request_router_failed`` unset, so a POST with an empty
   body to a router-bearing app silently fell through to the primary
   backend — the exact silent bypass this change is meant to close. Arm
   the sentinel with ``empty_body`` so it surfaces as 503 +
   ``X-Serve-Reason``. Promote the failure-semantics comment to a
   top-level note describing the contract for the whole action; the
   "past this point" framing no longer applies because every router
   decision now arms the sentinel on failure.

2. ``test_router_failure_503_rule_appears_before_use_backend`` searched
   for the substring ``use_backend``, but the rendered template includes
   that string inside an explanatory comment block ("static use_backend
   selection below") that renders before the 503 rule. The assertion
   would fail deterministically. Switch to line-based parsing that
   matches actual ``use_backend <name>`` directive lines.

Extends the integration test to also cover the empty-body path:
verified manually that an empty-body POST returns 503 with
``X-Serve-Reason: empty_body`` and primary backend stot stays 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kouroshHakha kouroshHakha merged commit 2fa47df into ray-project:master May 8, 2026
6 checks passed
kouroshHakha added a commit to eicherseiji/ray that referenced this pull request May 8, 2026
Brings in ray-project#63215 (fail-loud) and resolves the conflict in
``ingress_request_router.lua.tmpl`` against this branch's FORWARD_BODY
gate.

Resolution: drop the ``empty_body`` sentinel that ray-project#63215 added. With
the FORWARD_BODY gate, an empty body is no longer a distinct failure
mode -- both modes call the router with ``body=""`` and let the router
decide. The "router or fail" invariant still holds because the router
*is* called; we just don't synthesize a sentinel locally.

The 4 remaining failure-mode sentinels from ray-project#63215
(``router_unreachable`` / ``router_non_200`` /
``unparseable_replica_id`` / ``unknown_replica_id``) are unchanged.

``test_router_failure_fails_loud_with_reason`` updates fall out of the
resolution: both empty and non-empty POSTs now surface as
``router_non_200`` against the broken-router stub. The empty-body case
stays in the test to pin that empty bodies are routed through the
router rather than silently bypassing it.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
…ay-project#63215)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

2 participants