[serve][llm] Add telemetry for direct streaming feature #63779

Merged
eicherseiji merged 7 commits into ray-project:master from eicherseiji:seiji/llm-direct-streaming-telemetry
Jun 3, 2026

@eicherseiji eicherseiji commented Jun 1, 2026

Why are these changes needed?

Adds an LLM_SERVE_DIRECT_STREAMING_ENABLED usage tag so we can track adoption of LLM direct streaming (enabled via RAY_SERVE_LLM_ENABLE_DIRECT_STREAMING).

Recorded inline in LLMServer._start_engine (gated on the env var), next to the existing per-model telemetry push. DPServer/PDDecodeServer inherit _start_engine, so the OpenAI, DP, and PD patterns are all covered. Recording replica-side (vs at build time) matches every other usage tag and guarantees GCS is available.
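To make the coverage argument concrete, here is a minimal sketch of an env-gated record call placed in a base class's `_start_engine`, with subclasses picking it up through inheritance. It does not import Ray: `record_tag`, `RECORDED_TAGS`, and `direct_streaming_enabled` are hypothetical stand-ins, the class bodies are empty placeholders, and the assumption that the flag is enabled when set to "1" is mine, not the PR's.

```python
import os
from typing import Dict

# Hypothetical stand-in for the usage-tag store. The PR itself records
# through Ray's record_extra_usage_tag, which is not reproduced here.
RECORDED_TAGS: Dict[str, str] = {}


def record_tag(key: str, value: str) -> None:
    """Stand-in recorder: keeps the latest value for each tag key."""
    RECORDED_TAGS[key] = value


def direct_streaming_enabled() -> bool:
    # Assumption: the flag counts as enabled when set to "1"; the real
    # parsing of RAY_SERVE_LLM_ENABLE_DIRECT_STREAMING may differ.
    return os.environ.get("RAY_SERVE_LLM_ENABLE_DIRECT_STREAMING", "0") == "1"


class LLMServer:
    def _start_engine(self) -> None:
        # Engine startup and the per-model telemetry push would happen here.
        if direct_streaming_enabled():
            record_tag("LLM_SERVE_DIRECT_STREAMING_ENABLED", "1")


class DPServer(LLMServer):
    """Inherits _start_engine, so the tag is recorded with no extra code."""


class PDDecodeServer(LLMServer):
    """Inherits _start_engine as well."""
```

Because the gate lives in the shared method, calling `_start_engine()` on any of the three classes records the tag only when the environment variable is set.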

Checks

  • Signed off with DCO.

Record an LLM_SERVE_DIRECT_STREAMING_ENABLED usage tag when an app is built
with RAY_SERVE_LLM_ENABLE_DIRECT_STREAMING. The tag is recorded once at app
build time in _build_direct_streaming_llm_deployment, the single chokepoint
shared by the OpenAI, data-parallel, and prefill/decode builders, so all
direct-streaming serving patterns are covered.

Direct streaming is an app-level opt-in rather than a per-model property, so
it is recorded directly via record_extra_usage_tag instead of going through
the per-model TelemetryAgent.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces telemetry tracking for whether LLM direct streaming (engine-native ASGI ingress) is enabled. It adds the LLM_SERVE_DIRECT_STREAMING_ENABLED tag to the usage protobuf and registers it during the application build process. The review feedback suggests catching a broader Exception instead of only ValueError when recording the telemetry tag to ensure that unexpected telemetry errors do not crash the deployment or build process.

Comment thread on python/ray/llm/_internal/serve/observability/usage_telemetry/usage.py (outdated)

record_extra_usage_tag is already best-effort (no-ops before ray init, swallows
GCS write errors internally), so the only thing that can escape into this code
is TagKey.Value() raising on a not-yet-regenerated usage proto. Keep the catch
narrow to that ValueError so genuine bugs in the recording call still surface,
and log the benign skip at debug instead of swallowing silently. Responds to
review feedback on ray-project#63779.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
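The narrow-catch reasoning in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `TagKey` here is a hypothetical Python enum standing in for a stale generated proto that predates the new tag, and `tag_value` and `record_if_known` are made-up helper names.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class TagKey(Enum):
    """Hypothetical stand-in for a stale generated enum without the new tag."""

    LLM_SERVE_NUM_GPUS = 613


def tag_value(name: str) -> int:
    """Mimic the lookup described above: unknown names raise ValueError."""
    try:
        return TagKey[name].value
    except KeyError:
        raise ValueError(f"unknown tag name: {name}") from None


def record_if_known(name: str, value: str, store: dict) -> bool:
    """Record a tag, skipping only when the enum does not know the name yet."""
    try:
        key = tag_value(name)
    except ValueError:
        # Narrow catch: a stale proto is benign, so log at debug and skip.
        logger.debug("Skipping usage tag %s: not in generated proto", name)
        return False
    # Anything raised from here on is a genuine bug and propagates.
    store[key] = value
    return True
```

Only the lookup sits inside the `try`, so an error in the recording step itself (for example, passing `None` as the store) still surfaces instead of being swallowed. A later commit in this PR drops the catch entirely by referencing the proto member directly.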
Move the LLM_SERVE_DIRECT_STREAMING_ENABLED record call out of the builder and
into LLMServer._start_engine, gated on RAY_SERVE_LLM_ENABLE_DIRECT_STREAMING,
next to the existing per-model telemetry push.

Usage tags are last-write-wins state snapshots, so build-time vs replica-time
report the same value; recording on the replica matches every other usage tag
(core Serve and LLM), guarantees GCS is available (build-only paths like
serve build would silently no-op), and reflects direct streaming actually
running rather than merely configured. DPServer/PDDecodeServer inherit
_start_engine, so all three serving patterns stay covered.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
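The GCS-availability point can be illustrated with a toy model. It assumes only what the commit messages state, namely that recording is best-effort and silently does nothing when no cluster is reachable. `FakeGcs` and `record_best_effort` are hypothetical names, not Ray APIs.

```python
from typing import Dict, Optional


class FakeGcs:
    """Hypothetical stand-in for the cluster's key-value store."""

    def __init__(self) -> None:
        self.tags: Dict[str, str] = {}


def record_best_effort(gcs: Optional[FakeGcs], key: str, value: str) -> None:
    """Write the tag if a store is reachable; otherwise silently do nothing."""
    if gcs is None:
        return  # e.g. a build-only path with no running cluster
    gcs.tags[key] = value


KEY = "LLM_SERVE_DIRECT_STREAMING_ENABLED"

# Build-only path: nothing is connected, so the record call is lost.
record_best_effort(None, KEY, "1")

# Replica path: the engine starts inside a running cluster, so the write lands.
cluster = FakeGcs()
record_best_effort(cluster, KEY, "1")
```

In this model the build-time call leaves no trace anywhere, which is the failure mode that recording on the replica avoids.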
Condense the docstring and inline comment to terse one/two-liners matching the
rest of the usage module; the detailed rationale lives in the commit history
and PR description.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>

Drop the try/except and the TelemetryTags indirection; reference the proto
TagKey member directly like ServeUsageTag.record, which removes the only
failure mode the catch guarded.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>

@eicherseiji eicherseiji added the go label (add ONLY when ready to merge, run all tests) Jun 1, 2026

Drop the one-line record_direct_streaming_enabled helper and record the tag
directly in LLMServer._start_engine, matching core Serve's ServeUsageTag
one-liner style. Removes the now-orphaned helper and its unit test.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>

The tag is a cluster-wide signal: written per replica on engine start but
last-write-wins, so it reports one value per cluster.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
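A small simulation shows what "one value per cluster" means in practice. It is a sketch with a plain dict as a hypothetical stand-in for the tag store; the write counter exists only to contrast with what the tag does not report.

```python
from collections import Counter
from typing import Dict, Tuple

KEY = "LLM_SERVE_DIRECT_STREAMING_ENABLED"


def simulate_replica_starts(num_replicas: int) -> Tuple[Dict[str, str], Counter]:
    """Each replica writes the same key on engine start; later writes overwrite."""
    tags: Dict[str, str] = {}  # last-write-wins snapshot
    writes: Counter = Counter()  # tracked only to show what the tag omits
    for _ in range(num_replicas):
        tags[KEY] = "1"
        writes[KEY] += 1
    return tags, writes


tags, writes = simulate_replica_starts(4)
```

After four replica starts, `tags` still holds a single entry for the key while `writes[KEY]` is 4, so the tag answers "is direct streaming on in this cluster?" rather than "how many replicas use it?".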
@eicherseiji eicherseiji marked this pull request as ready for review June 1, 2026 22:50
@eicherseiji eicherseiji requested review from a team, pcmoritz and thomasdesr as code owners June 1, 2026 22:50

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit f215435.

LLM_SERVE_NUM_GPUS = 613;
// Whether LLM direct streaming (engine-native ASGI ingress) is enabled via
// RAY_SERVE_LLM_ENABLE_DIRECT_STREAMING. "1" when enabled.
LLM_SERVE_DIRECT_STREAMING_ENABLED = 623;

Proto file modified — review RPC fault-tolerance guide

Low Severity

⚠️ This PR modifies one or more .proto files.
Please review the RPC fault-tolerance & idempotency standards guide here:
https://github.com/ray-project/ray/tree/master/doc/source/ray-core/internals/rpc-fault-tolerance.rst

Triggered by project rule: Bugbot Rules

@ray-gardener ray-gardener Bot added the serve (Ray Serve Related Issue) and observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling) labels Jun 2, 2026
@eicherseiji eicherseiji merged commit 66ac79d into ray-project:master Jun 3, 2026
8 checks passed
rueian pushed a commit to rueian/ray that referenced this pull request Jun 4, 2026
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Jun 30, 2026
