Streaming generator doc#63791
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new documentation page explaining the internals of streaming generator tasks in Ray. The review feedback correctly identifies that the hardcoded Ray version 2.55 and the corresponding ray-2.55.0 tags in the GitHub URLs within the new documentation do not exist, which will result in broken links. These should be updated to a valid branch or tag like master.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
sampan-s-nayak
left a comment
There was a problem hiding this comment.
Very detailed writeup! thanks for taking the time to write this up!
|
|
||
| The following diagram shows the reporting path for the first yielded value: | ||
|
|
||
| ``` |
There was a problem hiding this comment.
we can consider using mermaid UML diagrams here (not sure if it is supported in our docs page though)
There was a problem hiding this comment.
I think it is not supported, unfortunately.
Signed-off-by: Rueian <rueiancsie@gmail.com>
Kunchd
left a comment
There was a problem hiding this comment.
Thanks for the helpful doc on streaming generator! I left some nits and questions to help me better understand it.
|
|
||
| 1. Insert the returned object into the caller-side `ObjectRefStream` at the yield index. | ||
| 2. Handle the reported return object using the same direct-return versus plasma-return logic as normal task returns. | ||
| 3. [Make the reported ObjectRef ready](https://github.com/ray-project/ray/blob/ray-2.55.0/src/ray/core_worker/task_manager.cc#L821-L847). If a caller is [waiting in next(gen)](https://github.com/ray-project/ray/blob/ray-2.55.0/python/ray/_private/object_ref_generator.py#L188-L237) for this in-order yield index, that wait can now finish. |
There was a problem hiding this comment.
Maybe I might've missed something when I was looking through this, but the return path was actually a little involved. The report generator item returns call doesn't actually send the object to the caller if the object was in plasma. Instead, a get call was needed. Do we want to clarify this here or is this too much detail.
There was a problem hiding this comment.
Change to Handle the reported return object as a direct return or a plasma return, using the same logic as normal task returns.. Hoping that is much clearer now.
|
|
||
| When the `ObjectRefGenerator` goes out of scope, its [destructor](https://github.com/ray-project/ray/blob/ray-2.55.0/python/ray/_private/object_ref_generator.py#L289-L295) asks the caller-side core worker to delete the `ObjectRefStream` for the ObjectID of the generator ObjectRef. | ||
| [Deleting the stream](https://github.com/ray-project/ray/blob/ray-2.55.0/src/ray/core_worker/task_manager.cc#L690-L727) releases unconsumed streamed refs from the caller-side stream and replies to pending backpressured report RPCs with `NotFound`. | ||
| This unblocks an executor that was waiting for the caller to consume more refs. |
There was a problem hiding this comment.
Does the backpressure mechanism simply stop checking if we're under the threshold if not found is returned?
Also, the caller is technically the owner for the generated object. If the object ref stream already gone out of scope, are the created object immediately marked for eviction/deleted?
There was a problem hiding this comment.
Does the backpressure mechanism simply stop checking if we're under the threshold if not found is returned?
It doesn't stop checking, but each NotFound will unblock it.
Also, the caller is technically the owner for the generated object. If the object ref stream already gone out of scope, are the created object immediately marked for eviction/deleted?
The ObjectRefStream is simply a caller struct that tracks stream slots. Even if it is out of scope, refs for the created objects can still be in scope, and then they won't be immediately marked for eviction.
| 2. The executor sends the final `PushTask` reply. This reply includes the generator ObjectRef and the [list of streamed return ObjectIDs](https://github.com/ray-project/ray/blob/ray-2.55.0/src/ray/core_worker/task_execution/task_receiver.cc#L54-L60) produced by the task. | ||
| 3. The caller handles the final `PushTask` reply, records how many streaming return objects the task produced, and marks the stream as ended. | ||
| 4. The caller [writes an internal end-of-stream marker](https://github.com/ray-project/ray/blob/ray-2.55.0/src/ray/core_worker/task_manager.cc#L1078-L1085) into the caller-side `ObjectRefStream`. | ||
| 5. Later, when `ObjectRefGenerator` reaches the marker, it checks the generator ObjectRef. |
There was a problem hiding this comment.
What if stop iteration is raised in the executor UDF prematurely?
There was a problem hiding this comment.
That will be treated as an application error. You will get a ref on the caller side, and when you ray.get on it, it will raise a ray.exceptions.RayTaskError wrapped with StopIteration.
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
Signed-off-by: Rueian Huang <rueiancsie@gmail.com>
The doc that explains how streaming generators work. Preview link: https://anyscale-ray--63791.com.readthedocs.build/en/63791/ray-core/internals/streaming-generator.html --------- Signed-off-by: Rueian Huang <rueiancsie@gmail.com> Signed-off-by: Rueian <rueiancsie@gmail.com>
The doc that explains how streaming generators work. Preview link: https://anyscale-ray--63791.com.readthedocs.build/en/63791/ray-core/internals/streaming-generator.html --------- Signed-off-by: Rueian Huang <rueiancsie@gmail.com> Signed-off-by: Rueian <rueiancsie@gmail.com>
The doc that explains how streaming generators work.
Preview link: https://anyscale-ray--63791.com.readthedocs.build/en/63791/ray-core/internals/streaming-generator.html