[Data] Add polars usage instruction to docs#60029
… section Signed-off-by: peterxcli <peterxcli@gmail.com>
f3b3875 to
5cbd0ef
Compare
|
/gemini review |
Code Review
This pull request adds documentation on how to enable Polars operations in Ray Data via DataContext. The added section is clear and helpful. I've found a minor issue in one sentence which contains a typo and a grammatical error. I've provided a suggestion to fix it for better clarity.
Code Review
This pull request adds documentation on how to enable Polars-based operations in Ray Data, specifically for sorting, by setting use_polars_sort in the DataContext. The change is clear and useful. I've found a minor grammatical issue in the new documentation and suggested a correction to improve clarity.
|
@peterxcli please fix the suggestion from gemini, thanks |
|
And also, if we want to use a glossary term like "Polars", please write it with backticks like
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Peter Lee <peterxcli@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
|
@owenowenisme please take another look. thanks! |
|
@owenowenisme Thanks for the review🙏 |
|
|
||
| If you encounter OOM errors, try decreasing your ``batch_size``. | ||
|
|
||
| Enabling ``Polars`` operations |
Nit: Here and elsewhere -- I think it makes more sense for Polars to not be code text, especially since it's not part of the glossary or anything.
| ctx = ray.data.DataContext.get_current() | ||
| ctx.use_polars_sort = True | ||
|
|
||
| When you enable this flag, Ray Data automatically uses ``Polars`` for tabular dataset sorting operations, which can significantly improve performance for certain workloads. This doesn't affect your UDF code; you can still use any batch format in :meth:`~ray.data.Dataset.map_batches`. |
What're the user-facing Ray Data APIs that benefit from the polars feature?
IIUC it doesn't improve performance for most UDFs except for map_groups, and that's because of an implementation detail where we perform a sort.
Would this information be more appropriate in a different user guide(s)?
I moved it to performance-tips, WDYT?
|
Hi @peterxcli, are you still working on this? |
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu> Signed-off-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
Pull request overview
Adds documentation to the Ray Data “Transforming data” guide describing how to enable polars-backed optimizations via DataContext, in response to #59224.
Changes:
- Add a new “Enabling Polars operations” subsection explaining `DataContext.use_polars_sort`.
- Minor whitespace cleanup in existing `.. testcode::` blocks.
2a239fa to
86082dc
Compare
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
## Description
We can use polars to make operations more efficient by
- Use polars in `map_batches` UDF (this is already covered at: https://docs.ray.io/en/master/data/transforming-data.html#choosing-the-right-batch-format)
- Set `use_polars` or `use_polars_sort` in `DataContext` to enable built-in polars ops (`use_polars` flag is deprecated, so I only add `use_polars_sort` at this time)

## Related issues
Closes: ray-project#59224

## Additional information
No

---------

Signed-off-by: peterxcli <peterxcli@gmail.com>
Signed-off-by: Peter Lee <peterxcli@gmail.com>
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
Signed-off-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
Co-authored-by: You-Cheng Lin <mses010108@gmail.com>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
Description
We can use polars to make operations more efficient by
- Use polars in `map_batches` UDF (this is already covered at: https://docs.ray.io/en/master/data/transforming-data.html#choosing-the-right-batch-format)
- Set `use_polars` or `use_polars_sort` in `DataContext` to enable built-in polars ops (`use_polars` flag is deprecated, so I only add `use_polars_sort` at this time)
Related issues
Closes: #59224
Additional information
No