[Data] Add polars usage instruction to docs #60029

Merged
bveeramani merged 13 commits into ray-project:master from peterxcli:docs/add-polars-detail-in-transforming-data on Feb 10, 2026

@peterxcli

Contributor

Description

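We can use Polars to make operations more efficient in two ways:
- Use Polars in a `map_batches` UDF (already covered at: https://docs.ray.io/en/master/data/transforming-data.html#choosing-the-right-batch-format).
- Set `use_polars` or `use_polars_sort` in `DataContext` to enable built-in Polars operations. The `use_polars` flag is deprecated, so this PR only documents `use_polars_sort`.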

Related issues

Closes: #59224

Additional information

No

@peterxcli peterxcli requested a review from a team as a code owner January 10, 2026 16:52
… section

Signed-off-by: peterxcli <peterxcli@gmail.com>
@peterxcli peterxcli force-pushed the docs/add-polars-detail-in-transforming-data branch from f3b3875 to 5cbd0ef Compare January 10, 2026 16:52
@peterxcli

Contributor Author
@owenowenisme owenowenisme added data Ray Data-related issues community-contribution Contributed by the community labels Jan 10, 2026
@owenowenisme

Member

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request adds documentation on how to enable Polars operations in Ray Data via DataContext. The added section is clear and helpful. I've found a minor issue in one sentence which contains a typo and a grammatical error. I've provided a suggestion to fix it for better clarity.

Comment thread doc/source/data/transforming-data.rst Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request adds documentation on how to enable Polars-based operations in Ray Data, specifically for sorting, by setting use_polars_sort in the DataContext. The change is clear and useful. I've found a minor grammatical issue in the new documentation and suggested a correction to improve clarity.

Comment thread doc/source/data/transforming-data.rst Outdated
@owenowenisme

Member

@peterxcli please fix the suggestion from gemini, thanks

@owenowenisme

owenowenisme commented Jan 10, 2026

Member

Also, if we want to use a glossary term like "Polars", please wrap it in backticks, like ``Polars``, or Vale will raise an error in CI.

@github-actions


This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions Bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 25, 2026
peterxcli and others added 3 commits January 26, 2026 00:38
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Peter Lee <peterxcli@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
@peterxcli

Contributor Author

@owenowenisme please take another look. thanks!

@ryankert01 ryankert01 removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 25, 2026
owenowenisme and others added 3 commits January 26, 2026 14:12
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
@owenowenisme owenowenisme added the go add ONLY when ready to merge, run all tests label Jan 26, 2026
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
@peterxcli

Contributor Author

@owenowenisme Thanks for the review🙏

Comment thread doc/source/data/transforming-data.rst Outdated
Comment thread doc/source/data/transforming-data.rst Outdated

If you encounter OOM errors, try decreasing your ``batch_size``.

Enabling ``Polars`` operations

Member


Nit: Here and elsewhere -- I think it makes more sense for Polars to not be code text, especially since it's not part of the glossary or anything.

Comment thread doc/source/data/transforming-data.rst Outdated
ctx = ray.data.DataContext.get_current()
ctx.use_polars_sort = True

When you enable this flag, Ray Data automatically uses ``Polars`` for sorting tabular datasets, which can significantly improve performance for certain workloads. This doesn't affect your UDF code; you can still use any batch format in :meth:`~ray.data.Dataset.map_batches`.

Member


What're the user-facing Ray Data APIs that benefit from the polars feature?

IIUC it doesn't improve performance for most UDFs except for map_groups, and that's because of an implementation detail where we perform a sort.

Would this information be more appropriate in a different user guide(s)?

Member


I moved it to performance-tips, WDYT?

@iamjustinhsu

Contributor

Hi @peterxcli, are you still working on this?

Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 9, 2026 03:31

Copilot AI left a comment

Contributor


Pull request overview

Adds documentation to the Ray Data “Transforming data” guide describing how to enable polars-backed optimizations via DataContext, in response to #59224.

Changes:

  • Add a new “Enabling Polars operations” subsection explaining DataContext.use_polars_sort.
  • Minor whitespace cleanup in existing .. testcode:: blocks.

Comment thread doc/source/data/transforming-data.rst Outdated
Comment thread doc/source/data/transforming-data.rst Outdated
Comment thread doc/source/data/transforming-data.rst Outdated
Comment thread doc/source/data/transforming-data.rst Outdated
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
@owenowenisme owenowenisme force-pushed the docs/add-polars-detail-in-transforming-data branch from 2a239fa to 86082dc Compare February 9, 2026 04:29
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
@bveeramani bveeramani merged commit fa31667 into ray-project:master Feb 10, 2026
5 of 6 checks passed
ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
## Description

We can use Polars to make operations more efficient in two ways:
- Use Polars in a `map_batches` UDF (this is already covered at:
https://docs.ray.io/en/master/data/transforming-data.html#choosing-the-right-batch-format)
- Set `use_polars` or `use_polars_sort` in `DataContext` to enable
built-in Polars ops (the `use_polars` flag is deprecated, so I only add
`use_polars_sort` at this time)

## Related issues

Closes: ray-project#59224

## Additional information

No

---------

Signed-off-by: peterxcli <peterxcli@gmail.com>
Signed-off-by: Peter Lee <peterxcli@gmail.com>
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
Signed-off-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
Co-authored-by: You-Cheng Lin <mses010108@gmail.com>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
Labels

community-contribution Contributed by the community data Ray Data-related issues go add ONLY when ready to merge, run all tests

6 participants