[Data] Add map namespace support for expression operations by ryankert01 · Pull Request #59879 · ray-project/ray

ryankert01 · 2026-01-06T06:41:41Z

Description

Implements `MapNamespace`.

Implemented `_extract_map_component` as a vectorized fallback, since native `pc.map_keys` kernels are not standard in PyArrow yet.
Supports both logical maps (`MapArray`) and physical maps (`List<Struct>`).
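
To illustrate the idea behind the extraction (this is not the PR's actual code), here is a minimal plain-Python model of an Arrow-style map column: one offsets list, flat keys and values children, and a validity list. The helper names `encode_maps` and `extract_component` are hypothetical. Extracting keys or values reuses the same offsets and only swaps the child, so empty maps yield empty lists and null rows stay null.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Optional[Dict[str, Any]]


def encode_maps(rows: List[Row]) -> Tuple[List[int], List[str], List[Any], List[bool]]:
    """Encode rows into an Arrow-like layout (hypothetical helper)."""
    offsets, keys, values, valid = [0], [], [], []
    for row in rows:
        if row is not None:
            keys.extend(row.keys())
            values.extend(row.values())
        valid.append(row is not None)
        offsets.append(len(keys))  # end position of this row's entries
    return offsets, keys, values, valid


def extract_component(offsets, child, valid):
    """Slice one flat child (keys or values) back into per-row lists."""
    return [
        child[offsets[i]:offsets[i + 1]] if valid[i] else None
        for i in range(len(valid))
    ]


offsets, keys, values, valid = encode_maps([{"a": 1}, {}, None])
map_keys = extract_component(offsets, keys, valid)
map_values = extract_component(offsets, values, valid)
print(map_keys)    # [['a'], [], None]
print(map_values)  # [[1], [], None]
```

The three input rows mirror the cases covered by the tests listed below: a populated map, an empty map, and a null.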

Testing

test_map_keys / test_map_values: Standard extraction.
test_physical_map_extraction: Verifies support for List<Struct>.
test_map_sliced_offsets: Verifies correct results for sliced arrays with non-zero offsets.
test_map_nulls_and_empty: Verifies handling of None and empty maps {}.
test_map_chaining: Verifies composition with List namespace (e.g., .map.keys().list.len()).

Related issues

Related to #58674
Continues #58743

Additional information

Tested with:

python -m pytest -v -s python/ray/data/tests/test_namespace_expressions.py::TestMapNamespace

Cursor Bugbot found 1 potential issue for commit 7a11478

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

gemini-code-assist

Code Review

This pull request introduces support for map/dict operations on expression columns by adding a map namespace. The implementation is well-structured, adding a _MapNamespace with keys() and values() methods that work on both logical MapArray and physical List<Struct> representations. The handling of sliced arrays with non-zero offsets is a great detail that ensures correctness. The accompanying tests are thorough, covering various representations, edge cases like nulls and empty maps, and integration with other namespaces.

I've added a couple of suggestions to map_namespace.py to further improve the robustness of the implementation by handling LargeListArray and providing clearer errors for unsupported types. Overall, this is a solid contribution that enhances Ray Data's expression capabilities.

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

ryankert01 · 2026-01-12T16:55:36Z

PTAL @goutamvenkat-anyscale @owenowenisme

owenowenisme

Minor fixes, overall LGTM

Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com> Signed-off-by: Ryan Huang <ryankert01@gmail.com>

Signed-off-by: Hsien-Cheng Huang <hcr@apache.org>

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

goutamvenkat-anyscale · 2026-02-03T01:47:28Z

+    assert list(rows[0]["keys"]) == ["a"] and list(rows[0]["values"]) == [1]
+    assert len(rows[1]["keys"]) == 0 and len(rows[1]["values"]) == 0
+    assert rows[2]["keys"] is None and rows[2]["values"] is None


Let's use rows_same

`rows_same` operates on pandas DataFrames, which can't handle the mixed None/list column during conversion: the `to_pandas()` path triggers TensorArray casting, which fails on the mixed types. Let's keep the current assertions!

There is a workaround, but it is too complex for the context of this test:

```python
ctx = ray.data.context.DataContext.get_current()
ctx.enable_tensor_extension_casting = False
try:
    result = (
        ds.with_column("keys", col("m").map.keys())
        .with_column("values", col("m").map.values())
        .to_pandas()
    )
    expected = pd.DataFrame(
        {
            "keys": [["a"], [], None],
            "values": [[1], [], None],
        }
    )
    _assert_result(result, expected, drop_cols=["m"])
finally:
    ctx.enable_tensor_extension_casting = True
```

goutamvenkat-anyscale · 2026-02-03T02:00:02Z

+        if start_offset.as_py() != 0:
+            end_offset = offsets[-1].as_py()
+            child_array = child_array.slice(
+                offset=start_offset.as_py(), length=end_offset - start_offset.as_py()


Don't believe you need to call as_py here

goutamvenkat-anyscale

Please look at open comments. Thanks

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

goutamvenkat-anyscale

lgtm!

alexeykudinkin · 2026-02-13T00:02:29Z

+    )
+
+
+def _rebuild_list_array(


Help me understand why we need to do this?

It's because slicing a MapArray or ListArray is zero-copy in PyArrow: the child arrays (keys/values) remain unchanged, and the offsets still reference positions in the original buffer.

So we have to rebuild the array with 0-based offsets.
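
A minimal plain-Python sketch of that situation, modeling a list array as an offsets list over a shared child. It models the layout only and is not PyArrow's API; `slice_rows` and `rebase` are hypothetical names. After a zero-copy slice the offsets still point into the original child, so rebuilding means trimming the child to the referenced range and subtracting the first offset.

```python
from typing import Any, List, Tuple


def slice_rows(offsets: List[int], start: int, length: int) -> List[int]:
    """Zero-copy style slice: keep only the offsets window, leave the child untouched."""
    return offsets[start:start + length + 1]


def rebase(offsets: List[int], child: List[Any]) -> Tuple[List[int], List[Any]]:
    """Trim the child to the referenced range and shift offsets to start at 0."""
    first, last = offsets[0], offsets[-1]
    return [o - first for o in offsets], child[first:last]


# Three rows: [1, 2], [3], [4, 5, 6]
child = [1, 2, 3, 4, 5, 6]
offsets = [0, 2, 3, 6]

sliced_offsets = slice_rows(offsets, start=1, length=2)  # rows [3] and [4, 5, 6]
print(sliced_offsets)  # [2, 3, 6] -> still indexes the original child

new_offsets, new_child = rebase(sliced_offsets, child)
print(new_offsets)  # [0, 1, 4]
print(new_child)    # [3, 4, 5, 6]
```

This sketch ignores null rows; a real rebuild would also need to carry the validity information across.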

goutamvenkat-anyscale · 2026-02-25T22:45:16Z

@ryankert01 There seem to be some open comments. Please address them.

…pyarrow.Array] Signed-off-by: Hsien-Cheng Huang <hcr@apache.org>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

…rgeListArray Signed-off-by: Hsien-Cheng Huang <hcr@apache.org>

ryankert01 · 2026-03-04T02:44:01Z

PTAL @goutamvenkat-anyscale @alexeykudinkin

ryankert01 requested a review from a team as a code owner January 6, 2026 06:41

[Data] Add map namespace support for expression operations

1bd4269

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

ryankert01 force-pushed the map-expression branch from 694b035 to 1bd4269 Compare January 6, 2026 06:44

Merge branch 'master' into map-expression

2e157bd

cursor Bot reviewed Jan 6, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py

gemini-code-assist Bot reviewed Jan 6, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py Outdated

Comment thread python/ray/data/namespace_expressions/map_namespace.py Outdated

address ai review

68bef64

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

ray-gardener Bot added the `data` (Ray Data-related issues) and `community-contribution` (Contributed by the community) labels Jan 6, 2026

cursor Bot reviewed Jan 6, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py Outdated

Comment thread python/ray/data/namespace_expressions/map_namespace.py

ryankert01 added 4 commits January 6, 2026 13:50

fix cursor bot suggestions

fe2642b

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

Merge branch 'master' into map-expression

843cac1

Merge remote-tracking branch 'origin/master' into map-expression

f16bfd1

refactor tests

df1fe8c

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

cursor Bot reviewed Jan 12, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py

Merge branch 'master' into map-expression

6461062

ryankert01 assigned goutamvenkat-anyscale Jan 14, 2026

ryankert01 and others added 2 commits January 19, 2026 00:54

Merge branch 'master' into map-expression

fcd3652

Merge branch 'master' into map-expression

202a652

owenowenisme reviewed Jan 21, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py

Comment thread python/ray/data/namespace_expressions/map_namespace.py Outdated

ryankert01 and others added 3 commits January 22, 2026 19:57

Update python/ray/data/namespace_expressions/map_namespace.py

50a2e64

Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com> Signed-off-by: Ryan Huang <ryankert01@gmail.com>

address commits

e613cfa

Signed-off-by: Hsien-Cheng Huang <hcr@apache.org>

Merge branch 'master' into map-expression

70a3760

cursor Bot reviewed Jan 22, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py Outdated

goutamvenkat-anyscale reviewed Jan 23, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py Outdated

goutamvenkat-anyscale reviewed Jan 23, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py Outdated

goutamvenkat-anyscale reviewed Jan 23, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py Outdated

ryankert01 and others added 2 commits January 25, 2026 13:12

Merge branch 'master' into map-expression

49268ec

create 3 helper functions to make the intent clearer

c390a24

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

ryankert01 requested a review from owenowenisme January 25, 2026 15:55

ryankert01 added 2 commits January 26, 2026 00:51

lint

978132e

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

Merge remote-tracking branch 'origin/map-expression' into map-expression

2eff519

goutamvenkat-anyscale reviewed Feb 3, 2026

Comment thread python/ray/data/tests/expressions/test_namespace_map.py

goutamvenkat-anyscale reviewed Feb 3, 2026

goutamvenkat-anyscale approved these changes Feb 3, 2026

iamjustinhsu added the `go` (add ONLY when ready to merge, run all tests) label Feb 4, 2026

Merge branch 'master' into map-expression

7a11478

cursor Bot reviewed Feb 4, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py Outdated

goutamvenkat-anyscale requested changes Feb 6, 2026

ryankert01 added 2 commits February 8, 2026 13:22

Merge branch 'master' into map-expression

dae4645

address comments

59f8047

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

ryankert01 requested a review from goutamvenkat-anyscale February 8, 2026 06:53

goutamvenkat-anyscale approved these changes Feb 12, 2026

alexeykudinkin reviewed Feb 13, 2026

goutamvenkat-anyscale added the `@external-author-action-required` (Alternate tag for PRs where the author doesn't have labeling permission.) label Feb 25, 2026

ryankert01 and others added 2 commits March 3, 2026 07:44

Merge branch 'master' into map-expression

ef6f899

[fix] Update type hint for _get_child_array return value to Optional[…

5df9b8f

…pyarrow.Array] Signed-off-by: Hsien-Cheng Huang <hcr@apache.org>

ryankert01 force-pushed the map-expression branch from 2e376db to 5df9b8f Compare March 3, 2026 02:14

Merge branch 'master' into map-expression

183aec1

cursor Bot reviewed Mar 3, 2026

Comment thread python/ray/data/namespace_expressions/map_namespace.py

[fix] Refactor _rebuild_list_array and _get_result_type to support La…

2aebeb6

…rgeListArray Signed-off-by: Hsien-Cheng Huang <hcr@apache.org>

ryankert01 requested a review from alexeykudinkin March 3, 2026 05:24

richardliaw merged commit 6ddbbdd into ray-project:master Mar 4, 2026
6 checks passed

ryankert01 deleted the map-expression branch March 4, 2026 06:45

[Data] Add map namespace support for expression operations #59879
richardliaw merged 27 commits into ray-project:master from ryankert01:map-expression

6 participants
