Skip to content

[Data] Remove obsolete PyArrow 9.0 version checks#61483

Merged
bveeramani merged 2 commits into
ray-project:masterfrom
slfan1989:cleanup/remove-pyarrow-9-version-checks
Mar 18, 2026
Merged

[Data] Remove obsolete PyArrow 9.0 version checks#61483
bveeramani merged 2 commits into
ray-project:masterfrom
slfan1989:cleanup/remove-pyarrow-9-version-checks

Conversation

@slfan1989

Copy link
Copy Markdown
Contributor

Description

Remove obsolete PyArrow version checks for object extension type support. Since Ray's minimum PyArrow version is now >= 9.0.0 (and object extension type was introduced in PyArrow 9.0.0), the version gating logic is redundant and always evaluates to true.

This PR simplifies the codebase by:

  • Removing MIN_PYARROW_VERSION_SCALAR_SUBCLASS constant and _object_extension_type_allowed() function
  • Simplifying object fallback logic to only check DataContext.enable_fallback_to_arrow_object_ext_type config
  • Removing version-based test skips
  • Fixing PYARROW_VERSION import issue in transform_pyarrow.py

Related issues

Link related issues: "Fixes #61482", "Closes #61482", or "Related to #61482".

Additional information

Key changes:

  1. Core logic simplification (tensor_extensions/arrow.py):

    • Removed nested version check conditions
    • Fallback behavior now controlled solely by DataContext config
    • Updated log messages to reflect simplified logic
  2. Import fix (arrow_ops/transform_pyarrow.py):

    • Removed PYARROW_VERSION from imports (was causing circular dependency)
    • Defined locally as PYARROW_VERSION = get_pyarrow_version()
  3. Test improvements:

    • Removed @pytest.mark.skipif(_object_extension_type_allowed(), ...) decorators
    • Fixed test assumption: test_dict_and_none_use_arrow_block now correctly expects Arrow blocks with struct types, not pandas fallback
    • Split test_fallback_to_pandas_on_incompatible_data into separate tests for map and map_batches to clarify different behaviors
Since Ray's minimum PyArrow version is >= 9.0.0, version checks for
object extension type support are no longer needed.

Changes:
- Remove MIN_PYARROW_VERSION_SCALAR_SUBCLASS and _object_extension_type_allowed()
- Simplify object fallback logic to only use DataContext.enable_fallback_to_arrow_object_ext_type
- Fix PYARROW_VERSION import in arrow_ops/transform_pyarrow.py by defining it locally
- Remove version-based test skips
- Improve test accuracy: dicts use Arrow struct type, not pandas fallback

This cleanup removes ~80 lines of dead code and improves maintainability.

Signed-off-by: slfan1989 <slfan1989@apache.org>
@slfan1989 slfan1989 requested a review from a team as a code owner March 4, 2026 09:26

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively removes obsolete PyArrow 9.0 version checks, which simplifies the codebase as the minimum supported version now includes the necessary features. The changes are well-executed, removing redundant logic and constants, and updating tests to reflect the new baseline behavior. I've noticed one minor opportunity for improvement regarding code duplication in the tests, which I've commented on. Overall, this is a great cleanup that improves code clarity and maintainability.

I am having trouble creating individual review comments. Click here to see my feedback.

python/ray/data/tests/test_transform_pyarrow.py (50-51)

medium

This UnsupportedType class is removed from the module scope but then re-defined in both test_map_batches_fallback_to_pandas_on_incompatible_data and test_map_raises_on_incompatible_data. To avoid this code duplication, it would be better to keep the class definition at the module level and reuse it in both tests. This will improve code maintainability.

@ray-gardener ray-gardener Bot added data Ray Data-related issues community-contribution Contributed by the community labels Mar 4, 2026
@bveeramani bveeramani self-assigned this Mar 4, 2026

@bveeramani bveeramani left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

# an error during block construction because we cannot cast dicts
# to a supported arrow type. This test checks that the block
# construction falls back to pandas and still succeeds.
def test_dict_and_none_use_arrow_block(ray_start_regular_shared, restore_data_context):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand this name. I think this test is more like "Test that batches of dicts are stored as a structs in PyArrow tables". What does none allude to?

Comment on lines +205 to +207
# Dicts are represented as Arrow struct types, so the block should remain Arrow
# even if object-extension fallback is disabled.
DataContext.get_current().enable_fallback_to_arrow_object_ext_type = False

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we remove this? Even if if we use the object extension type, won't the struct assertion fail?

@bveeramani bveeramani enabled auto-merge (squash) March 18, 2026 18:57
@github-actions github-actions Bot added the go add ONLY when ready to merge, run all tests label Mar 18, 2026
@bveeramani bveeramani removed their assignment Mar 18, 2026
@bveeramani bveeramani merged commit 8b561dd into ray-project:master Mar 18, 2026
8 checks passed
matthewdeng added a commit to matthewdeng/ray that referenced this pull request Mar 19, 2026
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Mar 25, 2026
## Description

Remove obsolete PyArrow version checks for object extension type
support. Since Ray's minimum PyArrow version is now >= 9.0.0 (and object
extension type was introduced in PyArrow 9.0.0), the version gating
logic is redundant and always evaluates to true.

This PR simplifies the codebase by:
- Removing `MIN_PYARROW_VERSION_SCALAR_SUBCLASS` constant and
`_object_extension_type_allowed()` function
- Simplifying object fallback logic to only check
`DataContext.enable_fallback_to_arrow_object_ext_type` config
- Removing version-based test skips
- Fixing `PYARROW_VERSION` import issue in `transform_pyarrow.py`

## Related issues
> Link related issues: "Fixes ray-project#61482", "Closes ray-project#61482", or "Related to
ray-project#61482".

## Additional information

**Key changes:**

1. **Core logic simplification** (`tensor_extensions/arrow.py`):
   - Removed nested version check conditions
   - Fallback behavior now controlled solely by DataContext config
   - Updated log messages to reflect simplified logic

2. **Import fix** (`arrow_ops/transform_pyarrow.py`):
- Removed `PYARROW_VERSION` from imports (was causing circular
dependency)
   - Defined locally as `PYARROW_VERSION = get_pyarrow_version()`

3. **Test improvements**:
- Removed `@pytest.mark.skipif(_object_extension_type_allowed(), ...)`
decorators
- Fixed test assumption: `test_dict_and_none_use_arrow_block` now
correctly expects Arrow blocks with struct types, not pandas fallback
- Split `test_fallback_to_pandas_on_incompatible_data` into separate
tests for `map` and `map_batches` to clarify different behaviors

Signed-off-by: slfan1989 <slfan1989@apache.org>
Lucas61000 pushed a commit to Lucas61000/ray that referenced this pull request May 15, 2026
## Description

Remove obsolete PyArrow version checks for object extension type
support. Since Ray's minimum PyArrow version is now >= 9.0.0 (and object
extension type was introduced in PyArrow 9.0.0), the version gating
logic is redundant and always evaluates to true.

This PR simplifies the codebase by:
- Removing `MIN_PYARROW_VERSION_SCALAR_SUBCLASS` constant and
`_object_extension_type_allowed()` function
- Simplifying object fallback logic to only check
`DataContext.enable_fallback_to_arrow_object_ext_type` config
- Removing version-based test skips
- Fixing `PYARROW_VERSION` import issue in `transform_pyarrow.py`

## Related issues
> Link related issues: "Fixes ray-project#61482", "Closes ray-project#61482", or "Related to
ray-project#61482".

## Additional information

**Key changes:**

1. **Core logic simplification** (`tensor_extensions/arrow.py`):
   - Removed nested version check conditions
   - Fallback behavior now controlled solely by DataContext config
   - Updated log messages to reflect simplified logic

2. **Import fix** (`arrow_ops/transform_pyarrow.py`):
- Removed `PYARROW_VERSION` from imports (was causing circular
dependency)
   - Defined locally as `PYARROW_VERSION = get_pyarrow_version()`

3. **Test improvements**:
- Removed `@pytest.mark.skipif(_object_extension_type_allowed(), ...)`
decorators
- Fixed test assumption: `test_dict_and_none_use_arrow_block` now
correctly expects Arrow blocks with struct types, not pandas fallback
- Split `test_fallback_to_pandas_on_incompatible_data` into separate
tests for `map` and `map_batches` to clarify different behaviors

Signed-off-by: slfan1989 <slfan1989@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community data Ray Data-related issues go add ONLY when ready to merge, run all tests

2 participants