[air] Fix missing comma in DataBatchType Union type #63872

Merged
matthewdeng merged 3 commits into ray-project:master from awen11123:fix/databatchtype-missing-comma
Jun 8, 2026
@awen11123

Contributor

Why this change is needed

A missing comma between "pyarrow.Table" and "pandas.DataFrame" in the DataBatchType Union causes Python to concatenate adjacent string literals into "pyarrow.Tablepandas.DataFrame", which is not a valid type.

This makes DataBatchType incomplete — it should include both pyarrow.Table and pandas.DataFrame as separate Union members, matching the equivalent DataBatch type in ray.data.block.

Before

DataBatchType = Union[
    "numpy.ndarray", "pyarrow.Table" "pandas.DataFrame", Dict[str, "numpy.ndarray"]
]

↑ "pyarrow.Table" "pandas.DataFrame" concatenated into "pyarrow.Tablepandas.DataFrame"

After

DataBatchType = Union[
    "numpy.ndarray", "pyarrow.Table", "pandas.DataFrame", Dict[str, "numpy.ndarray"]
]
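
The effect described above can be reproduced without Ray, NumPy, pandas, or PyArrow installed, because the Union members are plain string forward references. The following is a minimal sketch, not the code in the Ray repository: `BuggyBatchType`, `FixedBatchType`, and `forward_ref_names` are hypothetical names chosen for illustration.

```python
from typing import Dict, Union, get_args

# Adjacent string literals are joined by the parser, before Union ever sees them.
assert "pyarrow.Table" "pandas.DataFrame" == "pyarrow.Tablepandas.DataFrame"

# Hypothetical stand-ins for the "Before" and "After" definitions.
BuggyBatchType = Union[
    "numpy.ndarray", "pyarrow.Table" "pandas.DataFrame", Dict[str, "numpy.ndarray"]
]
FixedBatchType = Union[
    "numpy.ndarray", "pyarrow.Table", "pandas.DataFrame", Dict[str, "numpy.ndarray"]
]


def forward_ref_names(union_type):
    """Collect the string names of the forward-reference members of a Union."""
    return [
        arg.__forward_arg__
        for arg in get_args(union_type)
        if hasattr(arg, "__forward_arg__")
    ]


# The buggy Union has one fewer member, and one of its names is the fused string.
print(forward_ref_names(BuggyBatchType))
print(forward_ref_names(FixedBatchType))
```

Python raises no error for the buggy definition because the fused string is still a syntactically valid dotted name, which is likely why the mistake went unnoticed.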

Related Issues

DataBatchType is referenced extensively in ray.air.util.data_batch_conversion, ray.data.preprocessor, and ray.data.util.data_batch_conversion. The malformed type string could surface in user-facing error messages and in type-checker output.
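
Because the concatenated string is valid Python, this class of mistake tends to be caught only by inspection. One possible guard, sketched here with the standard-library `tokenize` module, is to flag any string literal that directly follows another. The helper `find_implicit_concatenations` and the sample sources are hypothetical and are not part of this PR or of Ray's tooling; the check would also flag intentional multi-line string concatenation.

```python
import io
import tokenize

# Sample sources mirroring the "Before" and "After" definitions.
BUGGY_SRC = (
    "DataBatchType = Union[\n"
    '    "numpy.ndarray", "pyarrow.Table" "pandas.DataFrame", Dict[str, "numpy.ndarray"]\n'
    "]\n"
)
FIXED_SRC = BUGGY_SRC.replace('"pyarrow.Table" ', '"pyarrow.Table", ')


def find_implicit_concatenations(source):
    """Return (row, col) of each string literal that directly follows another."""
    hits = []
    prev_was_string = False
    # Line breaks inside brackets and comments do not separate adjacent literals.
    ignorable = {tokenize.NL, tokenize.COMMENT}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in ignorable:
            continue
        if tok.type == tokenize.STRING:
            if prev_was_string:
                hits.append(tok.start)
            prev_was_string = True
        else:
            prev_was_string = False
    return hits


print(find_implicit_concatenations(BUGGY_SRC))
print(find_implicit_concatenations(FIXED_SRC))
```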

Checks

  • I have signed the commits with Developer Certificate of Origin (DCO)

🤖 Generated with Claude Code

@awen11123 awen11123 requested a review from a team as a code owner June 5, 2026 05:09

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request fixes a missing comma in the DataBatchType Union definition within python/ray/air/data_batch_type.py, which previously caused 'pyarrow.Table' and 'pandas.DataFrame' to be implicitly concatenated. There are no review comments, and I have no feedback to provide.

Signed-off-by: awen11123 <awen11123@users.noreply.github.com>
Signed-off-by: awen <444014092@qq.com>
@awen11123 awen11123 force-pushed the fix/databatchtype-missing-comma branch from c4c3b34 to cf3ee57 Compare June 5, 2026 05:14
@ray-gardener ray-gardener Bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Jun 5, 2026

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Member

Thanks for the PR, LGTM

@pseudo-rnd-thoughts pseudo-rnd-thoughts added the go (add ONLY when ready to merge, run all tests) label Jun 5, 2026
@matthewdeng matthewdeng enabled auto-merge (squash) June 8, 2026 16:58
@github-actions github-actions Bot disabled auto-merge June 8, 2026 16:59
@matthewdeng matthewdeng enabled auto-merge (squash) June 8, 2026 17:49
@matthewdeng matthewdeng merged commit a68cbad into ray-project:master Jun 8, 2026
8 checks passed
sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Jun 10, 2026
Signed-off-by: awen11123 <awen11123@users.noreply.github.com>
Signed-off-by: awen <444014092@qq.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Jun 30, 2026
Labels

community-contribution (Contributed by the community), data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)

3 participants