[data] Include column name and target type in ArrowConversionError by goutamvenkat-anyscale · Pull Request #62407 · ray-project/ray

goutamvenkat-anyscale · 2026-04-07T21:09:38Z

Why are these changes needed?

ArrowConversionError previously only showed the data that failed to convert, making it hard to identify which column caused the issue. For example, the raw Arrow error looks like:

File "pyarrow/array.pxi", line 405, in pyarrow.lib.asarray
File "pyarrow/array.pxi", line 375, in pyarrow.lib.array
File "pyarrow/array.pxi", line 46, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'numpy.float32' object

This gives no indication of which column or what type conversion was attempted. With this change, the wrapped error now includes the column name and inferred target type:

Before:

Error converting data to Arrow: [b'hello', 2.0]

After:

Error converting column 'my_column' (target type: binary) to Arrow: [b'hello', 2.0]

Repro script

import ray
import numpy as np

ray.init()

ds = ray.data.range(2)

def mix_types(batch):
    return {"my_column": [b"hello", np.float32(2.0)]}

ds.map_batches(mix_types, batch_size=2).write_parquet("/tmp/out")

Related issue number

N/A

Checks

- [x] I've signed off every commit.
- [x] I've run scripts/format.sh to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference.
- [x] I've made sure the tests are passing.
- Testing Strategy
  - [x] Unit tests

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 93cb773.

gemini-code-assist

Code Review

This pull request enhances the ArrowConversionError class to include optional column name and target type information, providing more context in error messages. The review feedback identifies a critical issue where a NameError could occur in _convert_to_pyarrow_native_array if an exception is raised before pa_type is defined. Additionally, there is a suggestion to refactor the error message construction logic for improved readability.

gemini-code-assist · 2026-04-07T21:12:12Z

+        if column_name is not None:
+            type_info = f" (target type: {pa_type})" if pa_type is not None else ""
+            message = (
+                f"Error converting column '{column_name}'{type_info}"
+                f" to Arrow: {data_str}"
+            )
+        else:
+            message = f"Error converting data to Arrow: {data_str}"


The message construction logic can be made more concise and arguably more readable by handling the column_name is None case first, and combining the f-string for the other case.

if column_name is None:
    message = f"Error converting data to Arrow: {data_str}"
else:
    type_info = f" (target type: {pa_type})" if pa_type is not None else ""
    message = f"Error converting column '{column_name}'{type_info} to Arrow: {data_str}"
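
As a rough illustration of the suggestion above, the standalone sketch below builds both message shapes outside of Ray. The function name `build_message` and the `MAX_DATA_STR_LEN` value of 200 are assumptions made for this example; the diff shows only that a truncation limit with that name exists, not its value.

```python
from typing import Optional

# Hypothetical cap for this sketch; the real value is not shown in the PR thread.
MAX_DATA_STR_LEN = 200


def build_message(
    data_str: str,
    column_name: Optional[str] = None,
    pa_type: Optional[object] = None,
) -> str:
    """Build a conversion error message, adding column and type when known."""
    # Truncate long data previews so the message stays readable.
    if len(data_str) > MAX_DATA_STR_LEN:
        data_str = data_str[:MAX_DATA_STR_LEN] + "..."

    # Handle the "no column context" case first, as the review suggests.
    if column_name is None:
        return f"Error converting data to Arrow: {data_str}"

    # Only mention the target type when one was actually inferred.
    type_info = f" (target type: {pa_type})" if pa_type is not None else ""
    return f"Error converting column '{column_name}'{type_info} to Arrow: {data_str}"


if __name__ == "__main__":
    print(build_message("[b'hello', 2.0]"))
    print(build_message("[b'hello', 2.0]", "my_column", "binary"))
```

Calling it with a column name but no type yields the column-only form, without a "(target type: ...)" suffix.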

iamjustinhsu · 2026-04-07T21:14:44Z

            data_str = data_str[: self.MAX_DATA_STR_LEN] + "..."
-        message = f"Error converting data to Arrow: {data_str}"
+        if column_name is not None:
+            type_info = f" (target type: {pa_type})" if pa_type is not None else ""


when can column_name be present but pa_type None?

also, what are the implications if you did this instead?

type_info = f" (target type: {pa_type})"

regardless of pa_type check?

pa_type can be None if the conversion fails before _infer_pyarrow_type runs.
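
A minimal sketch of why that guard matters, assuming a conversion routine shaped roughly like the one discussed here. `ConversionError`, `infer_type`, and `convert_column` are hypothetical stand-ins, not Ray's actual `ArrowConversionError`, `_infer_pyarrow_type`, or `_convert_to_pyarrow_native_array`. Binding `pa_type` to `None` before the `try` block keeps the `except` handler from raising `UnboundLocalError` when inference itself fails, and the `None` check keeps the message from reading "(target type: None)".

```python
class ConversionError(Exception):
    """Hypothetical stand-in for ArrowConversionError."""

    def __init__(self, data_str, column_name=None, pa_type=None):
        self.column_name = column_name
        self.pa_type = pa_type
        if column_name is None:
            message = f"Error converting data to Arrow: {data_str}"
        else:
            # Skip the type suffix when inference never produced a type.
            type_info = f" (target type: {pa_type})" if pa_type is not None else ""
            message = (
                f"Error converting column '{column_name}'{type_info}"
                f" to Arrow: {data_str}"
            )
        super().__init__(message)


def infer_type(values):
    """Hypothetical inference step: use the first element's type name."""
    if not values:
        raise ValueError("cannot infer a type from an empty column")
    return type(values[0]).__name__


def convert_column(values, column_name):
    """Check that all values share one type, wrapping any failure with context."""
    pa_type = None  # bound before `try` so the handler can always read it
    try:
        pa_type = infer_type(values)  # may raise before pa_type is reassigned
        expected = type(values[0])
        for v in values:
            if not isinstance(v, expected):
                raise TypeError(
                    f"Expected {pa_type}, got a {type(v).__name__!r} object"
                )
        return list(values)
    except Exception as e:
        raise ConversionError(str(values), column_name, pa_type) from e


if __name__ == "__main__":
    for column in ([b"hello", 2.0], []):
        try:
            convert_column(column, "my_column")
        except ConversionError as err:
            print(err)
```

In this sketch the mixed-type column fails after inference, so the error carries a target type, while the empty column fails inside `infer_type` and the error falls back to the column-only form.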

[data] Include column name and target type in ArrowConversionError message

93cb773

Signed-off-by: Goutam <goutam@anyscale.com>

goutamvenkat-anyscale requested a review from a team as a code owner April 7, 2026 21:09

goutamvenkat-anyscale added the data Ray Data-related issues label Apr 7, 2026

cursor Bot reviewed Apr 7, 2026

Comment thread python/ray/data/_internal/tensor_extensions/arrow.py

goutamvenkat-anyscale added the go add ONLY when ready to merge, run all tests label Apr 7, 2026

gemini-code-assist Bot reviewed Apr 7, 2026

[data] Initialize pa_type before try block to avoid UnboundLocalError

21c7c72

Signed-off-by: Goutam <goutam@anyscale.com>

iamjustinhsu approved these changes Apr 7, 2026

Merge branch 'master' into goutam/improve-arrow-conversion-error-message

cea3688

goutamvenkat-anyscale enabled auto-merge (squash) April 7, 2026 21:34

[data] Update test to match new ArrowConversionError message format

d6f5b46

Signed-off-by: Goutam <goutam@anyscale.com>

github-actions Bot disabled auto-merge April 7, 2026 22:38

goutamvenkat-anyscale enabled auto-merge (squash) April 7, 2026 22:38

goutamvenkat-anyscale merged commit 6f6aa90 into ray-project:master Apr 7, 2026
7 checks passed

goutamvenkat-anyscale deleted the goutam/improve-arrow-conversion-error-message branch April 21, 2026 17:37

goutamvenkat-anyscale merged 4 commits into ray-project:master from goutamvenkat-anyscale:goutam/improve-arrow-conversion-error-message
