[data] Include column name and target type in ArrowConversionError#62407
Conversation
…ssage Signed-off-by: Goutam <goutam@anyscale.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 93cb773. Configure here.
There was a problem hiding this comment.
Code Review
This pull request enhances the ArrowConversionError class to include optional column name and target type information, providing more context in error messages. The review feedback identifies a critical issue where a NameError could occur in _convert_to_pyarrow_native_array if an exception is raised before pa_type is defined. Additionally, there is a suggestion to refactor the error message construction logic for improved readability.
| if column_name is not None: | ||
| type_info = f" (target type: {pa_type})" if pa_type is not None else "" | ||
| message = ( | ||
| f"Error converting column '{column_name}'{type_info}" | ||
| f" to Arrow: {data_str}" | ||
| ) | ||
| else: | ||
| message = f"Error converting data to Arrow: {data_str}" |
There was a problem hiding this comment.
The message construction logic can be made more concise and arguably more readable by handling the column_name is None case first, and combining the f-string for the other case.
if column_name is None:
message = f"Error converting data to Arrow: {data_str}"
else:
type_info = f" (target type: {pa_type})" if pa_type is not None else ""
message = f"Error converting column '{column_name}'{type_info} to Arrow: {data_str}"Signed-off-by: Goutam <goutam@anyscale.com>
| data_str = data_str[: self.MAX_DATA_STR_LEN] + "..." | ||
| message = f"Error converting data to Arrow: {data_str}" | ||
| if column_name is not None: | ||
| type_info = f" (target type: {pa_type})" if pa_type is not None else "" |
There was a problem hiding this comment.
when can column_name be present but pa_type None?
There was a problem hiding this comment.
also, what are the implications if you did this instead?
type_info = f" (target type: {pa_type})"regardless of pa_type check?
There was a problem hiding this comment.
pa_type can be None if it fails before _infer_pyarrow_type.
Signed-off-by: Goutam <goutam@anyscale.com>
…ay-project#62407) ## Why are these changes needed? `ArrowConversionError` previously only showed the data that failed to convert, making it hard to identify which column caused the issue. For example, the raw Arrow error looks like: ``` File "pyarrow/array.pxi", line 405, in pyarrow.lib.asarray File "pyarrow/array.pxi", line 375, in pyarrow.lib.array File "pyarrow/array.pxi", line 46, in pyarrow.lib._sequence_to_array File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status pyarrow.lib.ArrowTypeError: Expected bytes, got a 'numpy.float32' object ``` This gives no indication of which column or what type conversion was attempted. With this change, the wrapped error now includes the column name and inferred target type: Before: ``` Error converting data to Arrow: [b'hello', 2.0] ``` After: ``` Error converting column 'my_column' (target type: binary) to Arrow: [b'hello', 2.0] ``` ### Repro script ```python import ray import numpy as np ray.init() ds = ray.data.range(2) def mix_types(batch): return {"my_column": [b"hello", np.float32(2.0)]} ds.map_batches(mix_types, batch_size=2).write_parquet("/tmp/out") ``` ## Related issue number N/A ## Checks - [x] I've signed off every commit. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. - [x] I've made sure the tests are passing. - Testing Strategy - [x] Unit tests --------- Signed-off-by: Goutam <goutam@anyscale.com>

Why are these changes needed?
ArrowConversionErrorpreviously only showed the data that failed to convert, making it hard to identify which column caused the issue. For example, the raw Arrow error looks like:This gives no indication of which column or what type conversion was attempted. With this change, the wrapped error now includes the column name and inferred target type:
Before:
After:
Repro script
Related issue number
N/A
Checks
scripts/format.shto lint the changes in this PR.