[Data] Fix: Replace bare raise with TypeError in string concatenation #60795

Merged
bveeramani merged 5 commits into ray-project:master from slfan1989:fix/pa-string-input-typeerror on Feb 11, 2026

Conversation

@slfan1989

Contributor

Description

This PR fixes a bug in _to_pa_string_input() where attempting to concatenate string columns with non-string columns (e.g., numeric types) hit a bare raise statement, which surfaced as a cryptic RuntimeError instead of a descriptive TypeError.

Changes:

  • Replaced bare raise statement with proper TypeError that includes a clear error message indicating expected vs actual input types
  • Simplified control flow using early returns
  • Added unit test test_string_concat_invalid_input_type to verify the fix

Before: Bare raise caused cryptic RuntimeError: No active exception to reraise

After: Clear TypeError: Expected string or string-like pyarrow Array/ChunkedArray for string concatenation, got int64.
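
For context, a bare `raise` only re-raises when an exception is currently being handled. Outside an `except` block Python has nothing to re-raise and reports a `RuntimeError` instead. A minimal sketch of that behavior, independent of Ray (the function name `reject_input` is hypothetical and not part of this PR):

```python
def reject_input(x):
    # Bare `raise` outside an `except` block: there is no active
    # exception to re-raise, so Python raises RuntimeError instead.
    raise


try:
    reject_input(42)
except RuntimeError as exc:
    # Capture the message so it can be inspected.
    message = str(exc)

print(message)
```

The message says nothing about the offending input, which is why the PR swaps the bare `raise` for an explicit `TypeError`.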

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Example of the fixed behavior:

import pyarrow as pa
from ray.data.expressions import col

table = pa.table({"name": ["Alice", "Bob"], "age": [25, 30]})
expr = col("name") + col("age")  # Attempting to concat string with int

# Building expr is lazy; evaluating it against the table now raises:
# TypeError: Expected string or string-like pyarrow Array/ChunkedArray
# for string concatenation, got int64.

Replace the bare `raise` statement in `_to_pa_string_input()` with a
proper TypeError that includes a descriptive error message. This ensures
string concatenation operations fail with clear feedback when given
non-string inputs (e.g., numeric columns).

Changes:
- expression_evaluator.py: Add TypeError with descriptive message
- test_arithmetic.py: Add test for invalid input type rejection

Signed-off-by: slfan1989 <slfan1989@apache.org>
@slfan1989 slfan1989 requested a review from a team as a code owner February 6, 2026 01:40

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request is a solid improvement, replacing a bare raise with a descriptive TypeError for invalid string concatenation operations. The code is also simplified by using early returns. I've added one suggestion to make the error message even more informative by including the specific data type of the invalid input, which aligns with the goal stated in the pull request description. The new unit test is a great addition to prevent regressions.

Comment on lines +102 to +105
raise TypeError(
    "Expected string or string-like pyarrow Array/ChunkedArray for string "
    f"concatenation, got {type(x).__name__}."
)

Contributor


medium

The error message can be made more specific to better align with the goal stated in the PR description. When x is a pyarrow.Array or pyarrow.ChunkedArray, using x.type instead of type(x).__name__ will provide the underlying data type (e.g., int64), which is more informative for debugging.

Suggested change
- raise TypeError(
-     "Expected string or string-like pyarrow Array/ChunkedArray for string "
-     f"concatenation, got {type(x).__name__}."
- )
+ type_name = x.type if isinstance(x, (pa.Array, pa.ChunkedArray)) else type(x).__name__
+ raise TypeError(
+     "Expected string or string-like pyarrow Array/ChunkedArray for string "
+     f"concatenation, got {type_name}."
+ )

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

@ray-gardener ray-gardener Bot added the community-contribution Contributed by the community label Feb 6, 2026
Comment on lines +100 to +109
  if isinstance(x, (pa.Array, pa.ChunkedArray)) and _is_pa_string_like(x):
      return _pa_decode_dict_string_array(x)
+ if isinstance(x, (pa.Array, pa.ChunkedArray)):
+     actual_type = str(x.type)
  else:
-     raise
- return x
+     actual_type = type(x).__name__
+ raise TypeError(
+     "Expected string or string-like pyarrow Array/ChunkedArray for string "
+     f"concatenation, got {actual_type}."
+ )

Contributor


Suggested cleanup:

    if isinstance(x, (pa.Array, pa.ChunkedArray)) and _is_pa_string_like(x):
        return _pa_decode_dict_string_array(x)
    actual_type = str(x.type) if isinstance(x, (pa.Array, pa.ChunkedArray)) else type(x).__name__
    raise TypeError(
        "Expected string or string-like pyarrow Array/ChunkedArray for string "
        f"concatenation, got {actual_type}."
    )

Contributor Author


Thanks for the suggestions! I’ve updated the PR accordingly—could you please take another look?

slfan1989 and others added 2 commits February 7, 2026 07:10
@slfan1989

Contributor Author

@goutamvenkat-anyscale @bveeramani Could you please review this PR again? Thank you very much!

@bveeramani

Member

> @goutamvenkat-anyscale @bveeramani Could you please review this PR again? Thank you very much!

Will defer to @goutamvenkat-anyscale since he has more context on this PR

@slfan1989

Contributor Author

> > @goutamvenkat-anyscale @bveeramani Could you please review this PR again? Thank you very much!
>
> Will defer to @goutamvenkat-anyscale since he has more context on this PR

Thanks a lot for your reply!

@goutamvenkat-anyscale goutamvenkat-anyscale left a comment

Contributor


lgtm!

@goutamvenkat-anyscale goutamvenkat-anyscale added the go (add ONLY when ready to merge, run all tests) and data (Ray Data-related issues) labels Feb 11, 2026
@bveeramani bveeramani enabled auto-merge (squash) February 11, 2026 20:02
@bveeramani bveeramani merged commit 12e3e50 into ray-project:master Feb 11, 2026
8 checks passed
@slfan1989

Contributor Author

@goutamvenkat-anyscale @bveeramani Thank you very much for reviewing the code!

ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
…ray-project#60795)

Signed-off-by: slfan1989 <slfan1989@apache.org>
Signed-off-by: Adel Nour <ans9868@nyu.edu>
Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
…ray-project#60795)

Signed-off-by: slfan1989 <slfan1989@apache.org>

Labels

community-contribution (Contributed by the community), data (Ray Data-related issues), go (add ONLY when ready to merge, run all tests)

3 participants