[Data] - [1/n] Add Temporal, list, tensor, struct datatype support to RD Datatype#58225
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds significant new functionality to DataType by introducing support for list, struct, map, tensor, and temporal types, along with a pattern-matching system using DataType.ANY. The implementation is well-structured and includes a comprehensive set of factory methods and type predicates. The accompanying tests cover most of the new functionality thoroughly.
My review focuses on a few areas for improvement:
- Doctest Corrections: The
__repr__output for pattern-matchingDataTypes in several new factory method docstrings is incorrect and will cause doctests to fail. - Implementation Consistency: The
is_numerical_typemethod has an inconsistency between its documentation and implementation regarding boolean types for Arrow and NumPy. - Test Coverage: The tests for
is_numerical_typeshould be expanded to cover the boolean cases for Arrow and NumPy types.
Overall, this is a great addition. Addressing these points will improve the correctness and robustness of the new features.
6841413 to
e101fec
Compare
e101fec to
a5e00a6
Compare
a5e00a6 to
4119f0b
Compare
iamjustinhsu
left a comment
There was a problem hiding this comment.
took a first pass, didn't look at tests, but good job 👍
| >>> DataType.fixed_size_list(DataType.float32(), 3) | ||
| DataType(arrow:fixed_size_list<item: float>[3]) | ||
| """ | ||
| value_arrow_type = value_type.to_arrow_dtype() |
There was a problem hiding this comment.
how come this doesn't include the checks for value_type is None or isinstance(value_type, _LogicalDataType)? Also, can you elaborate on the meaning of this check and why it's necessary? Since I see it a lot, I suggest pushing it into a method
There was a problem hiding this comment.
So fixed_size_list requires both element type and list_size. For instance, I wanted to prevented cases like this: DataType.fixed_size_list(DataType.ANY, 3)
4119f0b to
f8db613
Compare
67674c2 to
296efc2
Compare
… RD Datatype Signed-off-by: Goutam <goutam@anyscale.com>
296efc2 to
59e8d4f
Compare
| assert isinstance(result, DataType) | ||
| assert result.is_arrow_type() | ||
| assert result._internal_type == pa_type | ||
| assert result._physical_dtype == pa_type |
There was a problem hiding this comment.
Good coverage of unit test.
Can you also add some e2e tests that use these types from the user-facing API?
There was a problem hiding this comment.
Right now we don't actually rely on the datatype for anything yet. We use it UDFExpr but there's no validation or schema updates based off that. The e2e tests will make sense once we plumb in schema inference & also use it for the new dataset.summary API.
… RD Datatype (ray-project#58225) ## Description As title suggests ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. Signed-off-by: Goutam <goutam@anyscale.com>
… RD Datatype (ray-project#58225) ## Description As title suggests ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. Signed-off-by: Goutam <goutam@anyscale.com>
… RD Datatype (ray-project#58225) ## Description As title suggests ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
Description
As title suggests
Related issues
Additional information