[Data] Avoid importing cudf in _is_cudf_dataframe when cudf is not loaded by rayhhome · Pull Request #62302 · ray-project/ray

rayhhome · 2026-04-02T16:28:53Z

Description

_is_cudf_dataframe() is called on every batch in the map_batches hot path (validation and type dispatch). Previously it ran `try: import cudf` unconditionally. In environments with cudf installed (e.g. the ray-ml BYOD image), that import loads the full CUDA runtime, adding ~1.5 GiB RSS per worker even when no GPU is used.

This adds a sys.modules guard so cudf is only imported when something else in the process has already loaded it. If cudf isn't in sys.modules, no object can be a cudf.DataFrame, so we return False immediately.

This eliminates OOM kills on CPU-only benchmarks running on the ray-ml image, where 8 workers × 1.5 GiB of unnecessary cudf overhead was pushing 30 GiB nodes past the 95% memory threshold.
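The memory figures above can be worked through as follows. The numbers come from this description; the helper name and dictionary keys are illustrative only.

```python
def cudf_overhead_summary(workers, rss_per_worker_gib, node_memory_gib, threshold_fraction):
    """Summarize how much node memory the unnecessary cudf import consumed."""
    overhead_gib = workers * rss_per_worker_gib
    threshold_gib = node_memory_gib * threshold_fraction
    return {
        "overhead_gib": overhead_gib,
        "threshold_gib": threshold_gib,
        "overhead_share_of_node": overhead_gib / node_memory_gib,
        "headroom_gib": threshold_gib - overhead_gib,
    }


# 8 workers, ~1.5 GiB each, 30 GiB nodes, 95% memory threshold.
summary = cudf_overhead_summary(8, 1.5, 30, 0.95)
```

With these inputs the cudf import alone accounts for about 12 GiB, roughly 40% of the node, leaving about 16.5 GiB below the 28.5 GiB threshold for everything else running on the node.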

Related issues

Related to the map_batches_fixed_size_tasks_numpy_once nightly benchmark OOM failures.

Additional information

The benchmark inherits type: gpu (the ray-ml image) from the data test DEFAULTS in release_data_tests.yaml, and that image includes cudf-cu12 via dl-gpu-requirements.txt. The actual cluster uses CPU instances (m5.2xlarge). Every worker was importing cudf through _validate_batch_output -> _is_cudf_dataframe (line 519 in plan_udf_map_op.py), which runs on every UDF output batch regardless of batch format.

Signed-off-by: Sirui Huang <ray.huang@anyscale.com>

gemini-code-assist

Code Review

This pull request optimizes the _is_cudf_dataframe function in python/ray/data/block.py by checking sys.modules before performing a lazy import of cudf. This change prevents the unnecessary loading of CUDA and the associated memory overhead when cudf has not been previously imported. I have no feedback to provide.

Copilot

Pull request overview

This PR prevents unintended CUDA/cuDF initialization in Ray Data’s map_batches hot path by avoiding an import cudf unless cuDF has already been imported in the current process.

Changes:

Add a sys.modules guard in _is_cudf_dataframe() to return early when cudf hasn’t been imported yet.
Update _is_cudf_dataframe() docstring to document the rationale and memory impact.


Copilot · 2026-04-02T23:41:45Z

+    if "cudf" not in sys.modules:
+        return False
    try:
        import cudf



Consider adding a unit test to prevent regressions of this optimization: when "cudf" is absent from sys.modules, _is_cudf_dataframe() should return False without attempting to import cudf (e.g., by patching sys.modules and asserting the import hook isn’t invoked for cudf). This is a hot-path check and the memory-impact regression described in the PR would be hard to notice without an explicit test.


## Description

The map_batches release benchmark had `RAYTEST_FAIL_ON_WORKER_OOM=0`. After we landed some changes to minimize memory bloat like #62302, the test no longer OOMs, so I'm re-enabling the flag.

## Related issues

None

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>


rayhhome added 2 commits April 2, 2026 09:15

Initial fix

4f5fc75

Signed-off-by: Sirui Huang <ray.huang@anyscale.com>

Switch to sys.modules

8ce67e6

Signed-off-by: Sirui Huang <ray.huang@anyscale.com>

rayhhome self-assigned this Apr 2, 2026

rayhhome added the data (Ray Data-related issues) and go (add ONLY when ready to merge, run all tests) labels Apr 2, 2026

Merge branch 'master' into cudf-optional-import

a905032

gemini-code-assist Bot reviewed Apr 2, 2026


Merge branch 'master' into cudf-optional-import

b146fbb

rayhhome marked this pull request as ready for review April 2, 2026 23:39

rayhhome requested a review from a team as a code owner April 2, 2026 23:39

Copilot AI review requested due to automatic review settings April 2, 2026 23:39

Copilot started reviewing on behalf of rayhhome April 2, 2026 23:39

Copilot AI reviewed Apr 2, 2026


Merge branch 'master' into cudf-optional-import

b177f1b

goutamvenkat-anyscale approved these changes Apr 3, 2026


Merge branch 'master' into cudf-optional-import

3a3a234

bveeramani approved these changes Apr 3, 2026


bveeramani enabled auto-merge (squash) April 3, 2026 23:12

Merge branch 'master' into cudf-optional-import

8cfa3ec

github-actions Bot disabled auto-merge April 3, 2026 23:55

Merge branch 'master' into cudf-optional-import

bbd9a6c

bveeramani enabled auto-merge (squash) April 6, 2026 16:38

bveeramani merged commit 5300731 into ray-project:master Apr 6, 2026
7 checks passed

rayhhome deleted the cudf-optional-import branch April 7, 2026 18:42

bveeramani mentioned this pull request May 18, 2026

[Data] Fail map_batches benchmark on worker OOM #63474

Merged

bveeramani merged 8 commits into ray-project:master from rayhhome:cudf-optional-import

4 participants
