Skip to content

[Data] Fix Timer JSON serialization broken by DistributionTracker#63825

Merged
elliot-barn merged 2 commits into
ray-project:masterfrom
xinyuangui2:xgui/fix-timer-json-serialization
Jun 3, 2026
Merged

[Data] Fix Timer JSON serialization broken by DistributionTracker#63825
elliot-barn merged 2 commits into
ray-project:masterfrom
xinyuangui2:xgui/fix-timer-json-serialization

Conversation

@xinyuangui2

Copy link
Copy Markdown
Contributor

Why

#63586 added a DistributionTracker to Timer for percentile tracking. DistributionTracker is not JSON-serializable, which broke the training-ingest-benchmark release test (image_classification.full_training, s3_url + parquet):

TypeError: Object of type DistributionTracker is not JSON serializable

The benchmark checkpoints each Timer via v.__dict__.copy() + json.dump, and __dict__ now carries the tracker. The tracker's __getstate__/__setstate__ only cover pickle, not json, so they don't help here.

Fix

  • Add Timer.as_dict() / Timer.from_dict() that round-trip only the scalar fields (_total, _min, _max, _total_count). _distribution is excluded — the KLL sketch isn't reconstructable from summary stats and isn't meant to persist across checkpoints, so it restarts empty on restore (same graceful degradation as when datasketches isn't installed).
  • Switch the benchmark runner to use these instead of poking __dict__.

🤖 Generated with Claude Code

…ngest benchmark

PR ray-project#63586 added a DistributionTracker to Timer for percentile tracking.
DistributionTracker is not JSON-serializable, which broke the
training-ingest benchmark: its checkpointing code dumps each Timer via
`v.__dict__.copy()` + `json.dump`, raising `TypeError: Object of type
DistributionTracker is not JSON serializable`. (The tracker's
__getstate__/__setstate__ only cover pickle, not json.)

Add `Timer.as_dict()` / `from_dict()` that round-trip only the scalar
fields (the sketch is not persisted; it restarts empty on restore, same
graceful degradation as when datasketches is unavailable), and switch
the benchmark runner to use them instead of poking `__dict__`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: xgui <xgui@anyscale.com>
@xinyuangui2 xinyuangui2 requested a review from a team as a code owner June 3, 2026 17:05

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces as_dict and from_dict methods to safely serialize and deserialize scalar stats, resolving JSON serialization issues caused by the non-serializable DistributionTracker in the benchmark runner. Direct __dict__ access has been replaced with these new methods, and comprehensive unit tests have been added. The review feedback recommends improving the robustness of from_dict by adding defensive checks to handle cases where the input state is None, not a dictionary, or contains None values for _total and _total_count.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread python/ray/data/_internal/stats.py Outdated
Ignore a non-dict ``state`` and treat explicit ``None`` values as the
empty-Timer defaults (``dict.get(k, 0)`` only defaults a missing key,
not a present ``None``), so a partial/corrupt metrics.json can't crash
or leave a Timer in an invalid state on restore.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: xgui <xgui@anyscale.com>
@xinyuangui2 xinyuangui2 added the go add ONLY when ready to merge, run all tests label Jun 3, 2026
@ray-gardener ray-gardener Bot added data Ray Data-related issues release-test release test labels Jun 3, 2026
@elliot-barn elliot-barn merged commit 691df86 into ray-project:master Jun 3, 2026
10 checks passed
rueian pushed a commit to rueian/ray that referenced this pull request Jun 4, 2026
…y-project#63825)

## Why

ray-project#63586 added a `DistributionTracker` to `Timer` for percentile tracking.
`DistributionTracker` is not JSON-serializable, which broke the
`training-ingest-benchmark` release test
(`image_classification.full_training`, s3_url + parquet):

```
TypeError: Object of type DistributionTracker is not JSON serializable
```

The benchmark checkpoints each `Timer` via `v.__dict__.copy()` +
`json.dump`, and `__dict__` now carries the tracker. The tracker's
`__getstate__`/`__setstate__` only cover **pickle**, not `json`, so they
don't help here.

## Fix

- Add `Timer.as_dict()` / `Timer.from_dict()` that round-trip only the
scalar fields (`_total`, `_min`, `_max`, `_total_count`).
`_distribution` is excluded — the KLL sketch isn't reconstructable from
summary stats and isn't meant to persist across checkpoints, so it
restarts empty on restore (same graceful degradation as when
`datasketches` isn't installed).
- Switch the benchmark runner to use these instead of poking `__dict__`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: xgui <xgui@anyscale.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Jun 30, 2026
…y-project#63825)

## Why

ray-project#63586 added a `DistributionTracker` to `Timer` for percentile tracking.
`DistributionTracker` is not JSON-serializable, which broke the
`training-ingest-benchmark` release test
(`image_classification.full_training`, s3_url + parquet):

```
TypeError: Object of type DistributionTracker is not JSON serializable
```

The benchmark checkpoints each `Timer` via `v.__dict__.copy()` +
`json.dump`, and `__dict__` now carries the tracker. The tracker's
`__getstate__`/`__setstate__` only cover **pickle**, not `json`, so they
don't help here.

## Fix

- Add `Timer.as_dict()` / `Timer.from_dict()` that round-trip only the
scalar fields (`_total`, `_min`, `_max`, `_total_count`).
`_distribution` is excluded — the KLL sketch isn't reconstructable from
summary stats and isn't meant to persist across checkpoints, so it
restarts empty on restore (same graceful degradation as when
`datasketches` isn't installed).
- Switch the benchmark runner to use these instead of poking `__dict__`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: xgui <xgui@anyscale.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests release-test release test

3 participants