[Data] Fix Timer JSON serialization broken by DistributionTracker#63825
Conversation
…ngest benchmark PR ray-project#63586 added a DistributionTracker to Timer for percentile tracking. DistributionTracker is not JSON-serializable, which broke the training-ingest benchmark: its checkpointing code dumps each Timer via `v.__dict__.copy()` + `json.dump`, raising `TypeError: Object of type DistributionTracker is not JSON serializable`. (The tracker's __getstate__/__setstate__ only cover pickle, not json.) Add `Timer.as_dict()` / `from_dict()` that round-trip only the scalar fields (the sketch is not persisted; it restarts empty on restore, same graceful degradation as when datasketches is unavailable), and switch the benchmark runner to use them instead of poking `__dict__`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: xgui <xgui@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request introduces as_dict and from_dict methods to safely serialize and deserialize scalar stats, resolving JSON serialization issues caused by the non-serializable DistributionTracker in the benchmark runner. Direct __dict__ access has been replaced with these new methods, and comprehensive unit tests have been added. The review feedback recommends improving the robustness of from_dict by adding defensive checks to handle cases where the input state is None, not a dictionary, or contains None values for _total and _total_count.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Ignore a non-dict ``state`` and treat explicit ``None`` values as the empty-Timer defaults (``dict.get(k, 0)`` only defaults a missing key, not a present ``None``), so a partial/corrupt metrics.json can't crash or leave a Timer in an invalid state on restore. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: xgui <xgui@anyscale.com>
…y-project#63825) ## Why ray-project#63586 added a `DistributionTracker` to `Timer` for percentile tracking. `DistributionTracker` is not JSON-serializable, which broke the `training-ingest-benchmark` release test (`image_classification.full_training`, s3_url + parquet): ``` TypeError: Object of type DistributionTracker is not JSON serializable ``` The benchmark checkpoints each `Timer` via `v.__dict__.copy()` + `json.dump`, and `__dict__` now carries the tracker. The tracker's `__getstate__`/`__setstate__` only cover **pickle**, not `json`, so they don't help here. ## Fix - Add `Timer.as_dict()` / `Timer.from_dict()` that round-trip only the scalar fields (`_total`, `_min`, `_max`, `_total_count`). `_distribution` is excluded — the KLL sketch isn't reconstructable from summary stats and isn't meant to persist across checkpoints, so it restarts empty on restore (same graceful degradation as when `datasketches` isn't installed). - Switch the benchmark runner to use these instead of poking `__dict__`. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Signed-off-by: xgui <xgui@anyscale.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y-project#63825) ## Why ray-project#63586 added a `DistributionTracker` to `Timer` for percentile tracking. `DistributionTracker` is not JSON-serializable, which broke the `training-ingest-benchmark` release test (`image_classification.full_training`, s3_url + parquet): ``` TypeError: Object of type DistributionTracker is not JSON serializable ``` The benchmark checkpoints each `Timer` via `v.__dict__.copy()` + `json.dump`, and `__dict__` now carries the tracker. The tracker's `__getstate__`/`__setstate__` only cover **pickle**, not `json`, so they don't help here. ## Fix - Add `Timer.as_dict()` / `Timer.from_dict()` that round-trip only the scalar fields (`_total`, `_min`, `_max`, `_total_count`). `_distribution` is excluded — the KLL sketch isn't reconstructable from summary stats and isn't meant to persist across checkpoints, so it restarts empty on restore (same graceful degradation as when `datasketches` isn't installed). - Switch the benchmark runner to use these instead of poking `__dict__`. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Signed-off-by: xgui <xgui@anyscale.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Why
#63586 added a
DistributionTrackertoTimerfor percentile tracking.DistributionTrackeris not JSON-serializable, which broke thetraining-ingest-benchmarkrelease test (image_classification.full_training, s3_url + parquet):The benchmark checkpoints each
Timerviav.__dict__.copy()+json.dump, and__dict__now carries the tracker. The tracker's__getstate__/__setstate__only cover pickle, notjson, so they don't help here.Fix
Timer.as_dict()/Timer.from_dict()that round-trip only the scalar fields (_total,_min,_max,_total_count)._distributionis excluded — the KLL sketch isn't reconstructable from summary stats and isn't meant to persist across checkpoints, so it restarts empty on restore (same graceful degradation as whendatasketchesisn't installed).__dict__.🤖 Generated with Claude Code