
[rllib, train] Add support for nested metrics in Result.get_best_checkpoint #58537

Merged
justinvyu merged 5 commits into ray-project:master from pseudo-rnd-thoughts:issue-57533 on Nov 21, 2025


@pseudo-rnd-thoughts

Member

Description

RLlib reports metrics in a nested structure and refers to them with "/"-delimited keys (like "{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}"), which Result.get_best_checkpoint doesn't support.
ResultGrid.get_best_result() already resolves such keys with unflattened_lookup, so I've made get_best_checkpoint use unflattened_lookup as well, and added tests for nested metrics and for backward compatibility with flat metric names.
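For readers unfamiliar with the lookup, here is a minimal, self-contained sketch of the idea. It is not Ray's implementation: `lookup_nested` and `best_checkpoint` are hypothetical stand-ins written for illustration, the metric names and checkpoint labels are made up, and skipping checkpoints that lack the metric is an assumption of this sketch.

```python
from collections.abc import Mapping
from typing import Any


def lookup_nested(flat_key: str, metrics: Mapping, delimiter: str = "/") -> Any:
    """Hypothetical stand-in: resolve "a/b" against {"a": {"b": ...}}."""
    # Backward compatibility: a key that is stored flat is returned directly.
    if flat_key in metrics:
        return metrics[flat_key]
    node: Any = metrics
    for part in flat_key.split(delimiter):
        if not isinstance(node, Mapping) or part not in node:
            raise KeyError(flat_key)
        node = node[part]
    return node


def best_checkpoint(checkpoints, metric: str, mode: str = "max"):
    """Pick the checkpoint whose (possibly nested) metric is best."""
    if mode not in ("max", "min"):
        raise ValueError(f"mode must be 'max' or 'min', got {mode!r}")
    scored = []
    for ckpt, metrics in checkpoints:
        try:
            scored.append((lookup_nested(metric, metrics), ckpt))
        except KeyError:
            # Assumption of this sketch: checkpoints without the metric are skipped.
            continue
    if not scored:
        raise RuntimeError(f"No checkpoint reports the metric {metric!r}")
    pick = max if mode == "max" else min
    return pick(scored, key=lambda pair: pair[0])[1]


# Illustrative data: (checkpoint label, reported metrics) pairs.
checkpoints = [
    ("ckpt_000005", {"env_runners": {"episode_return_mean": 120.0}}),
    ("ckpt_000010", {"env_runners": {"episode_return_mean": 310.5}}),
    ("ckpt_000015", {"loss": 0.1}),  # does not report the metric
]
best = best_checkpoint(checkpoints, "env_runners/episode_return_mean", "max")
print(best)
```

Checking for the flat key first is what keeps existing callers working: a metric reported as a top-level key resolves exactly as before, and only keys that are not found at the top level are split on the delimiter.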

Reproduction script

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig
from ray.rllib.utils.metrics import (
    ENV_RUNNER_RESULTS,
    EPISODE_RETURN_MEAN,
    NUM_ENV_STEPS_SAMPLED_LIFETIME,
)
from ray.tune.result import TRAINING_ITERATION

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(num_epochs=6)
)

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=tune.RunConfig(
        "PPO_Reproduce",
        checkpoint_config=tune.CheckpointConfig(
            num_to_keep=10,
            checkpoint_score_attribute=f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}",
            checkpoint_at_end=True,
            checkpoint_frequency=5,
        ),
        stop={
            f"{ENV_RUNNER_RESULTS}/{NUM_ENV_STEPS_SAMPLED_LIFETIME}": 3e5,
            f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}": 450,
            TRAINING_ITERATION: 100,
        },
    ),
)

results = tuner.fit()
best_result = results.get_best_result()
ckpt = best_result.get_best_checkpoint(f"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}", "max")
print(ckpt.path)

Related issues

#57533

…st_checkpoint`

Signed-off-by: Mark Towers <mark@anyscale.com>

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds support for nested metrics in Result.get_best_checkpoint by using unflattened_lookup. The changes are correct and are accompanied by good tests covering nested metrics, different modes, and backward compatibility. I've suggested one improvement to the error message when an invalid metric is provided, to make it more helpful for users of nested metrics.
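The suggested wording is not quoted in this thread. As a hypothetical sketch only, one way an error message could surface nested metric names is to flatten the reported metrics into "/"-joined paths and list them; `flatten_keys` and `invalid_metric_message` below are illustrative names, not Ray functions, and the message text is invented.

```python
from collections.abc import Mapping


def flatten_keys(metrics: Mapping, delimiter: str = "/", prefix: str = "") -> list:
    """Hypothetical helper: list every leaf of a nested dict as a joined path."""
    keys = []
    for key, value in metrics.items():
        path = f"{prefix}{delimiter}{key}" if prefix else str(key)
        if isinstance(value, Mapping) and value:
            # Recurse into non-empty nested dicts.
            keys.extend(flatten_keys(value, delimiter, path))
        else:
            keys.append(path)
    return keys


def invalid_metric_message(metric: str, metrics: Mapping) -> str:
    """Build an illustrative error message that lists the usable metric paths."""
    available = ", ".join(sorted(flatten_keys(metrics)))
    return (
        f"Invalid metric name {metric!r}. "
        f"Available metrics (nested keys joined with '/'): {available}"
    )


reported = {"env_runners": {"episode_return_mean": 1.0}, "training_iteration": 3}
print(invalid_metric_message("episode_return_mean", reported))
```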

Signed-off-by: Mark Towers <mark@anyscale.com>
Comment thread python/ray/air/result.py
Comment thread python/ray/air/result.py
import pyarrow

import ray
from ray._private.dict import unflattened_lookup


@justinvyu do we want to support this for Train V2 as well, or should we diverge for Tune?


I'm ok to support this in Train. This is only needed if users self-report nested dicts.

Comment thread python/ray/air/result.py
@pseudo-rnd-thoughts pseudo-rnd-thoughts changed the title [rllib, air, train] Add support for nested metrics for Result.get_best_checkpoint Nov 13, 2025

Comment thread python/ray/train/v2/tests/test_result.py Outdated
pseudo-rnd-thoughts and others added 3 commits November 17, 2025 20:48
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Mark Towers <mark.m.towers@gmail.com>
Signed-off-by: Mark Towers <mark@anyscale.com>
Comment thread python/ray/air/result.py
@pseudo-rnd-thoughts pseudo-rnd-thoughts added the go add ONLY when ready to merge, run all tests label Nov 19, 2025

@justinvyu justinvyu left a comment


Thanks!

@justinvyu justinvyu merged commit 0325fab into ray-project:master Nov 21, 2025
7 checks passed
@justinvyu justinvyu changed the title [rllib, air, train] Add support for nested metrics in Result.get_best_checkpoint Nov 21, 2025
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
…ckpoint` (ray-project#58537)

RLlib uses nested metric structure (like
`"{ENV_RUNNER_RESULTS}/{EPISODE_RETURN_MEAN}"`) which
`Result.get_best_checkpoint` doesn't support.
Following `ResultGrid.get_best_result()` to use `unflattened_lookup`,
I've added that to `get_best_checkpoint` along with testing for nested
structures (and its backward compatibility)

---------

Signed-off-by: Mark Towers <mark@anyscale.com>
Signed-off-by: Mark Towers <mark.m.towers@gmail.com>
Co-authored-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>

Labels

go (add ONLY when ready to merge, run all tests)
rllib (RLlib related issues)
rllib-logging (This problem is related to logging metrics)
train-tune

4 participants