[Docs] Anonymize personal paths in Tune example notebook outputs #63464

Merged: matthewdeng merged 4 commits into ray-project:master from dstrodtman:doc-1054-tune-notebook-path-cleanup on Jun 2, 2026

Conversation

@dstrodtman dstrodtman commented May 18, 2026

Contributor

Update (post-review)

Three changes since the original commit, in response to review feedback:

  1. /home/ray restored (not a leak). /home/ray is Ray's runtime home directory in containers and clusters, not a personal path. The first commit wrongly anonymized it to ~; reverted across batch_tuning, pbt_guide, and tune-pytorch-lightning (26 paths).
  2. christy-air → ray_results. That experiment dir in batch_tuning encodes a person's name plus the deprecated AIR runtime tag, so it stays anonymized, now as ray_results (Tune's default storage dir), across 18 checkpoint paths.
  3. pbt_visualization.ipynb folded in. Adjacent file with 90 /Users/rdecal/ray_results leaks in output cells, anonymized to ~/ray_results. Brings the total to 10 notebooks.

The /Users/<name> leaks (kai, rdecal) remain anonymized to ~. A companion agent rule capturing the /home/ray guidance is in #63646.
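The three post-review rules can be sketched as a single text rewrite. This is a minimal illustration, not the script used for this PR: the regex, the constant names, and `anonymize_text` are all hypothetical, and the `/home/ray/ray_results` target is taken from the follow-up commit message.

```python
import re

# Assumed pattern for a personal macOS home prefix such as /Users/kai.
PERSONAL_HOME = re.compile(r"/Users/[^/\s\"']+")

# The one experiment directory flagged in review, kept as a literal on purpose.
LEAKY_EXPERIMENT_DIR = "/home/ray/christy-air"
NEUTRAL_EXPERIMENT_DIR = "/home/ray/ray_results"


def anonymize_text(text: str) -> str:
    """Apply the post-review rules to one string of notebook output."""
    # Rename the experiment dir that encodes a person's name and the AIR tag.
    text = text.replace(LEAKY_EXPERIMENT_DIR, NEUTRAL_EXPERIMENT_DIR)
    # Collapse personal /Users/<name> prefixes to ~.
    text = PERSONAL_HOME.sub("~", text)
    # /home/ray is never matched above, so Ray's runtime home stays as-is.
    return text
```

Under these assumptions, `/Users/rdecal/ray_results` becomes `~/ray_results`, while `/home/ray/anaconda3/...` passes through unchanged.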

The original description below predates this update; its "substitute ~ for /home/ray" method note no longer applies.


Description

Cleans up personal-path leaks (/Users/<name>/..., /home/ray/...) in output cells of nine Tune example notebooks under doc/source/tune/examples/. 127 leaks removed across 9 files; cell sources untouched.

Surfaced by the DOC-991 (#36167) resolving agent — flagged as adjacent rot during the pbt_transformers.ipynb / lightgbm_example.ipynb structural fix.

Related issues

[DOC-1054]

Additional information

Method: a one-shot Python script anonymized the leaks (substitute ~ for /Users/<name> and /home/ray in output-cell text and HTML, preserving per-file JSON indentation 1/2/4-space). Diff is 126±/126± lines across the 9 files, proportional to the original leak count.
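The one-shot script itself is not part of this PR. A rough sketch of the approach described above might look like the following, assuming the standard notebook JSON layout (`cells` → `outputs` → `text` / `data`). The names `detect_indent`, `scrub`, and `anonymize_notebook` are hypothetical, and only the `/Users/<name>` substitution is shown, since the `/home/ray` substitution was reverted after review.

```python
import json
import re
from pathlib import Path

# Assumed pattern for a personal macOS home prefix such as /Users/kai.
PERSONAL_HOME = re.compile(r"/Users/[^/\s\"']+")
# Only these output fields are rewritten (output-cell text and HTML).
TEXT_MIME_TYPES = ("text/plain", "text/html")


def detect_indent(raw: str, default: int = 1) -> int:
    """Guess the file's JSON indent from its first indented line."""
    for line in raw.splitlines():
        stripped = line.lstrip(" ")
        if stripped and stripped != line:
            return len(line) - len(stripped)
    return default


def scrub(value):
    """Rewrite a notebook string field, which may be a str or a list of str."""
    if isinstance(value, list):
        return [PERSONAL_HOME.sub("~", line) for line in value]
    return PERSONAL_HOME.sub("~", value)


def anonymize_notebook(path: Path) -> bool:
    """Rewrite output cells in place; return True if the file changed."""
    raw = path.read_text(encoding="utf-8")
    notebook = json.loads(raw)
    for cell in notebook.get("cells", []):
        # Only outputs are visited; cell["source"] is never touched.
        for output in cell.get("outputs", []):
            if "text" in output:
                output["text"] = scrub(output["text"])
            data = output.get("data", {})
            for mime in TEXT_MIME_TYPES:
                if mime in data:
                    data[mime] = scrub(data[mime])
    updated = json.dumps(notebook, indent=detect_indent(raw), ensure_ascii=False)
    if raw.endswith("\n"):
        updated += "\n"
    if updated == raw:
        return False
    path.write_text(updated, encoding="utf-8")
    return True
```

Re-serializing with the detected indent keeps the diff close to the leak count, though a JSON round trip is not guaranteed to be byte-identical for every notebook, so the resulting diff should still be reviewed.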

The 9 affected notebooks:

  • ax_example.ipynb (orthogonal to DOC-1019 ax-platform 1.0.0 API change)
  • bayesopt_example.ipynb (orthogonal to DOC-77 numpy.float deprecation)
  • bohb_example.ipynb
  • nevergrad_example.ipynb
  • tune-xgboost.ipynb
  • batch_tuning.ipynb
  • pbt_guide.ipynb
  • tune-pytorch-lightning.ipynb
  • tune_mnist_keras.ipynb

Long-term: leaks will recur until the notebook test/refresh pipeline strips outputs or anonymizes paths before commit. Out of scope for this PR — see DOC-907 for the broader notebook-test-coverage work.
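As an illustration of what a pipeline-side guard could check, here is a small leak detector. It is a sketch under assumptions, not the DOC-907 design: `find_leaks` and the regex are hypothetical, and the pattern treats `/Users/<name>` and `/home/<person>` as personal while exempting `/home/ray`, following the guidance captured in #63646.

```python
import re

# Personal prefixes: /Users/<name>, or /home/<person> where <person> is not "ray".
LEAK = re.compile(r"/Users/[^/\s\"']+|/home/(?!ray(?:[/\s\"']|$))[^/\s\"']+")


def output_strings(output: dict):
    """Yield the textual payloads of one notebook output as plain strings."""
    values = [output.get("text", "")]
    values += [v for mime, v in output.get("data", {}).items() if mime.startswith("text/")]
    for value in values:
        # Notebook strings may be stored as a list of lines.
        yield "".join(value) if isinstance(value, list) else value


def find_leaks(notebook: dict) -> list:
    """Return (cell_index, matched_prefix) pairs found in output cells."""
    leaks = []
    for index, cell in enumerate(notebook.get("cells", [])):
        for output in cell.get("outputs", []):
            for text in output_strings(output):
                for match in LEAK.finditer(text):
                    leaks.append((index, match.group(0)))
    return leaks
```

A test or refresh step could load each notebook with `json.load` and fail when `find_leaks` returns a non-empty list, so that leaks are caught before commit rather than cleaned up afterwards.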

🤖 Generated with Claude Code

Removes 127 personal-path leaks (`/Users/<name>/...`, `/home/ray/...`)
from output cells across nine Tune example notebooks under
`doc/source/tune/examples/`. Cell sources are untouched. Surfaced
by the DOC-991 (ray-project#36167) resolving agent as adjacent
rot during the pbt_transformers / lightgbm structural fix.

Long-term, leaks will recur until the notebook test/refresh
pipeline strips outputs before commit (see DOC-907 for the broader
notebook-test-coverage work).

[DOC-1054]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
@dstrodtman dstrodtman requested a review from a team as a code owner May 18, 2026 18:43
@gemini-code-assist

Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

  "output_type": "stream",
  "text": [
- "/home/ray/anaconda3/lib/python3.8/site-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n",
+ "~/anaconda3/lib/python3.8/site-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n",

Contributor Author

/home/ray is a real path for Ray, not a user dir. Need to commit this as an agents/claude rule for the repo to avoid this error in the future.

  " <td>500.005318</td>\n",
  " <td>LinearRegression()</td>\n",
- " <td>Checkpoint(local_path=/home/ray/christy-air/fo...</td>\n",
+ " <td>Checkpoint(local_path=~/christy-air/fo...</td>\n",

Contributor Author

However, christy-air likely does encode a person's name. In addition, the -air tag likely refers to the AI Runtime (AIR), which is a deprecated concept, so it should be removed from the docs.

ray-gardener (bot) added the tune and docs labels on May 18, 2026
Addresses review feedback on ray-project#63464:

- /home/ray is Ray's runtime home directory, not a personal-path leak.
  Restore the 26 paths that were wrongly anonymized to ~ across
  batch_tuning, pbt_guide, and tune-pytorch-lightning.
- christy-air encodes a person's name plus the deprecated AIR runtime
  tag. Anonymize the 18 checkpoint paths in batch_tuning to
  /home/ray/ray_results (Tune's default storage directory).

The /Users/<name> leaks (kai, rdecal) remain anonymized to ~.

[DOC-1054]

Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
elliot-barn pushed a commit that referenced this pull request May 26, 2026
## Description

Adds a documentation-authoring rule to `doc/.claude/CLAUDE.md`:
`/home/ray` is Ray's runtime home directory in containers and clusters,
not a personal-path leak. Notebook-output anonymization passes must not
rewrite it. Anonymize only real user identifiers (`/Users/<name>`,
`/home/<person>`) and experiment or output dirs that encode a person or
the deprecated AIR runtime.

## Related issues

Surfaced during review of #63464, where an anonymization pass over Tune
example notebook outputs incorrectly rewrote `/home/ray` to `~`.
[DOC-1054]

Long-term recurrence prevention (stripping or anonymizing notebook
outputs in the test/refresh pipeline) is tracked in DOC-907.

## Additional information

The companion content fix — restoring the wrongly-anonymized `/home/ray`
paths and anonymizing the leaked `christy-air` experiment dir to
`ray_results` — is in #63464.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
Folds pbt_visualization.ipynb into the path-cleanup scope, raised during
review of ray-project#63464. 90 /Users/rdecal/ray_results paths in output cells
become ~/ray_results. Source cells untouched.

[DOC-1054]

Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
dstrodtman added a commit to dstrodtman/ray that referenced this pull request May 27, 2026
The anonymized Tune log path in lightgbm_example.ipynb output used
/Users/user/ray_results. Switch to ~/ray_results to match the
convention in ray-project#63464 and codified in ray-project#63646 (personal home prefixes
become ~).

[DOC-991]

Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
@matthewdeng matthewdeng enabled auto-merge (squash) June 2, 2026 20:14
github-actions (bot) added the go label on Jun 2, 2026
@matthewdeng matthewdeng merged commit 4ab1d93 into ray-project:master Jun 2, 2026
8 checks passed
rueian pushed a commit to rueian/ray that referenced this pull request Jun 4, 2026
…-project#63464)

## Update (post-review)

Three changes since the original commit, in response to review feedback:

1. **`/home/ray` restored (not a leak).** `/home/ray` is Ray's runtime
home directory in containers and clusters, not a personal path. The
first commit wrongly anonymized it to `~`; reverted across
`batch_tuning`, `pbt_guide`, and `tune-pytorch-lightning` (26 paths).
2. **`christy-air` → `ray_results`.** That experiment dir in
`batch_tuning` encodes a person's name plus the deprecated AIR runtime
tag, so it stays anonymized — to `ray_results` (Tune's default storage
dir), across 18 checkpoint paths.
3. **`pbt_visualization.ipynb` folded in.** Adjacent file with 90
`/Users/rdecal/ray_results` leaks in output cells, anonymized to
`~/ray_results`. Brings the total to 10 notebooks.

The `/Users/<name>` leaks (kai, rdecal) remain anonymized to `~`. A
companion agent rule capturing the `/home/ray` guidance is in ray-project#63646.

The original description below predates this update; its "substitute `~`
for `/home/ray`" method note no longer applies.

---

## Description

Cleans up personal-path leaks (`/Users/<name>/...`, `/home/ray/...`) in
**output cells** of nine Tune example notebooks under
`doc/source/tune/examples/`. 127 leaks removed across 9 files; cell
sources untouched.

Surfaced by the
[DOC-991](https://anyscale1.atlassian.net/browse/DOC-991)
(ray-project#36167) resolving agent — flagged as adjacent rot during
the `pbt_transformers.ipynb` / `lightgbm_example.ipynb` structural fix.

## Related issues

[DOC-1054]

## Additional information

Method: a one-shot Python script anonymized the leaks (substitute `~`
for `/Users/<name>` and `/home/ray` in output-cell text and HTML,
preserving per-file JSON indentation 1/2/4-space). Diff is 126±/126±
lines across the 9 files, proportional to the original leak count.

The 9 affected notebooks:
- `ax_example.ipynb` (orthogonal to DOC-1019 ax-platform 1.0.0 API
change)
- `bayesopt_example.ipynb` (orthogonal to DOC-77 numpy.float
deprecation)
- `bohb_example.ipynb`
- `nevergrad_example.ipynb`
- `tune-xgboost.ipynb`
- `batch_tuning.ipynb`
- `pbt_guide.ipynb`
- `tune-pytorch-lightning.ipynb`
- `tune_mnist_keras.ipynb`

Long-term: leaks will recur until the notebook test/refresh pipeline
strips outputs or anonymizes paths before commit. Out of scope for this
PR — see DOC-907 for the broader notebook-test-coverage work.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Jun 30, 2026
…3646)

limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Jun 30, 2026
…-project#63464)

Labels

docs, go, tune

2 participants