
[train][Docs] Document S3-compatible storage #63103

Merged
matthewdeng merged 1 commit into ray-project:master from goanpeca:add-b2-integration on Jun 8, 2026

@goanpeca goanpeca commented May 4, 2026

Contributor

Why are these changes needed?

Ray Train already works with any S3-compatible object store through pyarrow's S3FileSystem (via endpoint_override in the storage_path URI, or the standard AWS_* environment variables). This PR documents that path in the Train persistent-storage guide and adds the Backblaze B2 specifics.

Docs-only, no code changes. (An earlier revision added an env-var aliasing helper; per review feedback it was removed in favor of documenting the setup users perform themselves.)

Changes to doc/source/train/user-guides/persistent-storage.rst:

  • Retitles the section to "S3-compatible storage (Backblaze B2, MinIO, etc.)".
  • Shows the endpoint_override query-parameter form for Backblaze B2 and MinIO (local).
  • Notes that the standard AWS environment variables (AWS_ENDPOINT_URL_S3, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) work with a plain s3://bucket/path.
  • Documents that Backblaze B2 publishes credentials as B2_APPLICATION_KEY_ID / B2_APPLICATION_KEY; users set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY to those values, since pyarrow reads only the AWS-named variables.
  • Links a complete end-to-end Backblaze B2 notebook example.
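The endpoint_override form listed above can be sketched as a small helper that builds the storage_path string. This is only an illustration: the function name, bucket name, prefixes, and endpoint hosts are placeholders that do not come from this PR, and whether the endpoint value should carry an http:// or https:// scheme is an assumption to check against the pyarrow S3FileSystem documentation for your version.

```python
from urllib.parse import urlencode


def s3_compatible_storage_path(bucket: str, prefix: str, endpoint: str) -> str:
    """Build an s3:// URI with an endpoint_override query parameter.

    Hypothetical helper for illustration; it is not part of Ray or pyarrow.
    """
    # Keep ":" and "/" unescaped so the endpoint stays readable in the URI.
    query = urlencode({"endpoint_override": endpoint}, safe=":/")
    return f"s3://{bucket}/{prefix.strip('/')}?{query}"


# Placeholder values: substitute your own bucket and endpoint.
b2_path = s3_compatible_storage_path(
    "my-bucket", "ray/experiments", "https://s3.us-west-004.backblazeb2.com"
)
minio_path = s3_compatible_storage_path(
    "my-bucket", "experiments", "http://localhost:9000"
)

print(b2_path)
print(minio_path)

# The resulting string would then be passed as the storage path, for example:
#   ray.train.RunConfig(storage_path=b2_path)
```

With the standard AWS environment variables set instead, the same job would use a plain s3://bucket/path and no query parameter.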

Related issue number

Related to #63104

Checks

  • Change is contained to doc/source/train/user-guides/persistent-storage.rst.
  • No code paths changed; existing tests unaffected.
@gemini-code-assist

Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@goanpeca goanpeca force-pushed the add-b2-integration branch 2 times, most recently from 1546e25 to c117eb5 on May 4, 2026 15:48

Copilot AI left a comment

Contributor

Pull request overview

This PR improves Ray Train’s S3-compatible storage experience (notably Backblaze B2) by ensuring credentials provided via Backblaze’s CLI env var names are made visible to pyarrow’s S3 resolver, and by expanding the docs with a B2-focused example.

Changes:

  • Add _alias_s3_compatible_credentials_to_aws_env_vars() and call it from get_fs_and_path() to map B2 env vars onto AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY when appropriate.
  • Add unit tests covering aliasing behavior, no-op behavior, and warnings.
  • Update Train persistent-storage docs with a Backblaze B2 example, endpoint override guidance, and a link to an end-to-end notebook.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| python/ray/train/_internal/storage.py | Adds credential env var aliasing logic and invokes it during filesystem resolution. |
| python/ray/train/tests/test_storage.py | Adds tests validating the new env var aliasing behavior and logging. |
| doc/source/train/user-guides/persistent-storage.rst | Updates S3-compatible storage docs to include Backblaze B2 guidance and a runnable example link. |


Comment thread python/ray/train/_internal/storage.py Outdated
Comment thread python/ray/train/_internal/storage.py Outdated
Comment thread doc/source/train/user-guides/persistent-storage.rst Outdated
Comment thread python/ray/train/_internal/storage.py Outdated
Comment thread python/ray/train/_internal/storage.py Outdated
@goanpeca goanpeca force-pushed the add-b2-integration branch from c117eb5 to 74a69f2 on May 4, 2026 16:36
Comment thread python/ray/train/_internal/storage.py Outdated
@goanpeca goanpeca force-pushed the add-b2-integration branch 2 times, most recently from 96cf884 to 8e00f5b on May 4, 2026 16:55
@ray-gardener ray-gardener Bot added the docs, train, and community-contribution labels on May 4, 2026
@goanpeca goanpeca force-pushed the add-b2-integration branch from 8e00f5b to 5c29b68 on May 14, 2026 23:41
@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions Bot added the stale label on May 29, 2026

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Member

Thanks for the PR, @goanpeca. I've used Claude to improve the documentation, so don't worry about where that random commit came from. I just have a single question about the implementation.

Comment thread python/ray/train/_internal/storage.py Outdated
@@ -294,6 +297,35 @@ def _create_directory(fs: pyarrow.fs.FileSystem, fs_path: str) -> None:
)


def _alias_s3_compatible_credentials_to_aws_env_vars() -> None:

Member

Could you provide more information about why this function is necessary? Shouldn't this be done on the user side rather than behind the scenes?

@goanpeca goanpeca Jun 2, 2026

Contributor Author

Hi @pseudo-rnd-thoughts!

Good question! It is not strictly necessary: the functional path is just AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY plus an endpoint override, which the docs now cover.

The helper is really just a convenience for people already on Backblaze B2. B2's docs and CLI use B2_APPLICATION_KEY_ID / B2_APPLICATION_KEY, so if a user has those exported and points Ray at an s3:// path, pyarrow silently ignores them (it only reads the AWS_ names) and they hit a confusing auth error. The alias saves them re-exporting the same secret, and it is structured so other providers can be added later.

That said, I am happy to drop it and keep this docs-only if you would rather credentials stay explicit on the user side. Just let me know! 😄
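The user-side setup discussed in this thread could look roughly like the sketch below, which copies the B2-named credentials onto the AWS names that pyarrow reads. It is a minimal illustration under stated assumptions, not the helper that was removed from this PR: the function name and the example credential values are hypothetical, and the choice to leave already-set AWS variables untouched is an assumption made here.

```python
import os
from typing import List, MutableMapping

# B2 variable names mapped to the AWS names pyarrow reads
# (both pairs are the ones named in this thread).
B2_TO_AWS = {
    "B2_APPLICATION_KEY_ID": "AWS_ACCESS_KEY_ID",
    "B2_APPLICATION_KEY": "AWS_SECRET_ACCESS_KEY",
}


def export_b2_credentials_as_aws(env: MutableMapping[str, str]) -> List[str]:
    """Copy B2 credentials onto the AWS names; return the names that were set.

    Hypothetical user-side helper. Existing AWS values are never overwritten.
    """
    copied = []
    for b2_name, aws_name in B2_TO_AWS.items():
        value = env.get(b2_name)
        if value and aws_name not in env:
            env[aws_name] = value
            copied.append(aws_name)
    return copied


# Demonstration on a plain dict with made-up values, so no real secrets
# or process-wide state are touched.
demo_env = {
    "B2_APPLICATION_KEY_ID": "example-key-id",
    "B2_APPLICATION_KEY": "example-secret",
}
copied = export_b2_credentials_as_aws(demo_env)
print(copied)

# In a real script this would be called on os.environ before Ray Train
# resolves the storage path:
#   export_b2_credentials_as_aws(os.environ)
```

Exporting the two AWS variables directly in the shell achieves the same result without any Python code.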

Member

Thanks for the answer, @goanpeca. Yes, could we remove the function from the code and add a note to the documentation describing the changes that users need to make for access?

Contributor Author

Fixed!

@github-actions github-actions Bot added the unstale label and removed the stale label on May 30, 2026
@goanpeca goanpeca changed the title [train][Docs] Add Backblaze B2 example + alias B2_APPLICATION_KEY env vars onto AWS_* Jun 2, 2026
@goanpeca goanpeca force-pushed the add-b2-integration branch from ed9c5f6 to 5c29b68 on June 2, 2026 14:31

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 5c29b68791fd328f75db0ae016a4063ccffa38cd.

Comment thread python/ray/train/_internal/storage.py Outdated
@goanpeca goanpeca force-pushed the add-b2-integration branch from 5c29b68 to 6b3049f on June 2, 2026 14:43
@goanpeca goanpeca changed the title [train][Docs] Document S3-compatible storage (MinIO, Backblaze B2) Jun 2, 2026
@goanpeca goanpeca force-pushed the add-b2-integration branch from 6442fa2 to bb3f8de on June 2, 2026 14:53
@goanpeca goanpeca requested a review from Copilot June 2, 2026 14:54

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread doc/source/train/user-guides/persistent-storage.rst
Comment thread doc/source/train/user-guides/persistent-storage.rst Outdated
@goanpeca goanpeca force-pushed the add-b2-integration branch from bb3f8de to c8e4c18 on June 2, 2026 14:56
@goanpeca goanpeca changed the title [train][Docs] Document S3-compatible storage (Backblaze B2, MinIO) Jun 2, 2026
@goanpeca goanpeca force-pushed the add-b2-integration branch from c8e4c18 to bd029db on June 2, 2026 15:00
@goanpeca goanpeca requested a review from Copilot June 2, 2026 15:00

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread doc/source/train/user-guides/persistent-storage.rst Outdated
Comment thread doc/source/train/user-guides/persistent-storage.rst Outdated
@goanpeca goanpeca force-pushed the add-b2-integration branch from bd029db to 8ddc19b on June 2, 2026 15:24
@goanpeca goanpeca requested a review from Copilot June 2, 2026 15:24

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread doc/source/train/user-guides/persistent-storage.rst Outdated
@goanpeca goanpeca force-pushed the add-b2-integration branch from 8ddc19b to a0524bd on June 2, 2026 15:41
@goanpeca goanpeca requested a review from Copilot June 2, 2026 15:41

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Signed-off-by: Gonzalo Peña-Castellanos <goanpeca@gmail.com>
@goanpeca goanpeca force-pushed the add-b2-integration branch from a0524bd to 98083b8 on June 2, 2026 15:48

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Member

Thanks for making the changes

@pseudo-rnd-thoughts pseudo-rnd-thoughts added the go label on Jun 4, 2026
@matthewdeng matthewdeng merged commit a2f222d into ray-project:master Jun 8, 2026
8 checks passed
@goanpeca goanpeca deleted the add-b2-integration branch June 9, 2026 03:04
sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Jun 10, 2026
limarkdcunha pushed a commit to limarkdcunha/ray that referenced this pull request Jun 30, 2026

Labels

  • community-contribution: Contributed by the community
  • docs: An issue or change related to documentation
  • go: add ONLY when ready to merge, run all tests
  • train: Ray Train Related Issue
  • unstale: A PR that has been marked unstale. It will not get marked stale again if this label is on it.

4 participants