[Core] Add warning when uploading large working dirs#45818
Conversation
Signed-off-by: Pavan M <pmannamx@gmail.com>
rclough
left a comment
There was a problem hiding this comment.
Really great job on a quick PR! Overall I think this looks great, I just have some minor nit-picks to the variables and language in the error message. But I am not a ray dev, so hopefully they can chime in as well if my comments are relevant
| ) | ||
|
|
||
| package_size = package_file.stat().st_size | ||
| if package_size > 100 * 1024 * 1024: # 100 MiB |
There was a problem hiding this comment.
Maybe there is a constant somewhere in the ray library that we can reuse that represents the maximum upload? While your comment makes it clear what the "magic numbers" are, it would be clearer if there were a constant. Additionally, if the Ray devs change the maximum upload size in the future, it would make sure that this code remains future-proof.
Unfortunately I'm not a ray dev, so I'm not sure where they set or configure the limit (or if it's even accessible in python) so I'll leave that to ray devs to decide
| package_size = package_file.stat().st_size | ||
| if package_size > 100 * 1024 * 1024: # 100 MiB | ||
| logger.warning( | ||
| f"The package {package_path} is very large. " |
There was a problem hiding this comment.
Most of this text is fine, but in this specific case, I think the language should explicitly reflect that the package is not just large, but larger than the maximum
There was a problem hiding this comment.
Done, I want to highlight that the max unzipped size is 100 MiB, but here, the comparison is with the zipped size, so in cases when the unzipped size is > 100 MiB, but zipped is less than < 100 MiB, we won't see this warning..
By the way, thank you for reviewing :)
Signed-off-by: Pavan M <pmannamx@gmail.com>
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
|
@pmannam this has been stale for a long time, so closing it, but it would still be a good contribution so please re-open if you're interested in picking it back up! |
|
@edoakes Just going through my github notifications - I'm not sure why this issue was ever considered stale. It was pretty much waiting for review the entire time. It went stale because it never got review, otherwise its a pretty straightforward enhancement. I did have one nit review left, but it's unclear if my comment is even applicable |
|
The bot marked it stale because there was no activity on the PR. Happy to shepherd if you or @pmannam want to pick it back up. |
…3404) ## Description Adds an actionable warning when a `runtime_env` package is approaching `GCS_STORAGE_MAX_SIZE`, so users can react before the upload itself fails with a hard `ValueError`. Today, `_zip_files` already warns about *individual* files over 10 MiB (`FILE_SIZE_WARNING`). It does **not** warn about a working_dir built from many small files — most commonly a `.git` directory — so a user can silently zip past `GCS_STORAGE_MAX_SIZE` and only find out when the upload fails. The error message they hit (`Package size (… MiB) exceeds the maximum size of …`) does not tell them where the temp zip lives, so they can't inspect what bloated it. This PR closes that gap: - Adds a `PACKAGE_SIZE_WARNING = GCS_STORAGE_MAX_SIZE // 2` threshold next to the existing `FILE_SIZE_WARNING`. - Adds `_warn_if_package_size_near_limit(...)` which calls `package_path.stat().st_size` (single syscall — no directory walk) and emits a warning naming the local zip path and the most common offenders (`.git`, `.venv`, build artifacts). - Calls it from `create_package` *after* `_zip_files`. That single call site covers **both** entry points: `ray.init(runtime_env=...)` (which goes through `upload_package_if_needed` → `create_package`) and `ray job submit` / dashboard SDK (which goes through `_upload_package` → `create_package`). The hard limit behavior in `_store_package_in_gcs` is unchanged — this is purely an earlier, friendlier warning. ## Related issues Closes #45602 This picks up the work from the now-stale #45818, which a maintainer (@edoakes) explicitly offered to shepherd if a contributor resumed it. The previous attempt only modified the dashboard SDK path; this PR puts the check in `create_package` so the warning also fires for the in-process `ray.init(runtime_env=...)` upload path. ## Additional information ### Test plan New `TestCreatePackageSizeWarning` class in `test_runtime_env_packaging.py`: - `test_warns_when_zip_exceeds_threshold` — builds a directory of many small incompressible files, lowers `PACKAGE_SIZE_WARNING` to 1 byte via monkeypatch, calls `create_package`, asserts the warning was emitted and includes the local package path. - `test_does_not_warn_when_zip_below_threshold` — same fixture but threshold set to 10 MiB, asserts no warning. - `test_warn_handles_missing_file` — calls `_warn_if_package_size_near_limit` on a path that doesn't exist; the helper must not raise (defensive against concurrent cleanup between zip creation and stat). Each test passes its own `logging.Logger` with a list handler, because the default Ray logger doesn't propagate to `caplog`. All three tests pass locally: ``` TestCreatePackageSizeWarning::test_warns_when_zip_exceeds_threshold PASSED TestCreatePackageSizeWarning::test_does_not_warn_when_zip_below_threshold PASSED TestCreatePackageSizeWarning::test_warn_handles_missing_file PASSED ``` `ruff check` and `ruff format --check` are clean on both modified files (the only remaining `ruff format` diff in the repo is on pre-existing code I didn't touch). --------- Signed-off-by: lonexreb <reach2shubhankar@gmail.com>
…y-project#63404) ## Description Adds an actionable warning when a `runtime_env` package is approaching `GCS_STORAGE_MAX_SIZE`, so users can react before the upload itself fails with a hard `ValueError`. Today, `_zip_files` already warns about *individual* files over 10 MiB (`FILE_SIZE_WARNING`). It does **not** warn about a working_dir built from many small files — most commonly a `.git` directory — so a user can silently zip past `GCS_STORAGE_MAX_SIZE` and only find out when the upload fails. The error message they hit (`Package size (… MiB) exceeds the maximum size of …`) does not tell them where the temp zip lives, so they can't inspect what bloated it. This PR closes that gap: - Adds a `PACKAGE_SIZE_WARNING = GCS_STORAGE_MAX_SIZE // 2` threshold next to the existing `FILE_SIZE_WARNING`. - Adds `_warn_if_package_size_near_limit(...)` which calls `package_path.stat().st_size` (single syscall — no directory walk) and emits a warning naming the local zip path and the most common offenders (`.git`, `.venv`, build artifacts). - Calls it from `create_package` *after* `_zip_files`. That single call site covers **both** entry points: `ray.init(runtime_env=...)` (which goes through `upload_package_if_needed` → `create_package`) and `ray job submit` / dashboard SDK (which goes through `_upload_package` → `create_package`). The hard limit behavior in `_store_package_in_gcs` is unchanged — this is purely an earlier, friendlier warning. ## Related issues Closes ray-project#45602 This picks up the work from the now-stale ray-project#45818, which a maintainer (@edoakes) explicitly offered to shepherd if a contributor resumed it. The previous attempt only modified the dashboard SDK path; this PR puts the check in `create_package` so the warning also fires for the in-process `ray.init(runtime_env=...)` upload path. ## Additional information ### Test plan New `TestCreatePackageSizeWarning` class in `test_runtime_env_packaging.py`: - `test_warns_when_zip_exceeds_threshold` — builds a directory of many small incompressible files, lowers `PACKAGE_SIZE_WARNING` to 1 byte via monkeypatch, calls `create_package`, asserts the warning was emitted and includes the local package path. - `test_does_not_warn_when_zip_below_threshold` — same fixture but threshold set to 10 MiB, asserts no warning. - `test_warn_handles_missing_file` — calls `_warn_if_package_size_near_limit` on a path that doesn't exist; the helper must not raise (defensive against concurrent cleanup between zip creation and stat). Each test passes its own `logging.Logger` with a list handler, because the default Ray logger doesn't propagate to `caplog`. All three tests pass locally: ``` TestCreatePackageSizeWarning::test_warns_when_zip_exceeds_threshold PASSED TestCreatePackageSizeWarning::test_does_not_warn_when_zip_below_threshold PASSED TestCreatePackageSizeWarning::test_warn_handles_missing_file PASSED ``` `ruff check` and `ruff format --check` are clean on both modified files (the only remaining `ruff format` diff in the repo is on pre-existing code I didn't touch). --------- Signed-off-by: lonexreb <reach2shubhankar@gmail.com> Signed-off-by: phattruong <23120318@student.hcmus.edu.vn>
Why are these changes needed?
Currently, a warning is logged only when packaging and uploading large files. No warning is logged for large working directories with small files. This change adds a warning just before uploading the zipped working directory, if the zipped size is >100 MiB.
Related issue number
Closes #45602
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.