Bump urllib3 from 2.5.0 to 2.6.0 in /clients/python-client #4260

Merged
rueian merged 1 commit into master from dependabot/pip/clients/python-client/urllib3-2.6.0
Dec 7, 2025

dependabot[bot] commented on behalf of github on Dec 6, 2025

Bumps urllib3 from 2.5.0 to 2.6.0.

Release notes

Sourced from urllib3's releases.

2.6.0

🚀 urllib3 is fundraising for HTTP/2 support

urllib3 is raising ~$40,000 USD to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects please consider contributing financially to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.

Thank you for your support.

Security

  • Fixed a security issue where streaming API could improperly handle highly compressed HTTP content ("decompression bombs") leading to excessive resource consumption even when a small amount of data was requested. Reading small chunks of compressed data is safer and much more efficient now. (CVE-2025-66471 reported by @​Cycloctane, 8.9 High, GHSA-2xpw-w6gg-jr37)
  • Fixed a security issue where an attacker could compose an HTTP response with virtually unlimited links in the Content-Encoding header, potentially leading to a denial of service (DoS) attack by exhausting system resources during decoding. The number of allowed chained encodings is now limited to 5. (CVE-2025-66418 reported by @​illia-v, 8.9 High, GHSA-gm62-xv2j-4w53)
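
Both fixes come down to bounding work before it is done. The following is a minimal sketch of the two ideas using only the standard library; it is not urllib3's implementation. The helper names `parse_content_encoding` and `read_bounded` are hypothetical, and only the limit of 5 chained encodings comes from the release notes.

```python
import zlib

# Limit stated in the urllib3 2.6.0 release notes.
MAX_CHAINED_ENCODINGS = 5


def parse_content_encoding(header_value: str) -> list[str]:
    """Split a Content-Encoding header and reject overly long chains.

    Hypothetical helper for illustration; not part of urllib3.
    """
    encodings = [
        part.strip().lower() for part in header_value.split(",") if part.strip()
    ]
    if len(encodings) > MAX_CHAINED_ENCODINGS:
        raise ValueError(
            f"too many content encodings: {len(encodings)} > {MAX_CHAINED_ENCODINGS}"
        )
    return encodings


def read_bounded(compressed: bytes, amt: int) -> bytes:
    """Produce at most `amt` bytes of decompressed output from a zlib stream.

    Passing max_length to decompress() caps the output, so a tiny
    "decompression bomb" cannot expand into a huge buffer in one call.
    """
    decoder = zlib.decompressobj()
    return decoder.decompress(compressed, amt)
```

Input that was not consumed because of the cap stays available on the decompressor object as `unconsumed_tail`, so a caller can keep reading in small steps instead of inflating the whole body at once.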

[!IMPORTANT]

  • If urllib3 is not installed with the optional urllib3[brotli] extra, but your environment contains a Brotli/brotlicffi/brotlipy package anyway, make sure to upgrade it to at least Brotli 1.2.0 or brotlicffi 1.2.0.0 to benefit from the security fixes and avoid warnings. Prefer using urllib3[brotli] to install a compatible Brotli package automatically.
  • If you use custom decompressors, please make sure to update them to respect the changed API of urllib3.response.ContentDecoder.
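
One way to check whether an environment is affected by the Brotli note is to compare installed distribution versions against the minimums named above. This is a rough sketch, assuming plain dotted-integer version strings; `parse_version` and `outdated_brotli_packages` are hypothetical helper names, and brotlipy is left out because the notes give no minimum version for it.

```python
from importlib import metadata

# Minimum versions named in the release notes.
MINIMUMS = {"Brotli": (1, 2, 0), "brotlicffi": (1, 2, 0, 0)}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0). Assumes dotted integers only."""
    return tuple(int(part) for part in text.split("."))


def outdated_brotli_packages() -> dict[str, str]:
    """Return installed Brotli distributions older than the stated minimum."""
    outdated = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed, so nothing to upgrade
        if parse_version(installed) < minimum:
            outdated[name] = installed
    return outdated
```

An empty result means no outdated Brotli distribution was found. Pre-release or otherwise non-numeric version strings would need a real version parser rather than this simple split.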

Features

  • Enabled retrieval, deletion, and membership testing in HTTPHeaderDict using bytes keys. (#3653)
  • Added host and port information to string representations of HTTPConnection. (#3666)
  • Added support for Python 3.14 free-threading builds explicitly. (#3696)
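
To illustrate what bytes-key access on a header mapping means, here is a small stand-alone sketch. `MiniHeaderDict` is a hypothetical class, not urllib3's `HTTPHeaderDict`, and decoding bytes keys as Latin-1 is an assumption made for the example.

```python
class MiniHeaderDict:
    """Hypothetical case-insensitive header map accepting str or bytes keys."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _normalize(key):
        # Assumption: bytes header names are decoded as Latin-1.
        if isinstance(key, bytes):
            key = key.decode("latin-1")
        return key.lower()

    def __setitem__(self, key, value):
        self._store[self._normalize(key)] = value

    def __getitem__(self, key):
        # Retrieval works for both "Content-Type" and b"content-type".
        return self._store[self._normalize(key)]

    def __delitem__(self, key):
        del self._store[self._normalize(key)]

    def __contains__(self, key):
        return self._normalize(key) in self._store
```

With this sketch, `headers[b"content-type"]`, `b"Content-Type" in headers`, and `del headers[b"CONTENT-TYPE"]` all address the same entry as their `str` equivalents, which is the behavior the feature note describes for retrieval, membership testing, and deletion.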

Removals

  • Removed the HTTPResponse.getheaders() method in favor of HTTPResponse.headers. Removed the HTTPResponse.getheader(name, default) method in favor of HTTPResponse.headers.get(name, default). (#3622)
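
Callers that still use the removed methods can move to the `headers` mapping directly. The sketch below shows the replacement pattern named in the note; `get_header` and `get_headers` are hypothetical wrappers, and the `SimpleNamespace` object only stands in for a real response so the example runs without a network call.

```python
from types import SimpleNamespace


def get_header(response, name, default=None):
    """Replacement for the removed response.getheader(name, default)."""
    return response.headers.get(name, default)


def get_headers(response):
    """Replacement for the removed response.getheaders()."""
    return response.headers


# Stand-in for demonstration only. A plain dict is case-sensitive,
# unlike the header mapping on a real urllib3 response.
fake_response = SimpleNamespace(headers={"Content-Type": "application/json"})

content_type = get_header(fake_response, "Content-Type", "text/plain")
missing = get_header(fake_response, "X-Missing", "fallback")
```

In most code the wrappers are unnecessary: replacing `resp.getheader(name, default)` with `resp.headers.get(name, default)` at the call site is enough.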

Bugfixes

  • Fixed redirect handling in urllib3.PoolManager when an integer is passed for the retries parameter. (#3649)
  • Fixed HTTPConnectionPool when used in Emscripten with no explicit port. (#3664)
  • Fixed handling of SSLKEYLOGFILE with expandable variables. (#3700)
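
The SSLKEYLOGFILE fix concerns values that contain environment variable references. As a sketch of what such expansion looks like with the standard library (whether urllib3 does exactly this is an assumption based on the commit title "Expand environment variable of SSLKEYLOGFILE"), with `resolve_keylog_path` and `KEYLOG_DIR` as hypothetical names:

```python
import os


def resolve_keylog_path():
    """Return SSLKEYLOGFILE with $VAR / ${VAR} references expanded, or None."""
    raw = os.environ.get("SSLKEYLOGFILE")
    if not raw:
        return None
    return os.path.expandvars(raw)


# Demonstration values; KEYLOG_DIR is a made-up variable name.
os.environ["KEYLOG_DIR"] = "/tmp/keys"
os.environ["SSLKEYLOGFILE"] = "$KEYLOG_DIR/tls.log"
path = resolve_keylog_path()
```

Note that `os.path.expandvars` leaves references to undefined variables unchanged rather than raising an error.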

Misc

  • Changed the zstd extra to install backports.zstd instead of zstandard on Python 3.13 and before. (#3693)
  • Improved the performance of content decoding by optimizing BytesQueueBuffer class. (#3710)
  • Allowed building the urllib3 package with newer setuptools-scm v9.x. (#3652)
  • Ensured successful urllib3 builds by setting Hatchling requirement to ≥ 1.27.0. (#3638)
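
The BytesQueueBuffer item above refers to a FIFO byte buffer used during content decoding. The sketch below shows the general technique of slicing chunks through `memoryview` so a partially consumed chunk is not copied. `ChunkQueue` is a hypothetical class written for illustration and is not urllib3's `BytesQueueBuffer`.

```python
from collections import deque


class ChunkQueue:
    """Hypothetical FIFO byte buffer built on a deque of memoryviews."""

    def __init__(self):
        self._chunks = deque()
        self._size = 0

    def __len__(self):
        return self._size

    def put(self, data: bytes) -> None:
        self._chunks.append(memoryview(data))
        self._size += len(data)

    def get(self, n: int) -> bytes:
        """Remove and return up to n bytes from the front of the queue."""
        if n < 0:
            raise ValueError("n must be non-negative")
        parts = []
        remaining = min(n, self._size)
        while remaining > 0:
            chunk = self._chunks.popleft()
            if len(chunk) > remaining:
                # Slicing a memoryview does not copy the underlying bytes.
                parts.append(chunk[:remaining])
                self._chunks.appendleft(chunk[remaining:])
                taken = remaining
            else:
                parts.append(chunk)
                taken = len(chunk)
            remaining -= taken
            self._size -= taken
        # The only copy happens here, when the result is assembled.
        return b"".join(parts)
```

Because the leftover part of a chunk is pushed back as a view, repeated small reads from one large chunk avoid re-copying the unread tail each time.
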
Changelog

Sourced from urllib3's changelog.

2.6.0 (2025-12-05)

Security

  • Fixed a security issue where streaming API could improperly handle highly compressed HTTP content ("decompression bombs") leading to excessive resource consumption even when a small amount of data was requested. Reading small chunks of compressed data is safer and much more efficient now. (`GHSA-2xpw-w6gg-jr37 <https://github.com/urllib3/urllib3/security/advisories/GHSA-2xpw-w6gg-jr37>`__)
  • Fixed a security issue where an attacker could compose an HTTP response with virtually unlimited links in the Content-Encoding header, potentially leading to a denial of service (DoS) attack by exhausting system resources during decoding. The number of allowed chained encodings is now limited to 5. (`GHSA-gm62-xv2j-4w53 <https://github.com/urllib3/urllib3/security/advisories/GHSA-gm62-xv2j-4w53>`__)

.. caution::

  • If urllib3 is not installed with the optional urllib3[brotli] extra, but your environment contains a Brotli/brotlicffi/brotlipy package anyway, make sure to upgrade it to at least Brotli 1.2.0 or brotlicffi 1.2.0.0 to benefit from the security fixes and avoid warnings. Prefer using urllib3[brotli] to install a compatible Brotli package automatically.

  • If you use custom decompressors, please make sure to update them to respect the changed API of urllib3.response.ContentDecoder.

Features

  • Enabled retrieval, deletion, and membership testing in HTTPHeaderDict using bytes keys. (`#3653 <https://github.com/urllib3/urllib3/issues/3653>`__)
  • Added host and port information to string representations of HTTPConnection. (`#3666 <https://github.com/urllib3/urllib3/issues/3666>`__)
  • Added support for Python 3.14 free-threading builds explicitly. (`#3696 <https://github.com/urllib3/urllib3/issues/3696>`__)

Removals

  • Removed the HTTPResponse.getheaders() method in favor of HTTPResponse.headers. Removed the HTTPResponse.getheader(name, default) method in favor of HTTPResponse.headers.get(name, default). (`#3622 <https://github.com/urllib3/urllib3/issues/3622>`__)

Bugfixes

  • Fixed redirect handling in urllib3.PoolManager when an integer is passed for the retries parameter. (`#3649 <https://github.com/urllib3/urllib3/issues/3649>`__)
  • Fixed HTTPConnectionPool when used in Emscripten with no explicit port. (`#3664 <https://github.com/urllib3/urllib3/issues/3664>`__)
  • Fixed handling of SSLKEYLOGFILE with expandable variables. (`#3700 <https://github.com/urllib3/urllib3/issues/3700>`__)

... (truncated)

Commits
  • 720f484 Release 2.6.0
  • 24d7b67 Merge commit from fork
  • c19571d Merge commit from fork
  • 816fcf0 Bump actions/setup-python from 6.0.0 to 6.1.0 (#3725)
  • 18af0a1 Improve speed of BytesQueueBuffer.get() by using memoryview (#3711)
  • 1f6abac Bump versions of pre-commit hooks (#3716)
  • 1c8fbf7 Bump actions/checkout from 5.0.0 to 6.0.0 (#3722)
  • 7784b9e Add Python 3.15 to CI (#3717)
  • 0241c9e Updated docs to reflect change in optional zstd dependency from zstandard t...
  • 7afcabb Expand environment variable of SSLKEYLOGFILE (#3705)
  • Additional commits viewable in compare view

Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@2.5.0...2.6.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies and python labels on Dec 6, 2025

Copilot AI left a comment

Copilot wasn't able to review any files in this pull request.


Future-Outlier left a comment

cc @rueian to merge

rueian merged commit 8d04a60 into master on Dec 7, 2025
27 checks passed
dependabot[bot] deleted the dependabot/pip/clients/python-client/urllib3-2.6.0 branch on December 7, 2025 at 01:31
win5923 pushed a commit to win5923/kuberay that referenced this pull request Dec 17, 2025
…ct#4260)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@2.5.0...2.6.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
pipo02mix added a commit to giantswarm/kuberay that referenced this pull request May 19, 2026
* [APIServer][Docs] Add user guide for retry behavior & configuration (#4144)

* [Docs] Add the draft description about feature intro, configurations, and usecases

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Fix] Update the retry walk-through

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Doc] rewrite the first 2 sections

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Doc] Revise documentation wording and add Observing Retry Behavior section

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Fix] fix linting issue by running pre-commit run before committing

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Fix] fix linting errors in the Markdown linting

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Fix] Clean up the math equation

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* Update the math formula of Backoff calculation.

Co-authored-by: Nary Yeh <60069744+machichima@users.noreply.github.com>
Signed-off-by: JustinYeh <justinyeh1995@gmail.com>

* [Fix] Explicitly mentioned exponential backoff and removed the customization parts

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Docs] Clarify naming by replacing “APIServer” with “KubeRay APIServer”

Co-authored-by: Cheng-Yeh Chung <kenchung285@gmail.com>
Signed-off-by: JustinYeh <justinyeh1995@gmail.com>

* [Docs] Rename retry-configuration.md to retry-behavior.md for accuracy

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* Update Title to KubeRay APIServer Retry Behavior

Co-authored-by: Cheng-Yeh Chung <kenchung285@gmail.com>
Signed-off-by: JustinYeh <justinyeh1995@gmail.com>

* [Docs] Add a note about the limitation of retry configuration

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

---------

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Signed-off-by: JustinYeh <justinyeh1995@gmail.com>
Co-authored-by: Nary Yeh <60069744+machichima@users.noreply.github.com>
Co-authored-by: Cheng-Yeh Chung <kenchung285@gmail.com>

* Support X-Ray-Authorization fallback header for accepting auth token via proxy (#4213)

* Support X-Ray-Authorization fallback header for accepting auth token in dashboard

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* remove todo comment

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* [RayCluster] make auth token secret name consistency (#4216)

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayCluster] Status includes head container status message (#4196)

* [RayCluster] Status includes head container status message

Signed-off-by: Spencer Peterson <spencerjp@google.com>

* lint

Signed-off-by: Spencer Peterson <spencerjp@google.com>

* [RayCluster] Containers not ready status reflects structured reason

Signed-off-by: Spencer Peterson <spencerjp@google.com>

* nit

Signed-off-by: Spencer Peterson <spencerjp@google.com>

---------

Signed-off-by: Spencer Peterson <spencerjp@google.com>

* Remove erroneous  call in applyServeTargetCapacity (#4212)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

* [RayJob] Add token authentication support for light weight job submitter (#4215)

* [RayJob] light weight job submitter auth token support

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* X-Ray-Authorization

Signed-off-by: Rueian <rueiancsie@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>

* feat: kubectl ray get token command (#4218)

* feat: kubectl ray get token command

Signed-off-by: Rueian <rueiancsie@gmail.com>

* Update kubectl-plugin/pkg/cmd/get/get_token_test.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>

* Update kubectl-plugin/pkg/cmd/get/get_token.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>

* make sure the raycluster exists before getting the secret

Signed-off-by: Rueian <rueiancsie@gmail.com>

* better ux

Signed-off-by: Rueian <rueiancsie@gmail.com>

* Update kubectl-plugin/pkg/cmd/get/get_token.go

Co-authored-by: Han-Ju Chen (Future-Outlier) <eric901201@gmail.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>

---------

Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Han-Ju Chen (Future-Outlier) <eric901201@gmail.com>

* feat: upgrade to Ray 2.52.0 to support token auth mode (#4152)

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* andrew's comment

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Revert ray ml image

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* [Chore] Remove unused variable in volcano scheduler (#4223)

Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>

* [e2e] RayJob Auth Mode E2E (#4229)

* [E2E] RayJob Auth Mode E2E

Signed-off-by: seanlaii <qazwsx0939059006@gmail.com>

* refactor

* refactor

---------

Signed-off-by: seanlaii <qazwsx0939059006@gmail.com>

* Update README with additional resource links (#4230)

* Update README with additional resource links

Added links for KubeRay APIServer and Dashboard for more details.

Signed-off-by: Jun-Hao Wan <ken89@kimo.com>

* Update README.md

Signed-off-by: Jun-Hao Wan <ken89@kimo.com>

---------

Signed-off-by: Jun-Hao Wan <ken89@kimo.com>

* introduce historyserver directory and project structure (#4232)

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

* [RayJob] light weight job submitter upgrade to 1.5.1 to support auth token mode (#4235)

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Add example in GKE to enable Ray resource isolation using cgroupsv2 and writable cgroup containers (#4236)

* Add example in GKE to enable Ray resource isolation using cgroupsv2 and writable cgroup containers

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

* use lower resource requests

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

---------

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

* add sample that uses --system-reserved-cpu and --system-reserved-memory (#4237)

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

* [e2e] Enhance RayCluster Auth E2E (#4231)

* [e2e] Enhance RayCluster Auth E2E

Signed-off-by: seanlaii <qazwsx0939059006@gmail.com>

* fix test

* fix test

* fix test

* fix test

* fix test

---------

Signed-off-by: seanlaii <qazwsx0939059006@gmail.com>

* enhancement: Update docker base image. (#4193)

* [RayService] Directly fail CR if is invalid (#4228)

* [RayService] Directly fail CR if is invalid

Signed-off-by: win5923 <ken89@kimo.com>

* nit: set the name with strings.Repeat(a, 48)

Signed-off-by: win5923 <ken89@kimo.com>

---------

Signed-off-by: win5923 <ken89@kimo.com>

* [Chore] Upgrade operator version in test-sample-yamls (#4248)

Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>

* feat: add allow method to api server when allow cors (#4259)

Signed-off-by: Cheyu Wu <cheyu1220@gmail.com>

* Bump urllib3 from 2.5.0 to 2.6.0 in /clients/python-client (#4260)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump next from 15.2.4 to 15.4.8 in /dashboard (#4254)

* Bump next from 15.2.4 to 15.4.8 in /dashboard

Bumps [next](https://github.com/vercel/next.js) from 15.2.4 to 15.4.8.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v15.2.4...v15.4.8)

---
updated-dependencies:
- dependency-name: next
  dependency-version: 15.4.8
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix dep issue

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* [Docs] Upgrade kind base image to v1.26.0 (#4252)

* docs: Upgrade kind base image to 1.26 for dev

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* chore: remove unused file (#4247)

Signed-off-by: Cheyu Wu <cheyu1220@gmail.com>

* feat: Add runtimeClassName support for head and worker Pods (#4184)

* feat: Add runtimeClassName support for head and worker Pods

* fix: pre-commit linting errors

* chore: Update values.yaml

* [RayService] Migrate from Endpoints API to EndpointSlice API for RayService (#4245)

* Migrate from Endpoints API to EndpointSlice API for RayService

Signed-off-by: seanlaii <qazwsx0939059006@gmail.com>

* trigger test

* add back endpoints rule for backward compatibility

* add comment

* fix comment

* de-duplicate endpoint based on pod uid

* address comment

* change TODO message

* trigger test

* remove endpoint RBAC

* move comment

* change logging level

---------

Signed-off-by: seanlaii <qazwsx0939059006@gmail.com>

* fix: hardening kuberay operator security context (#4243)

Signed-off-by: lilylinh <lhacaoth@redhat.com>

* [CI] Upgrade operator version from v1.4.2 to v1.5.1 (#4261)

* chore: Bump operator ver to v1.5.1

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Modify prev version to v1.4.2

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* Apply suggestions from code review

Signed-off-by: Rueian <rueiancsie@gmail.com>

---------

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>

* [RayService] auth token mode e2e test (#4225)

* add ray service auth test

Signed-off-by: Ryan <ryan980053@gmail.com>

* try to fix the error

Signed-off-by: Ryan <ryan980053@gmail.com>

* add worker group

Signed-off-by: Ryan <ryan980053@gmail.com>

* Adjust resource requests and limits in tests

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

* Simplify RayService auth test by removing worker group

Removed worker group spec and related verification for auth token propagation in RayService tests.

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

* Update rayservice_auth_test.go

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

* Refactor TestRayServiceAuthToken for clarity

Refactor test for RayService authentication to improve clarity and maintainability.

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

* Update ray-operator/test/e2erayservice/rayservice_auth_test.go

Co-authored-by: Han-Ju Chen (Future-Outlier) <eric901201@gmail.com>
Signed-off-by: Ryan Huang <ryankert01@gmail.com>

* Update ray-operator/test/e2erayservice/rayservice_auth_test.go

Co-authored-by: Jun-Hao Wan <ken89@kimo.com>
Signed-off-by: Ryan Huang <ryankert01@gmail.com>

* address comments

Signed-off-by: ryankert01 <ryan980053@gmail.com>

* pre-commit check

Signed-off-by: ryankert01 <ryan980053@gmail.com>

* update test

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* revert my update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Ryan <ryan980053@gmail.com>
Signed-off-by: Ryan Huang <ryankert01@gmail.com>
Signed-off-by: ryankert01 <ryan980053@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Han-Ju Chen (Future-Outlier) <eric901201@gmail.com>
Co-authored-by: Jun-Hao Wan <ken89@kimo.com>

* [PodPool-VK] add podpool vk README (#4250) (#4251)

* [PodPool-VK] add podpool vk README (#4250)

* Fix lint

Signed-off-by: Rueian <rueiancsie@gmail.com>

* Update podpool-vk/README.md

Signed-off-by: Rueian <rueiancsie@gmail.com>

* fix lint

Signed-off-by: Rueian <rueiancsie@gmail.com>

* rename podpool-vk to podpool

---------

Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>

* Bump next from 15.4.8 to 15.4.9 in /dashboard (#4264)

Bumps [next](https://github.com/vercel/next.js) from 15.4.8 to 15.4.9.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v15.4.8...v15.4.9)

---
updated-dependencies:
- dependency-name: next
  dependency-version: 15.4.9
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Autoscaler] Add validation to require RayCluster v2 when using idleTimeoutSeconds (#4162)

* add validation for idleTimeoutSeconds config per worker groups

Signed-off-by: alimaazamat <alima.azamat2003@gmail.com>

* Check spec version then fall back to env var

Co-authored-by: Jun-Hao Wan <ken89@kimo.com>
Signed-off-by: Alima Azamat <92766804+alimaazamat@users.noreply.github.com>
Signed-off-by: alimaazamat <alima.azamat2003@gmail.com>

---------

Signed-off-by: alimaazamat <alima.azamat2003@gmail.com>
Signed-off-by: Alima Azamat <92766804+alimaazamat@users.noreply.github.com>
Co-authored-by: Jun-Hao Wan <ken89@kimo.com>

* Bump next from 15.4.9 to 15.4.10 in /dashboard (#4266)

Bumps [next](https://github.com/vercel/next.js) from 15.4.9 to 15.4.10.
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v15.4.9...v15.4.10)

---
updated-dependencies:
- dependency-name: next
  dependency-version: 15.4.10
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Feature/kubectl plugin/improve support for autoscaling clusters 3832 (#4146)

* [kubectl-plugin] Support scaling min/max replicas in scale cluster command (#3832)

Signed-off-by: AndySung320 <andysung0320@gmail.com>

* add examples

Signed-off-by: AndySung320 <andysung0320@gmail.com>

* test(e2e): add scale and get workergroups e2e tests

- Add getWorkerGroupValues helper function to support.go
- Add e2e tests for 'kubectl ray scale cluster' command
- Add e2e tests for 'kubectl ray get workergroups' command

Signed-off-by: AndySung320 <andysung0320@gmail.com>

* refactor(scale): simplify update logic for min/max/replica
Refactor the update logic for minReplicas, maxReplicas, and replicas
to use the final* variables directly within their respective blocks.

Signed-off-by: AndySung320 <andysung0320@gmail.com>

* ci: retry

Signed-off-by: AndySung320 <andysung0320@gmail.com>

* document default minReplicas value and use explicit numeric maxReplicas

Signed-off-by: AndySung320 <andysung0320@gmail.com>

* cmd/scale: improve wording and extend test coverage

Signed-off-by: AndySung320 <andysung0320@gmail.com>

* e2e: add error check with Expect for kubectl commands in support.go

Signed-off-by: AndySung320 <andysung0320@gmail.com>

---------

Signed-off-by: AndySung320 <andysung0320@gmail.com>

* Revert "[Test][Autoscaler] deflaky unexpected dead actors in tests by setting max_restarts=-1 (#3700)" (#4271)

This reverts commit c75997ac83b5f04669f98af2bdbb7b932f9e9a1a.

* [Autoscaler] validate idleTimeoutSeconds for AutoscalerOptions (#4267)

* validate idleTimeoutSeconds for workergroup spec and autoscaler options

Signed-off-by: alimaazamat <alima.azamat2003@gmail.com>

* remove Autoscaler Options requiring V2 autoscaler

Signed-off-by: alimaazamat <alima.azamat2003@gmail.com>

---------

Signed-off-by: alimaazamat <alima.azamat2003@gmail.com>

* Bump glob from 10.4.5 to 10.5.0 in /dashboard (#4207)

Bumps [glob](https://github.com/isaacs/node-glob) from 10.4.5 to 10.5.0.
- [Changelog](https://github.com/isaacs/node-glob/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/node-glob/compare/v10.4.5...v10.5.0)

---
updated-dependencies:
- dependency-name: glob
  dependency-version: 10.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add Helm values for ResourceClaims to RayCluster (#4290)

* [Feature] Support JobDeploymentStatus as the deletion condition (#4262)

* feat: Support JobDeploymentStatus as the deletion condition

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* chore: Regenerate utility codes

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Update api docs

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix(test): Change JobStatus of the deletion condition from val to ptr

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Add JobDeploymentStatus-based e2e tests with four deletion policies

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Add validation tests for JobDeploymentStatus-based deletion rules

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Sync CRD yaml files into helm chart

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Support JobDeploymentStatus as deletion condition

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Add a helper to check rule match

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Complete TTLSeconds description

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Keep validation logic aligned with kubebuilder

Co-authored-by: Nary Yeh <60069744+machichima@users.noreply.github.com>
Signed-off-by: 江家瑋 <36886416+JiangJiaWei1103@users.noreply.github.com>

* refactor: Write helper for validating deletion condition

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Simplify logic for assigning an empty map to track TTL by policy

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Simplify deletion condition matching logic

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Make deletion rule uniqueness check comment more clear

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Use explicit string type to handle both conditions

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Shorten TTL to speed up e2e test

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* Revert "test: Shorten TTL to speed up e2e test"

This reverts commit 0588f356bd7479b1d66eb4e53a57b525f747b12e.

We need to pass consistency checks for resource preservation.

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

---------

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
Signed-off-by: 江家瑋 <36886416+JiangJiaWei1103@users.noreply.github.com>
Co-authored-by: Nary Yeh <60069744+machichima@users.noreply.github.com>

* [Feat] Add Ray Cron Job (#4159)

* feat: init RayCronJob controller and add CRs

* feat: update status

* feat: add validate

* feat: add update status function

* refactor: update LastScheduleTime type

* feat: implement logic for diff ScheduleStatus

* build: regen CRD

* feat: correctly create reconciler and add loggings

* feat: check if it's time for schedule new rayjob

* feat: add raycronjob example yaml

* fix: remove StatusScheduled

* build: make sync

* refactor: move validate to validation.go

* test: add validation and raycronjob unit test

* feat: add feature gate

* test: update test

* fix: update helm chart rules

* fix: add OwnerReference from cronjob to rayjob

* fix: field order for RayCronJob struct

Signed-off-by: machichima <nary12321@gmail.com>

* fix: remove schedule status

Signed-off-by: machichima <nary12321@gmail.com>

* build: make generate

Signed-off-by: machichima <nary12321@gmail.com>

* fix: update example yaml to use ray 2.52.0 image

Signed-off-by: machichima <nary12321@gmail.com>

* fix: no need to update status for validate fail

Signed-off-by: machichima <nary12321@gmail.com>

* test: use rayCronJobTemplate func to create rayCronJob in test

Signed-off-by: machichima <nary12321@gmail.com>

* build: add kubebuilder rbac config

Signed-off-by: machichima <nary12321@gmail.com>

* feat: extract ray cron job name to constant

Signed-off-by: machichima <nary12321@gmail.com>

* fix: update SetupWithManager

Signed-off-by: machichima <nary12321@gmail.com>

* fix: JobTemplate to normal object

Signed-off-by: machichima <nary12321@gmail.com>

* feat: add cron job origin expected timestamp annotation

Signed-off-by: machichima <nary12321@gmail.com>

* fix: set LastScheduleTime only when job is created

Signed-off-by: machichima <nary12321@gmail.com>

* docs: update comment in example

Signed-off-by: machichima <nary12321@gmail.com>

---------

Signed-off-by: machichima <nary12321@gmail.com>

* [Chore] Upgrade golangci-lint to v2.7.2 and adjust linting configurations (#4007)

* Upgrade golangci-lint to v2.4.0 and adjust linting configurations

Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>

* disable linters and formatters

* fix lint

* fix makefile

* fix makefile

* fix config

* update install link

* add comment

---------

Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>

* [chore] fix cronjob crd inconsistent (#4292)

Signed-off-by: Rueian <rueiancsie@gmail.com>

* docs: Show missing phony targets and align styles (#4295)

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* [RayCluster] Improved the efficiency when checking rayclusters' expectations (#4209)

* add the implementation of historyserver collector (#4241)

* add the implementation of historyserver collector

update go.work go.mod

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>

* update the func judging if the event is related to the Nodes.

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>

* S3FORCE_PATH_STYPE -> S3FORCE_PATH_STYLE

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* S3DISABLE_SSL -> s3DisableSSL (camel case)

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Add comments to explain WatchSessionLatestLoops

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* [history server] Remove go.work and go.work.sum to follow Go's best practices (#4301)

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Clean up unused label for volcano scheduler (#4305)

Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>

* fix: Return upon update error for active and pending clusters (#4273)

* fix: Propagate pending cluster update err back to caller

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Return on err logic

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

---------

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* Update head and worker pod resources in sample manifests (#4288)

* Update head and worker pod resources in sample manifests

Signed-off-by: Yi Chen <github@chenyicn.net>

* Update kubectl plugin e2e tests

Signed-off-by: Yi Chen <github@chenyicn.net>

---------

Signed-off-by: Yi Chen <github@chenyicn.net>

* [Chore] Upgrade Golang version to v1.25 (#4269)

* chore: Bump golang version to v1.25

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* chore: Bump crd-ref-docs to v0.2.0 for Go v1.25

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* chore: Bump Go to v1.25.5

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Remove patch ver for flexibility

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* chore: Switch to floating tag for building images

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

---------

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: upgrade Ray image in ray-cluster.auth.yaml to 2.53.0 to resolve dashboard 'Failed to load' error (#4310)

Signed-off-by: win5923 <ken89@kimo.com>

* Fix testifylint and gci lint issues (#4293)

Signed-off-by: seanlaii <qazwsx0939059006@gmail.com>

* [Chore] Fix gosec, govet and errcheck lint issues (#4309)

Signed-off-by: win5923 <ken89@kimo.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>

* [Docs] Add history server collector setup doc (#4303)

* docs: Add history server log collector setup guide

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Update fig links

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Add minio and raycluster yamls

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Remove eventserver dependencies and correct s3 env var

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Update PR target

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Remove platform options for local build

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* revert: Make this PR focused on Collector setup guide only

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Support collector-only setup

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Verify events are uploaded to the blob storage

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Fix fig link

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* [Chore] Enable modernize linter (#4317)

Signed-off-by: seanlaii <qazwsx0939059006@gmail.com>

* [chore] Fix errorlint lint issues (#4306)

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Signed-off-by: Jun-Hao Wan <ken89@kimo.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Jun-Hao Wan <ken89@kimo.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>

* [Feat] Cron job add suspend (#4313)

* feat: add Suspend to raycronjob type

Signed-off-by: machichima <nary12321@gmail.com>

* feat: add suspend

Signed-off-by: machichima <nary12321@gmail.com>

* docs: update example

Signed-off-by: machichima <nary12321@gmail.com>

* build: make sync

Signed-off-by: machichima <nary12321@gmail.com>

* refactor: event type name to SuspendedRayCronJob

Signed-off-by: machichima <nary12321@gmail.com>

* refactor: add back omitempty

Signed-off-by: machichima <nary12321@gmail.com>

* fix: precommit

Signed-off-by: machichima <nary12321@gmail.com>

* better test

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix: log suspend log once only

Signed-off-by: machichima <nary12321@gmail.com>

---------

Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* [Feature] Support recreate pods for RayCluster using RayClusterSpec.upgradeStrategy (#4185)

* [Feature] Support recreate pods for RayCluster using RayClusterSpec

Signed-off-by: win5923 <ken89@kimo.com>

* Add test

Signed-off-by: win5923 <ken89@kimo.com>

* improve readability

Signed-off-by: win5923 <ken89@kimo.com>

* Remove deepcopy in GeneratePodTemplateHash

Signed-off-by: win5923 <ken89@kimo.com>

* Refactor ValidateRayClusterUpgradeOptions

Signed-off-by: win5923 <ken89@kimo.com>

* add kubebuilder:validation

Signed-off-by: win5923 <ken89@kimo.com>

* Rename the RayServiceUpgradeType and RayClusterUpgradeType constants

Signed-off-by: win5923 <ken89@kimo.com>

* add ray.io/kuberay-version annotations for head pod and worker pods

Signed-off-by: win5923 <ken89@kimo.com>

* Update ray-operator/controllers/ray/common/pod.go

Co-authored-by: Nary Yeh <60069744+machichima@users.noreply.github.com>
Signed-off-by: Jun-Hao Wan <ken89@kimo.com>

* Revert "add ray.io/kuberay-version annotations for head pod and worker pods"

This reverts commit 5f3afb37724896ee2ae13399ab3d48d26fb6719f.

* add rayClusterScaleExpectation.Delete for deleteAllPods

Signed-off-by: win5923 <ken89@kimo.com>

* Apply suggestions

Signed-off-by: win5923 <ken89@kimo.com>

* better logic

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* solve ci err

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* better yaml file

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Commented out upgradeStrategy for sample yaml

Signed-off-by: win5923 <ken89@kimo.com>

* Update container image for TestRayClusterUpgradeStrategy test

Signed-off-by: win5923 <ken89@kimo.com>

* Compare RayClusterSpec

Signed-off-by: win5923 <ken89@kimo.com>

* Remove WorkerGroupSpecs.IdleTimeoutSeconds and Suspend to follow RayService's solution

Signed-off-by: win5923 <ken89@kimo.com>

* Follow RayService's solution

Signed-off-by: win5923 <ken89@kimo.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update the head pod to get the cluster hash and new KubeRay version when KubeRay version changed

Signed-off-by: win5923 <ken89@kimo.com>

* Use UpgradeStrategyRecreateHashKey annotations for RayCluster upgradeStrategy

Signed-off-by: win5923 <ken89@kimo.com>

---------

Signed-off-by: win5923 <ken89@kimo.com>
Signed-off-by: Jun-Hao Wan <ken89@kimo.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Nary Yeh <60069744+machichima@users.noreply.github.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* [Bug] Fix health probes to use custom ports from rayStartParams (#4041)

* fix

* add a new "test" for gofumpt

* init

* add pod test

* add it test

* checkstyle

* Update ray-operator/controllers/ray/common/pod_test.go

Co-authored-by: Jun-Hao Wan <ken89@kimo.com>
Signed-off-by: Itami Sho <42286868+MiniSho@users.noreply.github.com>

* remove unnecessary code

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Itami Sho <42286868+MiniSho@users.noreply.github.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Jun-Hao Wan <ken89@kimo.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* [Refactor] Remove duplicate function in e2eautoscaler/support.go and e2erayservice/support.go by reusing test/support/support.go implementations to improve maintainability and reduce redundancy. Related to #3932 (#4038)

Signed-off-by: HSIU-CHI LIU (Tomlord) <aa123593465@gmail.com>
Signed-off-by: Hsiu-Chi Liu (Tomlord) <79390871+Tomlord1122@users.noreply.github.com>

* [Test] [history server] [collector] Add collector e2e tests (#4308)

* docs: Add history server log collector setup guide

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Update fig links

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Add minio and raycluster yamls

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Remove eventserver dependencies and correct s3 env var

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Update PR target

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Add the log collector happy path e2e

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Integrate history server log collector to CI

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Add script comment

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Remove hardcoded consts and add a helper for s3 client

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Align function name

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Ensure test isolation by deleting S3 bucket

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Upload logs during runtime

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Increase ray job timeout to avoid CI flakiness

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Increase timeout for readiness of MinIO and Ray cluster to avoid CI flakiness

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Refetch head pod to avoid CI flakiness

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* revert: Add eventserver back and recover Dockerfile and Makefile

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Extract apply Ray job to cluster as helper

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Check logs and node_events are uploaded on del

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* revert: Add back blank test to prevent conflicts

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Refine comments to clarify intention

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Add prepareTestEnv helper fn

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Use eventually to avoid CI flakiness

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Enable multi assertions

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Cleanup test assertion logic and reuse existing utils

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Complete all func doc string

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Test logs key exists during runtime

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Extract s3 session dir check as a helper for reusability

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* revert: Toleration is sufficient for programmatic data movement

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Tolerate pod exec err

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* revert: Debug Pod or container restarts

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Add missing return

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Debug residual state from the first test

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* Revert "test: Debug residual state from the first test"

This reverts commit 5b820fffea7c7ec3d2aca946ec4d433f2846c914.

* test: Separate ns for subtest

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Use existing utils

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Cleanup debug legacy

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Verify logs and node_events have contents

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Exec pod cmd before cluster is deleted

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* add TODO

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix: Keep port-forward for accessing S3 outside the cluster

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Simplify get session ID logic

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Move logs by ray-head container startup cmd

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Complete comments

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Check logs and events must exist

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Clarify test logic

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* fix: Separate Test obj

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* refactor: Wrap subtests in a loop

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Clarify the intention of kill 1 command

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Further clarify the end goal of forcing OOMKilled

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Remove redundant cleanup

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* refactor: Clear S3 session verification logic flow

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Check old session dir exists in prev-logs and persist-complete-logs

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Make test case description clear

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* better name for test 2

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* [history server][collector] Remove unused function processAllLogs (#4316)

* remove also

* remove processPrevLogsOnShutdown

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* [Feat][kubectl-plugin] Add shell completion for kubectl ray get [workergroups|nodes] (#4291)

* [kubectl-plugin][WIP] Add shell completion for  and

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Fix] Skip resource fetching for shell completion with --all-namespaces

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Fix] Remove redundant namespace check in WorkerGroupCompletionFunc

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Fix] Improve comments in shell completion functions

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Refactor][Test] Refactor WorkerGroupCompletionFunc and NodeCompletionFunc to accept client.Client parameter for testability. Add initial unit tests.

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Test] Add unit tests for completion function edge cases

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Fix][kubectl-plugin] Remove the redundant namespace check

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: JustinYeh <justinyeh1995@gmail.com>

* [kubectl-plugin] Improve completion with FieldSelector filtering and workergroup deduplication

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [kubectl-plugin][Refactor] Use labels.Set for label selector formatting

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Chore] fix typo in the help text for all-namespaces flag

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Chore][kubectl-plugin] Fix struct field alignment in tests

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

---------

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Signed-off-by: JustinYeh <justinyeh1995@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [Chore] Fix staticcheck lint errors (#4326)

Signed-off-by: wei-chenglai <qazwsx0939059006@gmail.com>

* chore: Bump up KuberayUpgradeVersion default version for e2e test (#4331)

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* [Config] Change all RayCluster headGroupSpec limit memory to 5Gi (#4328)

* chore: adjust memory < 5Gi limit in sample files

Signed-off-by: Cheyu Wu <cheyu1220@gmail.com>

* chore: align config format

Signed-off-by: CheyuWu <cheyu1220@gmail.com>

* chore: set memory to 5Gi

Signed-off-by: CheyuWu <cheyu1220@gmail.com>

---------

Signed-off-by: Cheyu Wu <cheyu1220@gmail.com>
Signed-off-by: CheyuWu <cheyu1220@gmail.com>

* [Chore] Fix noctx, revive lint issues (#4333)

* [Chore] Fix noctx linter violations across codebase

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Chore][revive][1/N] Fix revive linter violations across codebase, fix var-naming don't use underscores in Go names & avoid meaningless package names

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Chore][revive][2/N] Fix revive linter violations across codebase, fix unexported-return issues

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [Chore] Trigger CI

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

---------

Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>

* [historyserver] Ensure at least one worker in sample RayCluster (#4330)

* [historyserver] Ensure at least one worker in sample RayCluster

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Update historyserver/config/raycluster.yaml

Co-authored-by: Jun-Hao Wan <ken89@kimo.com>
Signed-off-by: Han-Ju Chen (Future-Outlier) <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: Han-Ju Chen (Future-Outlier) <eric901201@gmail.com>
Co-authored-by: Jun-Hao Wan <ken89@kimo.com>

* docs: Clarify multi-arch phony comments (#4311)

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* historyserver: remove unused function in RayLogHandler (#4336)

Signed-off-by: AndySung320 <andysung0320@gmail.com>

* [history server][collector] Fix getJobID for job event collection (#4342)

* [historyserver] Fix getJobID for job event collection

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* add jia-wei as co-author, since he debugged this together with me

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Jia-Wei Jiang <waynechuang97@gmail.com>

* remove unused code

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update rueian's advice

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* add task profile event example

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* revert back oneof solution

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* add task profile event

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update rueian's advice

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* a working version in ray 2.52.0

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Jia-Wei Jiang <waynechuang97@gmail.com>

* chore: Use double quoted resource values in sample manifest files. (#4339)

* [history server] move storage interface (#4302)

* historyserver: move storage  and update imports

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* vet & fmt.

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* chore: update code structure and move storage to interface

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

---------

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* background goroutine get job info (#4160)

* [RayJob] background job info poc

* [RayJob] add implement some methods

* [RayJob] encapsulate the worker pool

* [RayJob] replace concurrency map with lru cache

* [RayJob] remove cache on stop and config flag

* [RayJob] expiry cache cleanup goroutine

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] code and comment minor fix

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] task check contain or not before add

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] remove delete cache from deleteClusterResources and add lock for cache

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [Helm] add argument for useBackgroundGoroutine

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] repeated error did not update

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] remove unused function and background goroutine observability

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] cache client support graceful shutdown

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] rename useBackgroundGoroutine to asyncJobInfoQuery

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] use ray job info in logger

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] remove cacheStorage nil check

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] bg goroutine uses operator context instead

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] bg goroutine handle task queue full

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] correct the comment

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] refactor initialize dashboard client for background goroutine

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] worker handle ctx.Done correctly

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] remove unnecessary putting task into queue

* [RayJob] if queue is full, retry again

* [RayJob] make cache immutable to avoid data race

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] remove unused function

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] remove cacheStorage lock

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] update cache error

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] If error on fetching job info, it removes from loop

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] task queue is extendable

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] change slice to ring buffer

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] rename PutTask to AddTask

* [RayJob] extendable channel use open source library

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] async job info query use feature gate instead

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] add comment for task

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] rename function signature of worker pool init function

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] change ErrAgain error message

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] fix lint error

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] change back to EAGAIN

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] remove queue size from todo comment

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] rename queue full error

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] add lock to avoid data race

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] requeue check context has canceled or not

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] add cluster name on the cache key

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] check raycluster is nil or not when initializing the dashboard client

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] avoid send to a block channel when graceful shutdown

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] use contain to check the placeholder at the beginning of task

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] graceful shutdown avoid panic from a nil task

* [RayJob] fix channel receive condition

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] fix nil rayCluster in dashboard cache client

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] remove with name from log for sharing purpose

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] remove checkname to avoid collision

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] add task with blocking send

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] remove unused error

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [RayJob] provide raycluster name if it is absent for removing cache

Signed-off-by: fscnick <fscnick.dev@gmail.com>

---------

Signed-off-by: fscnick <fscnick.dev@gmail.com>

* [kubectl-plugin][Test] Use client-go reactors for FieldSelector filtering in fake client tests (#4361)

* feat(kubectl-plugin): add FieldSelector reactor helper for fake client tests

Add AddRayClusterFieldSelectorReactor helper that simulates server-side
FieldSelector filtering in fake client tests. This addresses issue #4337
by allowing tests to verify filtering behavior without manual name checks.

Refs: #4337
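
The filtering that the reactor simulates can be illustrated with a minimal, standard-library-only sketch. This is not the repository's AddRayClusterFieldSelectorReactor helper and it does not use client-go; the function name `filterByNameSelector` and the assumption that only a `metadata.name=<value>` selector is supported are hypothetical, chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// filterByNameSelector is a hypothetical helper for this sketch only.
// It mimics what an API server does for a "metadata.name=<value>" field
// selector: an empty selector matches everything, and any other selector
// form is rejected as unsupported.
func filterByNameSelector(names []string, selector string) ([]string, error) {
	if selector == "" {
		return names, nil
	}
	const prefix = "metadata.name="
	if !strings.HasPrefix(selector, prefix) {
		return nil, fmt.Errorf("unsupported field selector %q", selector)
	}
	want := strings.TrimPrefix(selector, prefix)
	var out []string
	for _, n := range names {
		if n == want {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	clusters := []string{"cluster-a", "cluster-b"}
	got, err := filterByNameSelector(clusters, "metadata.name=cluster-b")
	fmt.Println(got, err)
}
```

In a real fake-client test this logic would presumably live inside a reactor registered on the clientset, so that list calls return only the matching objects and the test does not need manual name checks.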

* test(kubectl-plugin): apply FieldSelector reactor to completion tests

Use the new AddRayClusterFieldSelectorReactor in workergroup completion
tests to properly simulate server-side filtering behavior.

Refs: #4337

* refactor(kubectl-plugin): remove manual name filtering in completion

Remove the workaround that manually filtered clusters by name since the
fake client now properly supports FieldSelector filtering via reactors.

Refs: #4337

* chore(kubectl-plugin): migrate from NewSimpleClientset to NewClientset

NewSimpleClientset is deprecated in favor of NewClientset for better
server-side apply testing support.

Note: scale_cluster_test.go is not migrated because it uses Update
operations that require schema definitions missing from the generated
applyconfiguration internal schema.

Refs: #4337

* refactor(kubectl-plugin): add NewRayClientset wrapper for simpler test setup

Add a convenience wrapper that creates a fake Ray clientset with
FieldSelector reactor pre-configured. This simplifies test setup
and ensures consistent behavior across tests.

Also applies reactor to get_cluster_test.go and fixes test data
to match actual cluster names now that FieldSelector properly filters.

Refs: #4337

* Update kubectl-plugin/pkg/util/client/testing/reactor.go

Co-authored-by: JustinYeh <justinyeh1995@gmail.com>
Signed-off-by: Ikenna <ikennachifo@gmail.com>

* fix(kubectl-plugin): update references to renamed reactor function

Update clientset.go and reactor_test.go to use the renamed function
AddRayClusterListFieldSelectorReactor.

---------

Signed-off-by: Ikenna <ikennachifo@gmail.com>
Co-authored-by: JustinYeh <justinyeh1995@gmail.com>

* Support Multi-Arch Image in CI (#4348)

* Add KuberayTestArch environment variable for architecture override in tests

This commit introduces a new environment variable KuberayTestArch that allows
overriding the detected architecture in test environments. Previously, the
system only relied on runtime.GOARCH to determine if ARM64 architecture was
being used, but this change enables explicit architecture specification
through the environment variable. This is particularly useful for testing
scenarios where you want to force a specific architecture regardless of the
actual runtime environment, improving test flexibility and consistency
across different platforms.

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
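
The override described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the environment variable string `KUBERAY_TEST_ARCH` and the helpers `resolveArch` and `isARM64` are hypothetical names, and only `os.Getenv` and `runtime.GOARCH` are real standard-library APIs.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
)

// archOverrideEnv is a hypothetical variable name used only in this sketch;
// the commit refers to the setting as KuberayTestArch.
const archOverrideEnv = "KUBERAY_TEST_ARCH"

// resolveArch returns the explicit override when one is set (trimmed and
// lower-cased), and falls back to the detected architecture otherwise.
func resolveArch(override, detected string) string {
	if o := strings.TrimSpace(override); o != "" {
		return strings.ToLower(o)
	}
	return detected
}

// isARM64 reports whether tests should behave as if they run on ARM64,
// preferring the environment override over runtime.GOARCH.
func isARM64() bool {
	return resolveArch(os.Getenv(archOverrideEnv), runtime.GOARCH) == "arm64"
}

func main() {
	fmt.Println("treat test environment as arm64:", isARM64())
}
```

Keeping the decision in a pure function such as `resolveArch` makes the override easy to unit-test without changing process-wide environment state.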

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* generate clientset with 1.35 code-generator (#4347)

* generate clientset with 1.35 code-generator

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>

* Update codegen script to use go list for k8s.io/code-generator path resolution

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>

* Run make sync

---------

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>

* [master] Fix Ray CI integration for release automation (#4370)

* push

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* test

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* test

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* finally all good

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* [history server] Web Server + Event Processor (#4329)

* Add event server for history server.

Co-authored-by: chiayi <chiayiliang327@gmail.com>
Co-authored-by: KunWuLuan <kunwuluan@gmail.com>

* Update test

* [history server] Web Server

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* add Kun Wu's setting

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: KunWuLuan <kunwuluan@gmail.com>

* a working version

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* a working version, will revise it

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Trigger CI

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* merge master

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* turn chinese comments to english

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix bugs and make dead cluster endpoint work or return not yet supported

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* support task summarize, not yet test live cluster

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* support predicate

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* remove license

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix Stop signal ignored during hour-long sleep period

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix Main exits without waiting for graceful shutdown

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* remove log key info

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix Graceful shutdown incorrectly treated as fatal error

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix Event processor failure causes event processing to block

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Fix Task update discards all fields except attempt number, but this is short term fix, we should use list

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix max clusters default 0 problem, and add todo

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix Missing cookie path causes repeated Kubernetes API calls

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix task list problems

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* add actor json tag

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* handle task lifecycle event, need to update to binary search

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* change upsert to merge

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* handle task and actor endpoint better, make them complete

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix SSRF via user-controlled service name cookie

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* actor and task need to solve Duplicate events appended on each hourly reprocessing cycle

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* solve Duplicate events appended on each hourly reprocessing cycle

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix Unchecked type assertions can cause panics

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* HTTP proxy requests lack timeout causing potential hangs

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix Nil map panic when processing null event entries

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix Environment variable bypasses SSRF protection for live cluster proxying

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* support required resources and server timeout error

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* better serviceaccount

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Add Readme

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* better comments for log dir path

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix race condition

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* better const explanation for separator connector

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* 1 better actor response; 2 cleanup dead code

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* remove dead code

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* fix comments

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Aaron Liang <aaronliang@google.com>
Co-authored-by: KunWuLuan <kunwuluan@gmail.com>

* [Bug][RayJob] Sidecar mode shouldn't restart head pod when head pod is deleted (#4234)

* [Bug][RayJob] Sidecar mode shouldn't restart head pod when head pod is deleted

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* [fix] fix CI error

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* update

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* reunite if statement

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* update

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* fix ci error

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* fix

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* put back unnecessary comment deletion

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* Better rayjob logic

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Update ray-operator/test/e2erayjob/rayjob_test.go

Co-authored-by: Jun-Hao Wan <ken89@kimo.com>
Signed-off-by: Ping <fourhundredping@gmail.com>

* Update ray-operator/test/e2erayjob/rayjob_test.go

Co-authored-by: Nary Yeh <60069744+machichima@users.noreply.github.com>
Signed-off-by: Ping <fourhundredping@gmail.com>

* update rayjob test

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* fix merge conflict error

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* Update ray-operator/test/e2erayjob/rayjob_sidecar_mode_test.go

Co-authored-by: fscnick <6858627+fscnick@users.noreply.github.com>
Signed-off-by: Ping <fourhundredping@gmail.com>

* update

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* revert reason assertion

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* [chore] retrigger ci

* update

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* [chore] change from HeadPod to GetHeadPod

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* add submission mode label key label

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* Update ray-operator/controllers/ray/utils/constant.go

Co-authored-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: Ping <fourhundredping@gmail.com>

* Update ray-operator/controllers/ray/raycluster_controller.go

Co-authored-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: Ping <fourhundredping@gmail.com>

* Update ray-operator/controllers/ray/raycluster_controller.go

Co-authored-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: Ping <fourhundredping@gmail.com>

* Update ray-operator/controllers/ray/rayjob_controller.go

Co-authored-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: Ping <fourhundredping@gmail.com>

* Update ray-operator/controllers/ray/rayjob_controller.go

Co-authored-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: Ping <fourhundredping@gmail.com>

* Update ray-operator/controllers/ray/utils/constant.go

Co-authored-by: Nary Yeh <60069744+machichima@users.noreply.github.com>
Signed-off-by: Ping <fourhundredping@gmail.com>

* Update ray-operator/controllers/ray/rayjob_controller.go

Co-authored-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: Ping <fourhundredping@gmail.com>

* update

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* Add missing label

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* update

Signed-off-by: 400Ping <fourhundredping@gmail.com>

* update

Signed-off-by: 400Ping <fourhundredping@gmail.com>

---------

Signed-off-by: 400Ping <fourhundredping@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: Ping <fourhundredping@gmail.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Jun-Hao Wan <ken89@kimo.com>
Co-authored-by: Nary Yeh <60069744+machichima@users.noreply.github.com>
Co-authored-by: fscnick <6858627+fscnick@users.noreply.github.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>

* [Refactor] [Test] Add helpers and use auto cleanup for testing the RayJob deletion strategy (#4363)

* refactor: Extract helpers and separate ns for auto cleanup

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Add logs

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* docs: Add logs to make it easier to track test flow

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* test: Check Ray job is running

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

---------

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* Change Ray/Kuberay Google Calendar and Kuberay Sync link (#4401)

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* [historyserver][collector] Add file-level idempotency check for prev-logs processing on container restart (#4321)

* feat(historyserver):re-push prev-logs on pod restart

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* chore(historyserver): replace hard code path

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* fmt.

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* test(historyserver): add test for logcollector restart

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* add e2e test for repush

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* fix e2e test.

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* add Troubleshooting.

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* rm redundant cleanup.

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* reuse WatchPrevLogsLoops to scan existing logs.

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* simulate partial upload in e2e test.

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* fix unit test.

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* fix lint

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* fix mv race condition in e2e test.

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* Apply suggestion from @JiangJiaWei1103

Co-authored-by: 江家瑋 <36886416+JiangJiaWei1103@users.noreply.github.com>
Signed-off-by: yi wang <48236141+my-vegetable-has-exploded@users.noreply.github.com>

* address comments.

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* e2e test: add assertions and update description

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>

* Better test

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>
Signed-off-by: yi wang <48236141+my-vegetable-has-exploded@users.noreply.github.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: 江家瑋 <36886416+JiangJiaWei1103@users.noreply.github.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* [Docs] [history server] Create service account for history server deployment (#4396)

* docs: Add creating sa

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

* better readme

Signed-off-by: Future-Outlier <eric901201@gmail.com>

---------

Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Future-Outlier <eric901201@gmail.com>

* [Docs][History Server] update instructions for live cluster section (#4408)

* docs: update for live cluster section

Signed-off-by: machichima <nary12321@gmail.com>

* docs: clearer description

Co-authored-by: 江家瑋 <36886416+JiangJiaWei1103@users.noreply.github.com>
Signed-off-by: Nary Yeh <60069744+machichima@users.noreply.github.com>

---------

Signed-off-by: machichima <nary12321@gmail.com>
Signed-off-by: Nary Yeh <60069744+machichima@users.noreply.github.com>
Co-authored-by: 江家瑋 <36886416+JiangJiaWei1103@users.noreply.github.com>

* [Test] [history server] [collector] Ensure event type coverage (#4343)

* [historyserver] Fix getJobID for job event collection

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* add jia-wei as co-author, since he debug with me together

Signed-off-by: Future-Outlier <eric901201@gmail.com>
Co-authored-by: Jia-Wei Jiang <waynechuang97@gmail.com>

* remove unused code

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* update rueian's advice

Signed-off-by: Future-Outlier <eric901201@gmail.com>

* add task profile event ex…

Labels

dependencies, python

3 participants