Tags: AI-Hypercomputer/xpk

v1.13.1

fix: Fix --adapt-from-ct for externally managed jobset and kueue (#1192)

* feat: skip Kueue and Jobset manifest installs if managed by Helm

* Add unit tests for helm-managed Kueue and Jobset

* Refactor is_managed_by_helm to use jsonpath for speed and target deployments explicitly

* Simplify is_managed_by_helm to only check managed-by label

* Simplify code

* Rename out to container_image in _should_install_jobset

* Rename is_managed_by_helm to is_managed_externally

* Fix pylint trailing whitespaces

* Fix pyink formatting
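
The managed-by check described in the bullets above could look roughly like the sketch below. It is not the actual xpk code: the function bodies, the `should_install` helper, and the rule that any non-empty managed-by value counts as "external" are assumptions made for illustration. The label key `app.kubernetes.io/managed-by` is the standard Kubernetes recommended label, which Helm sets to `Helm`.

```python
from typing import Mapping, Optional

# Standard Kubernetes recommended label; Helm sets its value to "Helm".
MANAGED_BY_LABEL = "app.kubernetes.io/managed-by"


def is_managed_externally(labels: Optional[Mapping[str, str]]) -> bool:
  """Return True if the labels carry a non-empty managed-by value.

  Assumption: any non-empty value (Helm or another tool) means the
  deployment is owned by something other than xpk.
  """
  if not labels:
    return False
  return bool(labels.get(MANAGED_BY_LABEL, "").strip())


def should_install(labels: Optional[Mapping[str, str]]) -> bool:
  """Hypothetical helper: install manifests only when nothing else owns them."""
  return not is_managed_externally(labels)
```

With this shape, a Kueue or JobSet deployment labelled by Helm is left alone, while an unlabelled or missing deployment is still installed from manifests.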

v1.13.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
fix: Fix gcluster and make Nightly Tests pass again (#1191)

* Remove secrets.CLUSTER_ARGUMENTS and always use CLUSTER_NETWORK_ARGUMENTS

* Fix ray cluster e2e

* Use the most recent gcluster version

* Update gcluster version in blueprint generator and docker manager

* Remove the "v" prefix from jobset version in blueprint_generator

* fix(blueprint): conditionally omit host_maintenance_interval for spot A3 Mega clusters

Preemptible VMs (spot instances) do not support periodic maintenance intervals. This caused cluster toolkit deployments using the A3 Mega blueprint with `--spot` to fail with a Google Compute Engine 'Invalid value' error. This commit conditionally omits `host_maintenance_interval` if the capacity type is SPOT.

* Fix GPU test and update test fixture to fix failing unit tests

Also format python code with pyink.
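
The `host_maintenance_interval` fix described above amounts to dropping one key when the capacity type is SPOT. A minimal sketch follows, assuming the node pool settings are held in a plain dict; the function name, the `capacity_type` string values, and the example settings are hypothetical and not taken from the xpk blueprint generator.

```python
def nodepool_settings(capacity_type: str, base_settings: dict) -> dict:
  """Return a copy of base_settings suitable for the given capacity type.

  Spot capacity does not accept a periodic maintenance interval, so the
  key is omitted entirely instead of being set to some other value.
  """
  settings = dict(base_settings)  # copy so the caller's dict is not mutated
  if capacity_type.upper() == "SPOT":
    settings.pop("host_maintenance_interval", None)
  return settings


# Illustrative values only.
base = {
    "machine_type": "a3-megagpu-8g",
    "host_maintenance_interval": "PERIODIC",
}
spot_settings = nodepool_settings("spot", base)
on_demand_settings = nodepool_settings("on_demand", base)
```

Omitting the key, as opposed to passing an empty value, leaves the choice of default to the underlying API.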

TAG=agy
CONV=a5a75b42-5457-49d9-bdbb-e84c6347b9b7

v1.12.0

Verified

Update kubernetes dependency to version 35.0.0 (#1178)

Update the client so it supports HTTPS_PROXY

v1.11.0

Verified

feat: support --device-type for xpk cluster create-ray (#1175)

v1.10.0

Verified

Clarify path installation is only for installing from source (#1170)

v1.9.0

Verified

fix: JobSet Status Resolution (#1161)

* Fix JobSet Status Resolution Regression

* Remove overly protective shlex escaping

v1.8.0

Verified

Enable Crane (#1120)

* Enable Crane. Keep a golden and integration test for Docker (to be cleaned up later)

* Remove non-crane tests

v1.7.0

Verified

Ignore nodepool creation errors (#1096)

* feat: ignore nodepool creation errors

* pyink

* feat: do not fail early when running commands batch

* feat: add ignore-nodepool-errors to xpk cluster adapt

* pylint

v1.6.0

Verified

Add customizable binaries support (#1085)

feat: add customizable binaries support

v1.5.0

Verified

fix: Super-slicing cluster documentation (#1070)

Trillium is v6, and Super-slicing works for 7x+.