Tags: AI-Hypercomputer/xpk
fix: Fix --adapt-from-ct for externally managed jobset and kueue (#1192)
* feat: skip Kueue and Jobset manifest installs if managed by Helm
* Add unit tests for helm-managed Kueue and Jobset
* Refactor is_managed_by_helm to use jsonpath for speed and target deployments explicitly
* Simplify is_managed_by_helm to only check managed-by label
* Simplify code
* Rename out to container_image in _should_install_jobset
* Rename is_managed_by_helm to is_managed_externally
* Fix pylint trailing whitespaces
* Fix pyink formatting
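The "only check managed-by label" idea from the commit above could look roughly like the sketch below. This is not the xpk implementation: the function names are hypothetical, the label key is assumed to be the standard Kubernetes `app.kubernetes.io/managed-by` label, and the labels are passed in as a plain dictionary instead of being read from a cluster.

```python
from typing import Mapping, Optional

# Assumption: the standard Kubernetes recommended label is the one being checked.
MANAGED_BY_LABEL = "app.kubernetes.io/managed-by"


def looks_externally_managed(labels: Optional[Mapping[str, str]]) -> bool:
    """Return True when the deployment labels name any external manager (e.g. Helm)."""
    if not labels:
        return False
    # Any non-empty value counts; the sketch does not special-case Helm.
    return bool(labels.get(MANAGED_BY_LABEL, "").strip())


def should_install_manifest(labels: Optional[Mapping[str, str]]) -> bool:
    """Skip the manifest install when something else already manages the component."""
    return not looks_externally_managed(labels)
```

For example, `should_install_manifest({"app.kubernetes.io/managed-by": "Helm"})` returns `False`, while a deployment with no labels at all would still get the manifest installed.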
fix: Fix gcluster and make Nightly Tests passing again (#1191)
* Remove secrets.CLUSTER_ARGUMENTS and always use CLUSTER_NETWORK_ARGUMENTS
* Fix ray cluster e2e
* Use the most recent gcluster version
* Update gcluster version in blueprint generator and docker manager
* Remove the "v" prefix from jobset version in blueprint_generator
* fix(blueprint): conditionally omit host_maintenance_interval for spot A3 Mega clusters
  Preemptible VMs (spot instances) do not support periodic maintenance intervals. This caused cluster toolkit deployments using the A3 Mega blueprint with `--spot` to fail with a Google Compute Engine 'Invalid value' error.
  This commit conditionally omits `host_maintenance_interval` if the capacity type is SPOT.
* Fix GPU test and update test fixture to fix failing unit tests
  Also format python code with pyink.
  TAG=agy CONV=a5a75b42-5457-49d9-bdbb-e84c6347b9b7
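The spot fix described above amounts to leaving one key out of the generated settings. A minimal sketch of that pattern follows, assuming the settings are built as a dictionary. The function name, the other keys, and the `"PERIODIC"` default are illustrative assumptions, not the actual blueprint generator code.

```python
from typing import Any, Dict


def build_a3_mega_settings(
    capacity_type: str, maintenance_interval: str = "PERIODIC"
) -> Dict[str, Any]:
    """Build hypothetical node pool settings, omitting the interval for spot capacity."""
    settings: Dict[str, Any] = {"machine_type": "a3-megagpu-8g"}
    if capacity_type.upper() == "SPOT":
        # Spot VMs do not support periodic maintenance intervals, so the key
        # is left out entirely instead of being set to an empty value.
        settings["spot"] = True
    else:
        settings["host_maintenance_interval"] = maintenance_interval
    return settings
```

Omitting the key, as opposed to writing an empty or null value, avoids sending a field that the API would reject for spot instances.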