feat: Add an example of using Agent Sandbox and Kata on GKE cluster #230
@zvonkok FYI! |
|
This is great. I'd recommend cleaning this up so that there is a single place where we can share all Kata instructions. Thoughts? |
| @@ -0,0 +1,84 @@ | |||
| # Enabling Kata Containers on GKE | |||
It's not clear how this example relates to Agent Sandbox. Would you clarify that?
Thanks for the feedback, @janetkuo! Sorry it took a while for me to get back to the PR.
IIUC, a key design goal of OSS Agent Sandbox is flexibility and support for multiple virtualization engines, e.g. Kata. I think this is still true, but let me know if the focus has shifted.
By providing a concrete example with Kata Containers, we show that Agent Sandbox is not locked into a single virtualization technology. And it could help attract users with security-sensitive needs.
Your understanding is correct, and we already have Agent Sandbox examples that use Kata Containers.
However, my question is, how does this example "relate to Agent Sandbox". Agent Sandbox isn't mentioned or used in this example.
Ah gotcha! Added the Agent Sandbox steps into the example!
|
The guide targets nested VMs; sooner or later, we need to have instructions for bare-metal machines as well. Future use cases with accelerators do not work with CSP nested virtualization offerings. |
|
/ok-to-test |
This is an excellent point! Thank you for bringing up these advanced use cases. My intention with this PR is to provide an accessible entry point for users to get started with Kata on a widely-used platform like GKE. I think adding support for bare-metal and accelerators is a great future direction. I can file a follow-up issue to track this (if there isn't one already) so the idea doesn't get lost. |
This is a good point, @aditya-shantanu, and I agree with the principle of having a single source of truth. I gave this some thought, and my main concern is that the target users and environments for these two guides are quite different. The vscode-sandbox guide is for a quick, local setup on Minikube, while this PR targets a more production-like setup on GKE, with its own specific prerequisites (like IAM and machine types). How about I add a note and a link at the relevant section of both guides, so that users are aware of the alternative and can easily navigate to the one that fits their needs? |
|
+1. Instead of having a single place to share all Kata instructions in this repo, we should reference the official Kata docs so that Kata-specific content stays current. We only document the "using Kata" part in the Agent Sandbox docs in this repo. |
| spec: | ||
| podTemplate: | ||
| spec: | ||
| runtimeClassName: kata-qemu |
runtimeClassName is hardcoded here even though setup.sh allows customization via RUNTIME_CLASS_NAME. This mismatch could cause confusion.
Good catch! Will update.
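One possible way to keep the manifest and setup.sh in sync, sketched here under assumptions rather than taken from the PR: render the pod template from the same RUNTIME_CLASS_NAME variable, falling back to the kata-qemu value currently hardcoded. The helper name render_pod_template is hypothetical, and only the fragment visible in the diff is reproduced.

```shell
#!/usr/bin/env bash
# Sketch only: render_pod_template is a hypothetical helper, not part of the PR.
# It prints the podTemplate fragment shown in the diff, taking the runtime
# class from RUNTIME_CLASS_NAME and defaulting to kata-qemu when it is unset.
render_pod_template() {
  local runtime_class="${RUNTIME_CLASS_NAME:-kata-qemu}"
  cat <<EOF
spec:
  podTemplate:
    spec:
      runtimeClassName: ${runtime_class}
      containers:
      - name: hello-kata
        image: busybox
EOF
}
```

With this approach, a user who sets RUNTIME_CLASS_NAME for setup.sh gets a manifest that names the same runtime class, for example by piping the rendered, complete manifest to `kubectl apply -f -`.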
| runtimeClassName: kata-qemu | ||
| containers: | ||
| - name: hello-kata | ||
| image: busybox |
nit: pin to a specific tag to ensure reproducibility
Good point. Will update!
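To illustrate what "pinned" means for the nit above, here is a minimal sketch of a check that an image reference carries an explicit tag other than latest, or a digest. The function name is_pinned_image is hypothetical, and the tag used in the usage note is only an example value, not the one chosen in the PR.

```shell
#!/usr/bin/env bash
# Sketch only: is_pinned_image is a hypothetical helper for illustration.
# Returns 0 when the reference has a digest or a tag other than "latest",
# and 1 when it has no tag (which resolves to "latest") or an explicit "latest".
is_pinned_image() {
  local ref="$1"
  # A digest reference is always reproducible.
  case "$ref" in
    *@sha256:*) return 0 ;;
  esac
  # Look only at the last path component so a registry port
  # (for example localhost:5000/busybox) is not mistaken for a tag.
  local name="${ref##*/}"
  case "$name" in
    *:*) ;;
    *) return 1 ;;
  esac
  local tag="${name##*:}"
  [ -n "$tag" ] && [ "$tag" != "latest" ]
}
```

For instance, `is_pinned_image busybox` fails, while `is_pinned_image busybox:1.36.1` succeeds.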
| * *Prohibited:* E2 (no nested virt), AMD (N2D - nested virt not supported by GKE yet), ARM (T2A). | ||
| * **OS Image:** Must be **Ubuntu** (UBUNTU_CONTAINERD). | ||
| * *Prohibited:* Container-Optimized OS (COS) is read-only and blocks the installer. | ||
| * **Region/Zone:** Must use a zone where N2 hardware is available (e.g., us-central1-a, us-west1-b). |
For this part (L18-22), is there a doc we can reference in GKE, so that we don't need to maintain it?
Yep, I'll update.
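For readers who want to see the quoted constraints in executable form, the sketch below encodes only what the diff lines state: E2, N2D, and T2A machine families are prohibited, and the node image must be UBUNTU_CONTAINERD. The function name check_node_config is hypothetical, and the list is a snapshot of the quoted text, so it may go stale; the GKE documentation remains the authoritative source.

```shell
#!/usr/bin/env bash
# Sketch only: check_node_config is a hypothetical helper. It mirrors the
# constraints quoted in the diff and nothing more.
# Usage: check_node_config MACHINE_TYPE IMAGE_TYPE
check_node_config() {
  local machine_type="$1" image_type="$2"
  # The machine family is the part before the first dash, e.g. n2 in n2-standard-4.
  local family="${machine_type%%-*}"
  case "$family" in
    e2|n2d|t2a)
      echo "machine family '$family' is listed as prohibited" >&2
      return 1
      ;;
  esac
  if [ "$image_type" != "UBUNTU_CONTAINERD" ]; then
    echo "image type '$image_type' is not UBUNTU_CONTAINERD" >&2
    return 1
  fi
  return 0
}
```

A call such as `check_node_config n2-standard-4 UBUNTU_CONTAINERD` passes, while `check_node_config e2-medium UBUNTU_CONTAINERD` is rejected.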
| gcloud services enable container.googleapis.com | ||
| ``` | ||
| 3. [Ensure that your organization policy supports creating nested VMs](https://cloud.google.com/compute/docs/instances/nested-virtualization/managing-constraint#check_whether_nested_virtualization_is_allowed). | ||
| 4. Review the nested VM [restrictions](https://cloud.google.com/compute/docs/instances/nested-virtualization/overview#restrictions) (as of Dec 2025). Kata requires specific hardware support that is not available on default GKE nodes. |
Could update to the current date, once you've checked it's up-to-date.
| 4. Review the nested VM [restrictions](https://cloud.google.com/compute/docs/instances/nested-virtualization/overview#restrictions) (as of Dec 2025). Kata requires specific hardware support that is not available on default GKE nodes. | |
| 4. Review the nested VM [restrictions](https://cloud.google.com/compute/docs/instances/nested-virtualization/overview#restrictions) (as of Feb 2026). Kata requires specific hardware support that is not available on default GKE nodes. |
Will update if we plan to keep the section as is.
| By default, Agent Sandbox uses standard container runtimes that provide OS-level isolation where all sandboxes share the host node's kernel. This guide shows how to configure and use the Kata runtime to give each sandbox its own dedicated kernel, providing stronger, hardware-virtualized isolation. This is a common requirement for running highly sensitive or untrusted workloads. | ||
|
|
||
| ## Prerequisites | ||
|
|
It seems that this whole prereq section can be replaced with a reference to https://docs.cloud.google.com/kubernetes-engine/docs/how-to/nested-virtualization#before_you_begin for nested virtualization.
Yes. Do you suggest I update this whole section to be a reference to the GCP doc, or keep it as is? I'm open to either way.
The only reason I "copied" quite a bit of content here is that the GCP doc is not very beginner-friendly. But I agree it's a bit out of scope.
@maqiuyujoyce FWIW, I'd suggest you just point to the GCP doc
|
|
||
| For details on available `[OPTIONS...]`, please see the script itself. | ||
| ```shell | ||
| ./setup.sh [OPTIONS...] |
Would you make it clearer what the script does, so that users know that a cluster will be created, etc.? From reading the prerequisites, I presumed users need to configure machine types and node images manually, but that's actually done by the script.
Will add more comments and example output here.
maqiuyujoyce left a comment:
> +1. Instead of having a single place to share all Kata instructions in this repo, we should reference the official Kata docs so that Kata-specific content stays current. We only document the "using Kata" part in the Agent Sandbox docs in this repo.
Thank you for the feedback, @janetkuo! Sorry it took a while for me to respond.
Regarding the details about installing Kata: the official Kata docs weren't very helpful when I worked on this, which is why I felt the need to have an Agent Sandbox + GKE + Kata tutorial here. Right now the Kata-related steps are mostly hidden in the script. Do you still think that's too much for Agent Sandbox?
@maqiuyujoyce - Happy to review again once it is updated. PTAL. /approve |
|
/lgtm let's iterate on feedback in a separate PR |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: barney-s, maqiuyujoyce |
Fixes #176.
This PR adds instructions to install Kata on a GKE cluster and use Kata Containers as the agent runtime.
I verified locally that the script and the doc work.