feat: Add an example of using Agent Sandbox and Kata on GKE cluster by maqiuyujoyce · Pull Request #230 · kubernetes-sigs/agent-sandbox

maqiuyujoyce · 2025-12-30T01:03:57Z

Fixes #176.

This PR adds instructions to install Kata on a GKE cluster and use Kata Containers as the agent runtime.

I verified locally that the script and the doc work.

netlify · 2025-12-30T01:04:03Z

✅ Deploy Preview for agent-sandbox canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | `823bd04` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/agent-sandbox/deploys/698143d9161b7a0007a03b06 |

k8s-ci-robot · 2025-12-30T01:04:06Z

Welcome @maqiuyujoyce!

It looks like this is your first PR to kubernetes-sigs/agent-sandbox 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/agent-sandbox has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2025-12-30T01:04:08Z

Hi @maqiuyujoyce. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

maqiuyujoyce · 2025-12-30T01:04:35Z

@zvonkok FYI!

aditya-shantanu · 2026-01-05T16:03:04Z

This is great.

I'd recommend cleaning this up so that there is a single place where we can share all Kata instructions. Thoughts?

janetkuo · 2026-01-07T02:26:49Z

@@ -0,0 +1,84 @@
+# Enabling Kata Containers on GKE


It's not clear how this example relates to Agent Sandbox. Would you clarify that?

Thanks for the feedback, @janetkuo ! Sorry it took a while for me to get back to the PR.

IIUC, a key design goal of OSS Agent Sandbox is flexibility and support for multiple virtualization engines, e.g. Kata. I think this is still true, but let me know if the focus has shifted.

By providing a concrete example with Kata Containers, we show that Agent Sandbox is not locked into a single virtualization technology. And it could help attract users with security-sensitive needs.

Your understanding is correct, and we already have Agent Sandbox examples that use Kata Containers.

However, my question is, how does this example "relate to Agent Sandbox". Agent Sandbox isn't mentioned or used in this example.

Ah gotcha! Added the Agent Sandbox steps into the example!

zvonkok · 2026-01-07T14:25:18Z

The guide targets nested VMs; sooner or later, we need to have instructions for bare-metal machines as well. Future use cases with accelerators do not work with CSP nested virtualization offerings.
We can leverage Kata Peer Pods if we need accelerator support on CSPs that offer only VMs, where no bare-metal machines are available.

janetkuo · 2026-01-07T22:15:08Z

/ok-to-test

maqiuyujoyce · 2026-01-31T01:44:06Z

The guide targets nested VMs; sooner or later, we need to have instructions for bare-metal machines as well. Future use cases with accelerators do not work with CSP nested virtualization offerings. We can leverage Kata Peer Pods if we need accelerator support on CSPs that offer only VMs, where no bare-metal machines are available.

This is an excellent point! Thank you for bringing up these advanced use cases.

My intention with this PR is to provide an accessible entry point for users to get started with Kata on a widely-used platform like GKE.

I think adding support for bare-metal and accelerators is a great future direction. I can file a follow-up issue to track this (if there isn't one already) so the idea doesn't get lost.

maqiuyujoyce · 2026-02-04T01:01:59Z

This is great.

I'd recommend cleaning this up so that there is a single place where we can share all Kata instructions. Thoughts?

This is a good point, @aditya-shantanu, and I agree with the principle of having a single source of truth.

I gave this some thought, and my main concern is that the target users and environments for these two guides are quite different. The vscode-sandbox guide is for a quick, local setup on Minikube, while this PR targets a more production-like setup on GKE, with its own specific prerequisites (like IAM and machine types).
Combining them could make the document long and potentially confusing for users who just want the specific steps for their environment. It might be clearer to keep them in separate docs.

How about I add a note and a link at the relevant section of both guides, so that users are aware of the alternative and can easily navigate to the one that fits their needs?

janetkuo · 2026-02-06T19:45:31Z

+1. Instead of having a single place to share all Kata instructions in this repo, we should reference the official Kata docs so that Kata-specific content stays current. We only document the "using Kata" part in the Agent Sandbox docs in this repo.

janetkuo · 2026-02-06T19:51:58Z

+spec:
+  podTemplate:
+    spec:
+      runtimeClassName: kata-qemu


runtimeClassName is hardcoded here even though setup.sh allows customization via RUNTIME_CLASS_NAME. This mismatch could cause confusion.

Good catch! Will update.
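
A minimal sketch of one way to remove the mismatch: render the manifest fragment from the same `RUNTIME_CLASS_NAME` variable that `setup.sh` is said to accept. The `kata-qemu` default comes from the quoted snippet; the helper name `render_pod_template` and the heredoc approach are assumptions for illustration, not the PR's actual fix.

```shell
#!/usr/bin/env bash
# Sketch only: render_pod_template is a hypothetical helper, not part of the PR.
# RUNTIME_CLASS_NAME mirrors the variable that setup.sh reportedly accepts.

render_pod_template() {
  # Fall back to the value hardcoded in the reviewed manifest.
  local runtime_class="${RUNTIME_CLASS_NAME:-kata-qemu}"
  cat <<EOF
spec:
  podTemplate:
    spec:
      runtimeClassName: ${runtime_class}
EOF
}

# Print the fragment with whatever runtime class is currently configured.
render_pod_template
```

With `RUNTIME_CLASS_NAME` set to another installed runtime class (for example `kata-clh`, used here only as an example value), the same function emits that name, so the script and the manifest cannot drift apart.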

janetkuo · 2026-02-06T19:53:05Z

+      runtimeClassName: kata-qemu
+      containers:
+      - name: hello-kata
+        image: busybox


nit: pin to a specific tag to ensure reproducibility

Good point. Will update!
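
As a rough illustration of what "pinned" means here, the check below treats an image reference as pinned when it carries a digest or a tag other than `latest`. The helper name `is_pinned` is hypothetical, the parsing is deliberately simplified, and `busybox:1.36.1` is only an example tag rather than the one the PR ends up using.

```shell
#!/usr/bin/env bash
# Sketch only: is_pinned is a hypothetical helper with simplified parsing.

is_pinned() {
  local ref="$1" last
  case "$ref" in
    *@sha256:*) return 0 ;;   # digest pins are always reproducible
  esac
  # Drop any registry/path prefix so a registry port colon is not read as a tag.
  last="${ref##*/}"
  case "$last" in
    *:latest) return 1 ;;     # a floating tag is not a pin
    *:?*)     return 0 ;;     # any other non-empty tag counts as pinned
    *)        return 1 ;;     # no tag at all defaults to latest
  esac
}

for image in busybox busybox:latest busybox:1.36.1; do
  if is_pinned "$image"; then
    echo "pinned:   $image"
  else
    echo "unpinned: $image"
  fi
done
```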

janetkuo · 2026-02-06T19:58:51Z

+        *   *Prohibited:* E2 (no nested virt), AMD (N2D - nested virt not supported by GKE yet), ARM (T2A).
+    *   **OS Image:** Must be **Ubuntu** (UBUNTU_CONTAINERD).
+        *   *Prohibited:* Container-Optimized OS (COS) is read-only and blocks the installer.
+    *   **Region/Zone:** Must use a zone where N2 hardware is available (e.g., us-central1-a, us-west1-b).


For this part (L18-22), is there a doc we can reference in GKE, so that we don't need to maintain it?

Yep, I'll update.
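
If the constraints stay in the script rather than the doc, a preflight check along these lines could fail fast on an unsupported configuration. This is a sketch under assumptions: the family lists are copied from the quoted guide text and may already be stale, the function names are hypothetical, and the GKE documentation remains the authoritative source.

```shell
#!/usr/bin/env bash
# Sketch only: the families below come from the quoted guide text and may be
# out of date. Function names are hypothetical.

# Returns 0 for the family the guide targets (N2), 1 for families the guide
# lists as prohibited, and 2 for families the guide does not mention.
check_machine_type() {
  case "$1" in
    n2-*)             return 0 ;;
    e2-*|n2d-*|t2a-*) return 1 ;;
    *)                return 2 ;;
  esac
}

# The guide requires the Ubuntu node image.
check_image_type() {
  [ "$1" = "UBUNTU_CONTAINERD" ]
}

machine_type="${MACHINE_TYPE:-n2-standard-4}"   # example default
if check_machine_type "$machine_type"; then
  echo "machine type $machine_type matches the guide"
else
  echo "machine type $machine_type is prohibited or not covered by the guide" >&2
fi
```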

janetkuo · 2026-02-06T19:59:31Z

+    gcloud services enable container.googleapis.com
+    ```
+3.  [Ensure that your organization policy supports creating nested VMs](https://cloud.google.com/compute/docs/instances/nested-virtualization/managing-constraint#check_whether_nested_virtualization_is_allowed).
+4.  Review the nested VM [restrictions](https://cloud.google.com/compute/docs/instances/nested-virtualization/overview#restrictions) (as of Dec 2025). Kata requires specific hardware support that is not available on default GKE nodes.


Could update to the current date, once you've checked it's up-to-date.

Suggested change

-4. Review the nested VM [restrictions](https://cloud.google.com/compute/docs/instances/nested-virtualization/overview#restrictions) (as of Dec 2025). Kata requires specific hardware support that is not available on default GKE nodes.
+4. Review the nested VM [restrictions](https://cloud.google.com/compute/docs/instances/nested-virtualization/overview#restrictions) (as of Feb 2026). Kata requires specific hardware support that is not available on default GKE nodes.

Will update if we plan to keep the section as is.

janetkuo · 2026-02-06T20:01:46Z

+By default, Agent Sandbox uses standard container runtimes that provide OS-level isolation where all sandboxes share the host node's kernel. This guide shows how to configure and use the Kata runtime to give each sandbox its own dedicated kernel, providing stronger, hardware-virtualized isolation. This is a common requirement for running highly sensitive or untrusted workloads.
+
+## Prerequisites
+


It seems that this whole prereq section can be replaced with a reference to https://docs.cloud.google.com/kubernetes-engine/docs/how-to/nested-virtualization#before_you_begin for nested virtualization.

Yes. Do you suggest I update this whole section to be a reference to the GCP doc, or keep it as is? I'm open to either way.

The only reason I "copied" quite a bit of content here is that the GCP doc is not very beginner-friendly. But I agree it's a bit out of scope.

@maqiuyujoyce FWIW, I'd suggest you just point to the GCP doc
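
The isolation claim in the quoted introduction (each sandbox gets its own kernel under Kata) can be spot-checked by comparing the node's kernel release with the one reported inside the pod. The sketch below assumes the two strings are collected with the `kubectl` commands shown in the comments; the helper name is hypothetical, and a differing release is only a hint, not proof, since a guest could ship the same kernel version as its host.

```shell
#!/usr/bin/env bash
# Sketch only: kernel_isolation_hint is a hypothetical helper. Its inputs
# would come from commands such as (NODE and POD are placeholders):
#   kubectl get node NODE -o jsonpath='{.status.nodeInfo.kernelVersion}'
#   kubectl exec POD -- uname -r

kernel_isolation_hint() {
  local node_kernel="$1" pod_kernel="$2"
  if [ -z "$node_kernel" ] || [ -z "$pod_kernel" ]; then
    echo "unknown"            # one of the values could not be collected
  elif [ "$node_kernel" = "$pod_kernel" ]; then
    echo "shared-kernel"      # consistent with a standard container runtime
  else
    echo "separate-kernel"    # consistent with a VM-based runtime such as Kata
  fi
}

# Placeholder inputs, not real kernel releases.
kernel_isolation_hint "node-kernel-A" "guest-kernel-B"
```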

janetkuo · 2026-02-06T20:05:14Z

+
+For details on available `[OPTIONS...]`, please see the script itself.
+```shell
+./setup.sh [OPTIONS...]


Would you make it clearer what the script does, so that users know that a cluster will be created, etc.? From reading the prerequisites, I presumed users need to configure machine types and node images manually, but that is actually done by the script.

Will add more comments / example output here.
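
One possible shape for that, sketched under assumptions: have the script print a short plan before it changes anything, so users see that a cluster is created and configured on their behalf. The variable names, default values, and the `describe_plan` helper are hypothetical; the real `setup.sh` may use different names and perform additional steps.

```shell
#!/usr/bin/env bash
# Sketch only: names and defaults are hypothetical, not taken from setup.sh.
CLUSTER_NAME="${CLUSTER_NAME:-kata-demo}"
ZONE="${ZONE:-us-central1-a}"
MACHINE_TYPE="${MACHINE_TYPE:-n2-standard-4}"
IMAGE_TYPE="${IMAGE_TYPE:-UBUNTU_CONTAINERD}"

# Print what will happen so the cluster creation is not a surprise.
describe_plan() {
  printf 'This script will:\n'
  printf '  1. Create GKE cluster "%s" in zone %s\n' "$CLUSTER_NAME" "$ZONE"
  printf '     (machine type: %s, node image: %s)\n' "$MACHINE_TYPE" "$IMAGE_TYPE"
  printf '  2. Install Kata Containers on the cluster nodes\n'
}

describe_plan
```

Printing the resolved machine type and node image also makes it clear that these are set by the script rather than configured manually.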

maqiuyujoyce

+1. Instead of having a single place to share all Kata instructions in this repo, we should reference the official Kata docs so that Kata-specific content stays current. We only document the "using Kata" part in the Agent Sandbox docs in this repo.

Thank you for the feedback, @janetkuo ! Sorry it took a while for me to respond.

Regarding the details of installing Kata: the official Kata docs weren't very helpful when I worked on this, which is why I felt the need for an Agent Sandbox + GKE + Kata tutorial here. Right now the Kata-related steps are mostly hidden in the script. Do you still think that's too much for Agent Sandbox?

maqiuyujoyce · 2026-03-14T01:25:43Z

+By default, Agent Sandbox uses standard container runtimes that provide OS-level isolation where all sandboxes share the host node's kernel. This guide shows how to configure and use the Kata runtime to give each sandbox its own dedicated kernel, providing stronger, hardware-virtualized isolation. This is a common requirement for running highly sensitive or untrusted workloads.
+
+## Prerequisites
+


Yes. Do you suggest I update this whole section to be a reference to the GCP doc, or keep it as is? I'm open to either way.

The only reason I "copied" quite a bit of content here is that the GCP doc is not very beginner-friendly. But I agree it's a bit out of scope.

maqiuyujoyce · 2026-03-14T01:27:09Z

+    gcloud services enable container.googleapis.com
+    ```
+3.  [Ensure that your organization policy supports creating nested VMs](https://cloud.google.com/compute/docs/instances/nested-virtualization/managing-constraint#check_whether_nested_virtualization_is_allowed).
+4.  Review the nested VM [restrictions](https://cloud.google.com/compute/docs/instances/nested-virtualization/overview#restrictions) (as of Dec 2025). Kata requires specific hardware support that is not available on default GKE nodes.


Will update if we plan to keep the section as is.

maqiuyujoyce · 2026-03-14T01:28:14Z

+        *   *Prohibited:* E2 (no nested virt), AMD (N2D - nested virt not supported by GKE yet), ARM (T2A).
+    *   **OS Image:** Must be **Ubuntu** (UBUNTU_CONTAINERD).
+        *   *Prohibited:* Container-Optimized OS (COS) is read-only and blocks the installer.
+    *   **Region/Zone:** Must use a zone where N2 hardware is available (e.g., us-central1-a, us-west1-b).


Yep, I'll update.

maqiuyujoyce · 2026-03-14T01:29:06Z

+
+For details on available `[OPTIONS...]`, please see the script itself.
+```shell
+./setup.sh [OPTIONS...]


Will add more comments / example output here.

maqiuyujoyce · 2026-03-14T01:29:39Z

+spec:
+  podTemplate:
+    spec:
+      runtimeClassName: kata-qemu


Good catch! Will update.

maqiuyujoyce · 2026-03-14T01:29:53Z

+      runtimeClassName: kata-qemu
+      containers:
+      - name: hello-kata
+        image: busybox


Good point. Will update!

barney-s · 2026-04-02T19:41:29Z

@maqiuyujoyce - Happy to review again once it is updated. PTAL.

/approve

barney-s · 2026-04-02T21:04:51Z

/lgtm
/approve

Let's iterate on feedback in a separate PR.

k8s-ci-robot · 2026-04-02T21:05:00Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: barney-s, maqiuyujoyce

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~examples/OWNERS~~ [barney-s]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested review from janetkuo and justinsb December 30, 2025 01:04

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 30, 2025

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 30, 2025

janetkuo reviewed Jan 7, 2026

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 7, 2026

maqiuyujoyce added 2 commits January 30, 2026 10:34

Add an example of using Agent Sandbox and Kata on GKE cluster

1229d10

Fix presubmit failures

e13a3d1

Refactor example to include the Agent Sandbox instructions

823bd04

maqiuyujoyce force-pushed the 202512-kata-on-gke branch from 40994de to 823bd04 Compare February 3, 2026 00:39

janetkuo reviewed Feb 6, 2026

maqiuyujoyce commented Mar 14, 2026

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 2, 2026

k8s-ci-robot assigned barney-s Apr 2, 2026

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 2, 2026

k8s-ci-robot merged commit 9303b28 into kubernetes-sigs:main Apr 2, 2026
9 checks passed

github-actions Bot mentioned this pull request Apr 6, 2026

Pull requests report (5/04/2026 17:47) #524

Closed

github-actions Bot mentioned this pull request Apr 13, 2026

Pull requests report (12/04/2026 17:50) #580

Closed
