feat: Add an example of using Agent Sandbox and Kata on GKE cluster #230

Merged
k8s-ci-robot merged 3 commits into kubernetes-sigs:main from maqiuyujoyce:202512-kata-on-gke
Apr 2, 2026

Conversation

@maqiuyujoyce

Contributor

Fixes #176.

This PR adds instructions to install Kata on a GKE cluster and use Kata Containers as the agent runtime.

I verified locally that the script and the doc work.

@netlify

netlify Bot commented Dec 30, 2025

Deploy Preview for agent-sandbox canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 823bd04 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/agent-sandbox/deploys/698143d9161b7a0007a03b06 |

@k8s-ci-robot

Contributor

Welcome @maqiuyujoyce!

It looks like this is your first PR to kubernetes-sigs/agent-sandbox 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/agent-sandbox has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the `needs-ok-to-test` label (indicates a PR that requires an org member to verify it is safe to test) on Dec 30, 2025
@k8s-ci-robot

Contributor

Hi @maqiuyujoyce. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the `size/L` (denotes a PR that changes 100-499 lines, ignoring generated files) and `cncf-cla: yes` (indicates the PR's author has signed the CNCF CLA) labels on Dec 30, 2025
@maqiuyujoyce

Contributor Author

@zvonkok FYI!

@aditya-shantanu

Collaborator

This is great.

I'd recommend cleaning this up so that there is a single place where we can share all Kata instructions. Thoughts?

Comment thread examples/kata-on-gke/README.md Outdated
@@ -0,0 +1,84 @@
# Enabling Kata Containers on GKE

Member

It's not clear how this example relates to Agent Sandbox. Would you clarify that?

@maqiuyujoyce maqiuyujoyce Jan 30, 2026

Contributor Author

Thanks for the feedback, @janetkuo! Sorry it took a while for me to get back to the PR.

IIUC, a key design goal of OSS Agent Sandbox is flexibility and support for multiple virtualization engines, e.g. Kata. I think this is still true, but let me know if the focus has shifted.

By providing a concrete example with Kata Containers, we show that Agent Sandbox is not locked into a single virtualization technology. And it could help attract users with security-sensitive needs.

Member

Your understanding is correct, and we already have Agent Sandbox examples that use Kata Containers.

However, my question is: how does this example relate to Agent Sandbox? Agent Sandbox isn't mentioned or used in this example.

Contributor Author

Ah gotcha! Added the Agent Sandbox steps into the example!

@zvonkok

zvonkok commented Jan 7, 2026

The guide targets nested VMs; sooner or later, we need to have instructions for bare-metal machines as well. Future use cases with accelerators do not work with CSP nested virtualization offerings.
We can leverage Kata Peer Pods if we need accelerator support on CSPs that offer only VMs, with no bare-metal machines available.

@janetkuo

janetkuo commented Jan 7, 2026

Member

/ok-to-test

@k8s-ci-robot added the `ok-to-test` label (indicates a non-member PR verified by an org member that is safe to test) and removed the `needs-ok-to-test` label (indicates a PR that requires an org member to verify it is safe to test) on Jan 7, 2026
@maqiuyujoyce

Contributor Author

> The guide targets nested VMs; sooner or later, we need to have instructions for bare-metal machines as well. Future use cases with accelerators do not work with CSP nested virtualization offerings. We can leverage Kata Peer Pods if we need accelerator support on CSPs that offer only VMs, with no bare-metal machines available.

This is an excellent point! Thank you for bringing up these advanced use cases.

My intention with this PR is to provide an accessible entry point for users to get started with Kata on a widely-used platform like GKE.

I think adding support for bare-metal and accelerators is a great future direction. I can file a follow-up issue to track this (if there isn't one already) so the idea doesn't get lost.

@maqiuyujoyce

Contributor Author

> This is great.
>
> I'd recommend cleaning this up so that there is a single place where we can share all Kata instructions. Thoughts?

This is a good point, @aditya-shantanu, and I agree with the principle of having a single source of truth.

I gave this some thought, and my main concern is that the target users and environments for these two guides are quite different. The vscode-sandbox guide is for a quick, local setup on Minikube, while this PR targets a more production-like setup on GKE, with its own specific prerequisites (like IAM and machine types).
Combining them could make the document long and potentially confusing for users who just want the specific steps for their environment. It might be clearer to keep them in separate docs.

How about I add a note and a link at the relevant section of both guides, so that users are aware of the alternative and can easily navigate to the one that fits their needs?

@janetkuo

janetkuo commented Feb 6, 2026

Member

+1, instead of having a single place to share all Kata instructions in this repo, we should reference the official Kata docs so that Kata-specific content stays current. We only document the "using Kata" part in the Agent Sandbox docs in this repo.

spec:
  podTemplate:
    spec:
      runtimeClassName: kata-qemu

Member

runtimeClassName is hardcoded here even though setup.sh allows customization via RUNTIME_CLASS_NAME. This mismatch could cause confusion.

Contributor Author

Good catch! Will update.
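
One way to keep the manifest and the script in agreement is to render the manifest from the same variable. The sketch below is illustrative only and is not taken from the PR: the `render_sandbox` helper, the `hello-kata` metadata name, and the `sleep` command are hypothetical, and the `agents.x-k8s.io/v1alpha1` API version is an assumption that should be checked against the installed CRD. Only `RUNTIME_CLASS_NAME`, the `kata-qemu` default, and the `spec.podTemplate.spec.runtimeClassName` path come from this thread.

```shell
#!/bin/sh
set -eu

# Default mirrors the value that was hardcoded in the reviewed manifest.
RUNTIME_CLASS_NAME="${RUNTIME_CLASS_NAME:-kata-qemu}"

# Hypothetical helper: prints a Sandbox manifest whose runtime class
# follows the same variable that setup.sh is said to accept.
render_sandbox() {
  cat <<EOF
apiVersion: agents.x-k8s.io/v1alpha1  # assumed API group/version
kind: Sandbox
metadata:
  name: hello-kata
spec:
  podTemplate:
    spec:
      runtimeClassName: ${RUNTIME_CLASS_NAME}
      containers:
      - name: hello-kata
        image: busybox
        command: ["sleep", "3600"]
EOF
}

render_sandbox
```

If the block were saved as a hypothetical `render.sh`, the output could be piped to `kubectl apply -f -`, with `RUNTIME_CLASS_NAME` exported to whatever value was passed to the setup script.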

runtimeClassName: kata-qemu
containers:
- name: hello-kata
  image: busybox

Member

nit: pin to a specific tag to ensure reproducibility

Contributor Author

Good point. Will update!
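
As a rough illustration of the nit above, a small check like the following could flag image references that are not pinned. It is a sketch under stated assumptions, not part of the PR: the `is_pinned` helper is hypothetical, `busybox:1.36.1` is just an example tag, and "pinned" is taken here to mean an explicit tag other than `latest`, or a digest.

```shell
#!/bin/sh
set -eu

# Succeeds when an image reference carries an explicit non-"latest" tag
# or a sha256 digest; fails for bare names and for ":latest".
is_pinned() {
  ref="$1"
  case "$ref" in
    *@sha256:*) return 0 ;;
  esac
  # Drop the registry and path so a registry port (host:5000/...) is not
  # mistaken for a tag.
  name="${ref##*/}"
  case "$name" in
    *:latest) return 1 ;;
    *:?*)     return 0 ;;
    *)        return 1 ;;
  esac
}

for image in busybox busybox:latest busybox:1.36.1; do
  if is_pinned "$image"; then
    echo "pinned:   $image"
  else
    echo "unpinned: $image"
  fi
done
```

A digest reference is the stricter form of pinning, since a tag can be moved to a different image later.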

    * *Prohibited:* E2 (no nested virt), AMD (N2D - nested virt not supported by GKE yet), ARM (T2A).
* **OS Image:** Must be **Ubuntu** (UBUNTU_CONTAINERD).
    * *Prohibited:* Container-Optimized OS (COS) is read-only and blocks the installer.
* **Region/Zone:** Must use a zone where N2 hardware is available (e.g., us-central1-a, us-west1-b).

Member

For this part (L18-22), is there a doc we can reference in GKE, so that we don't need to maintain it?

Contributor Author

Yep, I'll update.
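
For reference, the constraints quoted above could be expressed as a small preflight check. This is a sketch only: the `check_node_config` helper and the `MACHINE_TYPE` / `IMAGE_TYPE` variable names are hypothetical, and the rejected families (E2, N2D, T2A) simply mirror the README text as of this review, so the GKE documentation remains the authority. The check rejects the listed families and does not prove that any other family supports nested virtualization.

```shell
#!/bin/sh
set -eu

# Returns 0 when the node settings avoid the combinations the README
# lists as prohibited; prints a reason to stderr otherwise.
check_node_config() {
  machine_type="$1"
  image_type="$2"
  case "$machine_type" in
    e2-*|n2d-*|t2a-*)
      echo "unsupported machine type: $machine_type" >&2
      return 1 ;;
  esac
  if [ "$image_type" != "UBUNTU_CONTAINERD" ]; then
    echo "unsupported image type: $image_type (expected UBUNTU_CONTAINERD)" >&2
    return 1
  fi
  return 0
}

if check_node_config "${MACHINE_TYPE:-n2-standard-4}" "${IMAGE_TYPE:-UBUNTU_CONTAINERD}"; then
  echo "node configuration looks compatible"
fi
```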

```shell
gcloud services enable container.googleapis.com
```
3. [Ensure that your organization policy supports creating nested VMs](https://cloud.google.com/compute/docs/instances/nested-virtualization/managing-constraint#check_whether_nested_virtualization_is_allowed).
4. Review the nested VM [restrictions](https://cloud.google.com/compute/docs/instances/nested-virtualization/overview#restrictions) (as of Dec 2025). Kata requires specific hardware support that is not available on default GKE nodes.

Member

Could update to the current date, once you've checked it's up-to-date.

Suggested change
4. Review the nested VM [restrictions](https://cloud.google.com/compute/docs/instances/nested-virtualization/overview#restrictions) (as of Dec 2025). Kata requires specific hardware support that is not available on default GKE nodes.
4. Review the nested VM [restrictions](https://cloud.google.com/compute/docs/instances/nested-virtualization/overview#restrictions) (as of Feb 2026). Kata requires specific hardware support that is not available on default GKE nodes.

Contributor Author

Will update if we plan to keep the section as is.

By default, Agent Sandbox uses standard container runtimes that provide OS-level isolation where all sandboxes share the host node's kernel. This guide shows how to configure and use the Kata runtime to give each sandbox its own dedicated kernel, providing stronger, hardware-virtualized isolation. This is a common requirement for running highly sensitive or untrusted workloads.

## Prerequisites

Member

It seems that this whole prereq section can be replaced with a reference to https://docs.cloud.google.com/kubernetes-engine/docs/how-to/nested-virtualization#before_you_begin for nested virtualization.

Contributor Author

Yes. Do you suggest I replace this whole section with a reference to the GCP doc, or keep it as is? I'm open to either way.

The only reason I copied quite a bit of content here is that the GCP doc is not very beginner-friendly. But I agree it's a bit out of scope.

Contributor

@maqiuyujoyce FWIW, I'd suggest you just point to the GCP doc


For details on available `[OPTIONS...]`, please see the script itself.
```shell
./setup.sh [OPTIONS...]
```

Member

Would you make it clearer what the script does, so that users know that a cluster will be created, etc.? From reading the prerequisites, I presumed users need to configure machine types and node images manually, but that's actually done by the script.

Contributor Author

Will add more comment / example output here.
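
The script itself is not shown in this thread, so the following is only a guess at the shape of its cluster-creation step, written as a dry run that prints the command instead of executing it. The `run` and `create_cluster` helpers, the `CLUSTER_NAME`, `ZONE`, `MACHINE_TYPE`, and `DRY_RUN` variables, and their defaults are all hypothetical; the `gcloud container clusters create` flags are existing gcloud options, and the Ubuntu image and N2 machine family follow the constraints discussed earlier in the review.

```shell
#!/bin/sh
set -eu

# Hypothetical option names and defaults; the real setup.sh may differ.
CLUSTER_NAME="${CLUSTER_NAME:-kata-demo}"
ZONE="${ZONE:-us-central1-a}"
MACHINE_TYPE="${MACHINE_TYPE:-n2-standard-4}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create a cluster whose nodes allow nested virtualization, which Kata's
# VM-based runtimes rely on when running on GKE VMs.
create_cluster() {
  run gcloud container clusters create "$CLUSTER_NAME" \
    --zone "$ZONE" \
    --machine-type "$MACHINE_TYPE" \
    --image-type UBUNTU_CONTAINERD \
    --enable-nested-virtualization
}

create_cluster
# Installing Kata and the Agent Sandbox controller would follow here;
# those steps live in setup.sh and are not reproduced in this sketch.
```

Printing the commands first is one way to address the concern above: a user can see that a cluster will be created, and with which machine and image types, before anything is provisioned.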

@maqiuyujoyce maqiuyujoyce left a comment

Contributor Author

> +1, instead of having a single place to share all Kata instructions in this repo, we should reference official Kata docs so that Kata specific content stays current. We only document the "using Kata" part in Agent Sandbox docs in this repo.

Thank you for the feedback, @janetkuo! Sorry it took a while for me to respond.

Regarding the details about installing Kata: the official Kata docs weren't very helpful when I worked on this, which is why I felt the need to have an Agent Sandbox + GKE + Kata tutorial here. Right now the Kata-related steps are mostly hidden in the script. Do you still think that's too much for Agent Sandbox?

@barney-s

barney-s commented Apr 2, 2026

Collaborator

@maqiuyujoyce - Happy to review again once it is updated. PTAL.

/approve

@k8s-ci-robot added the `approved` label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 2, 2026
@barney-s

barney-s commented Apr 2, 2026

Collaborator

/lgtm
/approve

Let's iterate on feedback in a separate PR.

@k8s-ci-robot added the `lgtm` label ("Looks good to me", indicates that a PR is ready to be merged) on Apr 2, 2026
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: barney-s, maqiuyujoyce

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 9303b28 into kubernetes-sigs:main Apr 2, 2026
9 checks passed

Labels

`approved`: Indicates a PR has been approved by an approver from all required OWNERS files.
`cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
`lgtm`: "Looks good to me", indicates that a PR is ready to be merged.
`ok-to-test`: Indicates a non-member PR verified by an org member that is safe to test.
`size/L`: Denotes a PR that changes 100-499 lines, ignoring generated files.

7 participants