
add an AKS example using kata container and sandbox warm pool #839

Merged
k8s-ci-robot merged 3 commits into kubernetes-sigs:main from ryanzhang-oss:kaka-warmpool-example
Jun 4, 2026

Conversation

@ryanzhang-oss

Contributor

What this PR does / why we need it:

This pull request introduces a complete example for running owner-aware AI agents inside AKS Pod Sandboxing (Kata Containers) micro-VMs, including all necessary Kubernetes manifests, a FastAPI-based agent, and a Go client for end-to-end testing. The design demonstrates strong per-user isolation, dynamic per-owner identity via HTTP headers, and secure network policies. The most significant changes are grouped below.

Which issue(s) this PR is related to:

None; this PR only adds an example.

Release Note

Kubernetes Manifests for AKS Pod Sandboxing:

  • Added a SandboxTemplate manifest for running the agent inside a Kata Containers (Hyper-V) micro-VM, with strict resource requests/limits, environment variable injection from secrets, and a managed NetworkPolicy that only allows ingress from the router and egress to DNS/HTTPS. (sandboxtemplate.yaml, extensions/examples/kata-aks/sandboxtemplate.yaml R1-R145)
  • Provided a SandboxClaim manifest for users to claim isolated sandboxes from a warm pool, with optional lifecycle management. (sandboxclaim.yaml, extensions/examples/kata-aks/sandboxclaim.yaml R1-R25)
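The network rule described above (ingress only from the router, egress only to DNS and HTTPS) can be approximated with a plain Kubernetes NetworkPolicy. The sketch below builds one as a Python dict, assuming hypothetical pod labels `app: kata-agent` and `app: sandbox-router` and a router running in the same namespace; the example's actual template uses its own managed-policy field, whose schema is not shown in this thread.

```python
import json

# Hypothetical labels: the example's real selectors are not shown in this thread.
SANDBOX_LABELS = {"app": "kata-agent"}
ROUTER_LABELS = {"app": "sandbox-router"}


def sandbox_network_policy(namespace: str) -> dict:
    """Plain networking.k8s.io/v1 NetworkPolicy approximating the described rules."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "kata-agent-isolation", "namespace": namespace},
        "spec": {
            # Applies to the agent sandbox pods only.
            "podSelector": {"matchLabels": SANDBOX_LABELS},
            "policyTypes": ["Ingress", "Egress"],
            # Ingress: only router pods in the same namespace may connect.
            "ingress": [{"from": [{"podSelector": {"matchLabels": ROUTER_LABELS}}]}],
            # Egress: DNS (UDP/TCP 53) and HTTPS (TCP 443) to any destination.
            "egress": [
                {"ports": [{"protocol": "UDP", "port": 53},
                           {"protocol": "TCP", "port": 53}]},
                {"ports": [{"protocol": "TCP", "port": 443}]},
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests: python netpol.py | kubectl apply -f -
    print(json.dumps(sandbox_network_policy("default"), indent=2))
```

Because the egress rules list ports without a `to` clause, they allow any destination on 53 and 443; enforcement also depends on the cluster's CNI supporting NetworkPolicy.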

Sandbox Router and Networking:

  • Added a manifest for the sandbox-router, a reverse proxy that routes requests to the correct per-user sandbox pod based on HTTP headers. Includes both an internal ClusterIP and an external LoadBalancer service, as well as deployment and resource settings. (router.yaml, extensions/examples/kata-aks/router.yaml R1-R103)
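Header-based routing means the proxy turns request headers into an upstream address. A minimal sketch of that step, using the header names the Go client sends (`X-Sandbox-ID`, `X-Sandbox-Namespace`, `X-Sandbox-Port`); the `<sandbox>.<namespace>.svc.cluster.local` DNS scheme, the default port 8888 and the `resolve_target` / `RoutingError` names are assumptions for illustration, not the sandbox-router's actual code.

```python
import re
from typing import Mapping

# One DNS-1123 label: lowercase alphanumerics or '-', alphanumeric at both ends.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")


class RoutingError(ValueError):
    """Routing headers are missing or malformed (a proxy would typically answer 400)."""


def resolve_target(headers: Mapping[str, str], default_namespace: str = "default",
                   default_port: int = 8888) -> str:
    # HTTP header names are case-insensitive.
    h = {k.lower(): v for k, v in headers.items()}
    sandbox_id = h.get("x-sandbox-id", "")
    namespace = h.get("x-sandbox-namespace", default_namespace)
    port_text = h.get("x-sandbox-port", str(default_port))
    # Validate before interpolating into a hostname, so a header cannot
    # point the proxy at an arbitrary host such as "evil.example.com".
    for value in (sandbox_id, namespace):
        if not _DNS_LABEL.fullmatch(value):
            raise RoutingError(f"invalid sandbox id or namespace: {value!r}")
    if not (port_text.isascii() and port_text.isdigit() and 0 < int(port_text) < 65536):
        raise RoutingError(f"invalid port: {port_text!r}")
    # Assumed Service DNS scheme: <sandbox>.<namespace>.svc.cluster.local
    return f"http://{sandbox_id}.{namespace}.svc.cluster.local:{int(port_text)}"
```

Validating each header as a single DNS label is the key design choice: without it, a client-controlled header would let the public LoadBalancer proxy requests to arbitrary in-cluster or external hosts.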

Agent Implementation and Packaging:

  • Added a new FastAPI-based agent (agent.py) that serves as a per-owner AI assistant, using Azure Foundry's OpenAI-compatible endpoint. The agent maintains in-memory, per-user chat history, enforces owner identity in responses, and exposes /chat, /reset, and /healthz endpoints. (agent/agent.py, extensions/examples/kata-aks/agent/agent.py R1-R83)
  • Introduced a corresponding Dockerfile and requirements.txt for packaging and deploying the agent in a minimal Python container. (agent/Dockerfile [1]; agent/requirements.txt [2])
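"Enforces owner identity in responses" suggests that each chat call pins the owner into the prompt before the stored history and the new user turn. A minimal sketch of that message assembly for an OpenAI-compatible chat endpoint; the system-prompt wording and the `build_messages` helper are hypothetical, since agent.py's actual prompt is not shown in this thread.

```python
from typing import Dict, List


def build_messages(owner: str, prior: List[Dict[str, str]],
                   prompt: str) -> List[Dict[str, str]]:
    # Hypothetical system prompt; agent.py's real wording is not shown here.
    system = (f"You are a personal assistant for {owner}. "
              f"Always begin your reply by addressing {owner} by name.")
    # Order matters: identity first, then this owner's history, then the new turn.
    return [{"role": "system", "content": system},
            *prior,
            {"role": "user", "content": prompt}]

# With the openai package (not run here):
#   client = OpenAI(base_url=..., api_key=...)
#   resp = client.chat.completions.create(model=..., messages=build_messages(...))
#   reply = resp.choices[0].message.content
```

Since the owner comes from the `X-Owner` header on each request, the same agent process can serve any owner without restarting, which matches the "dynamic per-owner identity via HTTP headers" goal.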

End-to-End Client Example:

  • Introduced a Go client (client/main.go) that provisions sandboxes, sends prompts to the agent via the router, verifies correct per-owner routing in responses, and supports claim reuse, chat history reset, and cleanup. (client/main.go, extensions/examples/kata-aks/client/main.go R1-R315)
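The client's routing check can be sketched as: send one prompt per owner in parallel, then confirm that each reply names the owner it was sent for. This assumes the agent addresses the owner by name (the Go client's exact check is not shown here), and `check_routing` and the stub transport are hypothetical stand-ins for `POST /chat` through the router.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def check_routing(owners: List[str], send: Callable[[str, str], str],
                  prompt: str = "Who am I?") -> Dict[str, bool]:
    """Return {owner: True} when that owner's reply mentions the owner's name."""
    def one(owner: str) -> Tuple[str, bool]:
        reply = send(owner, prompt)
        # A reply naming someone else suggests the router hit the wrong sandbox.
        return owner, owner.lower() in reply.lower()

    # One worker per owner, mirroring the per-user parallel flow.
    with ThreadPoolExecutor(max_workers=len(owners) or 1) as pool:
        return dict(pool.map(one, owners))


def stub_send(owner: str, prompt: str) -> str:
    # Stand-in transport; a real one would POST to the router's /chat.
    return f"Hi {owner.title()}, you asked: {prompt}"
```

For example, `check_routing(["alice", "bob"], stub_send)` reports both owners as correctly routed, while a transport that always answers "Hi Bob" would flag alice.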
Copilot AI review requested due to automatic review settings May 21, 2026 00:21
@netlify

netlify Bot commented May 21, 2026


Deploy Preview for agent-sandbox canceled.

🔨 Latest commit: 70ef227
🔍 Latest deploy log: https://app.netlify.com/projects/agent-sandbox/deploys/6a1f7e3b92b33a0008d4b9f4
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 21, 2026

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a complete AKS + Kata (Pod Sandboxing) example demonstrating per-user “owner-aware” agent sandboxes with warm pooling and header-routed access via a shared router.

Changes:

  • Introduces Kubernetes manifests for SandboxTemplate, SandboxWarmPool, SandboxClaim, and a sandbox-router Service/Deployment setup.
  • Adds a FastAPI-based agent container (Dockerfile + pinned Python deps) and a Go CLI client that provisions/uses sandboxes via the Go SDK.
  • Adds extensive end-to-end documentation for deploying and validating the example on AKS.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

  • extensions/examples/kata-aks/sandboxwarmpool.yaml: Adds a warm pool resource to pre-provision Kata sandboxes for faster adoption.
  • extensions/examples/kata-aks/sandboxtemplate.yaml: Adds a Kata-pinned sandbox template with resource sizing and a managed NetworkPolicy.
  • extensions/examples/kata-aks/sandboxclaim.yaml: Adds a user-facing claim example that adopts from the warm pool.
  • extensions/examples/kata-aks/router.yaml: Adds router Services (ClusterIP + LoadBalancer) and a Deployment to proxy to sandboxes.
  • extensions/examples/kata-aks/client/main.go: Adds a Go CLI to create/reuse sandboxes and chat via the router with routing verification.
  • extensions/examples/kata-aks/agent/requirements.txt: Adds pinned Python dependencies for the demo agent.
  • extensions/examples/kata-aks/agent/agent.py: Adds the FastAPI agent implementing per-owner chat history and OpenAI-compatible calls.
  • extensions/examples/kata-aks/agent/Dockerfile: Adds container build steps for the agent.
  • extensions/examples/kata-aks/README.md: Adds end-to-end setup, deployment, usage, and cleanup instructions for the example.
  • .gitignore: Ignores .vscode workspace settings.
Comment thread extensions/examples/kata-aks/client/main.go Outdated
Comment thread extensions/examples/kata-aks/sandboxtemplate.yaml Outdated
Comment thread extensions/examples/kata-aks/sandboxtemplate.yaml
Comment thread extensions/examples/kata-aks/client/main.go
Copilot AI review requested due to automatic review settings May 21, 2026 23:37
@linux-foundation-easycla

linux-foundation-easycla Bot commented May 21, 2026


CLA Signed
The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 21, 2026

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Comment thread extensions/examples/kata-aks/agent/agent.py Outdated
Comment thread extensions/examples/kata-aks/agent/agent.py
Comment thread extensions/examples/kata-aks/agent/Dockerfile Outdated
Comment thread extensions/examples/kata-aks/router.yaml
Comment thread extensions/examples/kata-aks/client/main.go
Copilot AI review requested due to automatic review settings May 22, 2026 18:24

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Comment thread extensions/examples/kata-aks/sandboxtemplate.yaml
Comment thread extensions/examples/kata-aks/README.md
Comment thread extensions/examples/kata-aks/client/main.go Outdated
Comment thread extensions/examples/kata-aks/client/main.go
Comment thread extensions/examples/kata-aks/agent/agent.py
@ryanzhang-oss ryanzhang-oss requested a review from Copilot May 22, 2026 18:33

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

Comment thread extensions/examples/kata-aks/client/main.go
Comment thread extensions/examples/kata-aks/agent/agent.py
Comment thread extensions/examples/kata-aks/sandboxwarmpool.yaml Outdated
Copilot AI review requested due to automatic review settings May 22, 2026 18:51

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

Comment thread extensions/examples/kata-aks/client/main.go Outdated
Comment thread extensions/examples/kata-aks/client/main.go
Comment thread extensions/examples/kata-aks/agent/Dockerfile Outdated
Copilot AI review requested due to automatic review settings May 22, 2026 19:28

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Comment thread extensions/examples/kata-aks/agent/agent.py
Comment thread extensions/examples/kata-aks/agent/agent.py
Comment thread extensions/examples/kata-aks/client/main.go
Comment thread extensions/examples/kata-aks/router.yaml
Comment thread extensions/examples/kata-aks/README.md
@ryanzhang-oss ryanzhang-oss force-pushed the kaka-warmpool-example branch from d4c26e4 to 6d04696 Compare May 22, 2026 20:31
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 22, 2026
Comment thread extensions/examples/kata-aks/agent/agent.py
Comment thread extensions/examples/kata-aks/README.md
Copilot AI review requested due to automatic review settings June 2, 2026 21:11

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Comment thread extensions/examples/kata-aks/README.md Outdated
Comment on lines +148 to +155
The agent reads three values from a `Secret` named
`model-endpoint` in the same namespace as the template:

| Key | Example (Azure OpenAI) | Example (OpenAI) |
| --- | --- | --- |
| `MODEL_BASE_URL` | `https://<resource>.openai.azure.com/openai/v1/` | `https://api.openai.com/v1` |
| `MODEL_API_KEY` | Azure OpenAI key | OpenAI API key |
| `MODEL_NAME` | `gpt-4o` (deployment name) | `gpt-4o` (model name) |
Comment thread extensions/examples/kata-aks/client/main.go
Comment thread extensions/examples/kata-aks/client/main.go
Comment thread extensions/examples/kata-aks/agent/agent.py
Comment thread extensions/examples/kata-aks/router.yaml
Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>
@ryanzhang-oss ryanzhang-oss force-pushed the kaka-warmpool-example branch from be3e058 to 6f3832b Compare June 2, 2026 22:13
Copilot AI review requested due to automatic review settings June 2, 2026 22:24
@ryanzhang-oss ryanzhang-oss force-pushed the kaka-warmpool-example branch from 6f3832b to 09bf4f2 Compare June 2, 2026 22:24

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

Comment thread extensions/examples/kata-aks/README.md Outdated
@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 2, 2026
@ryanzhang-oss

Contributor Author

Pushed 648492e to make the example self-contained against the current upstream router image.

Why: sandbox-router #755 added a runtime check that rejects startup unless either ROUTER_AUTH_TOKEN is set or ALLOW_UNAUTHENTICATED_ROUTER=true is opted into. Following the README as previously written produced a CrashLoopBackOff with RuntimeError: ROUTER_AUTH_TOKEN must be set... on the very first kubectl apply.

What changed:

  • router.yaml: set ALLOW_UNAUTHENTICATED_ROUTER: "true" on the router container with an inline comment pointing at the production-shaped manifest.
  • README.md Step 5: added a prominent demo-only callout explaining the security trade-off, the token-based pattern to copy for real deployments (clients/python/agentic-sandbox-client/sandbox-router/sandbox_router.yaml), and the current Go-SDK gap (no Authorization header plumbing yet, so swapping to token auth is a follow-up requiring SDK changes).
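The startup rule described above (refuse to start unless `ROUTER_AUTH_TOKEN` is set or `ALLOW_UNAUTHENTICATED_ROUTER=true` is opted into), together with the bearer-token pattern recommended for real deployments, can be sketched as follows. The function names, the case-insensitive "true" comparison and the error text are assumptions; only the two environment variable names and the `Authorization: Bearer <token>` header come from this thread.

```python
import hmac
import os
from typing import Mapping, Optional


def router_auth_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the bearer token to enforce, or None if the demo opt-out is set.

    Raises at startup when neither setting is present, so a misconfigured
    router fails loudly (CrashLoopBackOff) instead of serving unauthenticated.
    """
    token = env.get("ROUTER_AUTH_TOKEN", "")
    if token:
        return token
    if env.get("ALLOW_UNAUTHENTICATED_ROUTER", "").lower() == "true":
        return None
    raise RuntimeError(
        "set ROUTER_AUTH_TOKEN, or ALLOW_UNAUTHENTICATED_ROUTER=true for demos only")


def is_authorized(expected: Optional[str], authorization: Optional[str]) -> bool:
    """Check an `Authorization: Bearer <token>` header in constant time."""
    if expected is None:  # demo mode: unauthenticated access explicitly allowed
        return True
    scheme, _, presented = (authorization or "").partition(" ")
    return scheme.lower() == "bearer" and hmac.compare_digest(
        presented.encode(), expected.encode())
```

`hmac.compare_digest` avoids leaking the token through timing differences, which matters once the router sits behind a public LoadBalancer.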

Verified on AKS (testmember-4): all six README steps run clean end-to-end against a freshly built v1beta1 controller — warm pool 3/3 Ready, router pods 2/2 Ready behind the public LB, curl + Go SDK (alice one-shot, bob one-shot, ryan 3-turn reuse) all return the expected agent replies, pool refills back to 3 with zero orphan claims.

cc reviewers — only the two files in this commit are new since the last review pass.

Copilot AI review requested due to automatic review settings June 2, 2026 23:26
@ryanzhang-oss

Contributor Author

Pushed 68093da to fix a README/manifest mismatch that would crash the agent at startup for anyone following the README literally.

Before: Step 1 instructed kubectl create secret generic model-endpoint --from-literal=MODEL_BASE_URL=... --from-literal=MODEL_API_KEY=... --from-literal=MODEL_NAME=..., but sandboxtemplate.yaml mounts secretKeyRef from a Secret named azure-foundry with keys OPENAI_BASE_URL/OPENAI_API_KEY/LLM_MODEL, and agent/agent.py reads exactly those env var names. Net effect: the Secret the README told you to create was never consumed, all three env vars came up unset, and the agent crashed in its startup assertion.

Resolution: aligned the README to the manifests + code (vs. the other direction) because the cleanup section already deletes azure-foundry and the manifest/code names are the runtime contract. Only the Step 1 prose, table, kubectl create secret snippet, and the "Switching providers" paragraph changed; no manifest/code edits needed.
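The failure mode above (all three variables unset, then a bare startup assertion) is easier to diagnose when the agent reports exactly which names are missing. A minimal sketch using the runtime contract named in this comment (`OPENAI_BASE_URL`, `OPENAI_API_KEY`, `LLM_MODEL` from Secret `azure-foundry`); `load_model_config` is a hypothetical helper, not the function in agent.py.

```python
import os
from typing import Dict, Mapping

# Names from sandboxtemplate.yaml / agent.py as described in this comment.
REQUIRED_VARS = ("OPENAI_BASE_URL", "OPENAI_API_KEY", "LLM_MODEL")


def load_model_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Uncaught, this still crashes the pod, but the log names the gap.
        raise RuntimeError(
            "missing environment variables: " + ", ".join(missing)
            + "; expected them via secretKeyRef from Secret 'azure-foundry'")
    return {name: env[name] for name in REQUIRED_VARS}
```

With this check, creating the Secret under the old README names (for example `MODEL_BASE_URL`) would fail with a message listing all three expected keys rather than an anonymous assertion error.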

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Comment thread extensions/examples/kata-aks/router.yaml
Comment on lines +88 to +98
req, err := http.NewRequestWithContext(ctx, http.MethodPost, routerBaseURL+"/chat", bytes.NewReader(payload))
if err != nil {
	return nil, err
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Sandbox-ID", sandboxName)
req.Header.Set("X-Sandbox-Namespace", ns)
req.Header.Set("X-Sandbox-Port", agentPort)
req.Header.Set("X-Owner", owner)

resp, err := http.DefaultClient.Do(req)
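For readers testing the router from Python instead of Go, the request built in the excerpt above translates roughly as follows. The `{"prompt": ...}` body follows `body.prompt` in agent.py; `build_chat_request` is a hypothetical helper, and the 180-second timeout in the comment merely echoes the router's `PROXY_TIMEOUT_SECONDS`.

```python
import json
import urllib.request


def build_chat_request(router_base_url: str, sandbox_name: str, namespace: str,
                       port: int, owner: str, prompt: str) -> urllib.request.Request:
    """POST /chat on the router, with routing headers naming the target sandbox."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        router_base_url.rstrip("/") + "/chat",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Sandbox-ID": sandbox_name,      # which sandbox to reach
            "X-Sandbox-Namespace": namespace,  # namespace of that sandbox
            "X-Sandbox-Port": str(port),       # agent port inside the pod
            "X-Owner": owner,                  # identity the agent answers for
        },
    )

# Sending it (not run here):
#   with urllib.request.urlopen(req, timeout=180) as resp:
#       reply = json.load(resp)
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so `req.get_header("X-sandbox-id")` is the lookup form; the names sent on the wire are still matched case-insensitively by HTTP servers.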
Comment thread extensions/examples/kata-aks/client/main.go
Comment on lines +15 to +21
// Two-user example for the kata-aks SandboxTemplate.
//
// Provisions one Kata-isolated agent sandbox per user in parallel via
// the Go SDK, then sends each user's prompt to their own sandbox through
// the shared sandbox-router. Each user lands on a separate Kata VM; the
// per-template NetworkPolicy (where the CNI enforces it) keeps the two
// pods from talking to each other.
Comment thread extensions/examples/kata-aks/sandboxtemplate.yaml
Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>
@ryanzhang-oss ryanzhang-oss force-pushed the kaka-warmpool-example branch from 68093da to c8898e0 Compare June 3, 2026 00:41
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 3, 2026
Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>
Copilot AI review requested due to automatic review settings June 3, 2026 01:07

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.

Comment on lines +192 to +194
func claimCachePath(owner string) string {
	return filepath.Join(os.TempDir(), fmt.Sprintf("kata-aks-client-%s.claim", owner))
}
Comment on lines +236 to +244
if err != nil {
	if errors.Is(err, sandbox.ErrSandboxDeleted) || strings.Contains(err.Error(), "not found") {
		log.Printf("[%s] cached claim=%s is gone, provisioning a new one", owner, cached)
		clearCachedClaim(owner)
		sb = nil
	} else {
		log.Printf("[%s] get cached claim=%s failed: %v", owner, cached, err)
		return
	}
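The claim-reuse logic in the two excerpts above (a per-owner cache file in the temp directory; a stale entry is cleared and a new claim provisioned, while any other lookup error aborts) can be sketched in Python. `claim_exists` and `create_claim` are hypothetical callbacks standing in for the Go SDK calls, and the owner-name check is an addition that keeps names like `../x` from escaping the cache directory.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable, Optional


def claim_cache_path(owner: str, cache_dir: Optional[str] = None) -> Path:
    # Reject owners that could escape the cache directory (e.g. "../x").
    if not re.fullmatch(r"[A-Za-z0-9_-]+", owner):
        raise ValueError(f"unsafe owner name: {owner!r}")
    return Path(cache_dir or tempfile.gettempdir()) / f"kata-aks-client-{owner}.claim"


def get_or_create_claim(owner: str,
                        claim_exists: Callable[[str], bool],
                        create_claim: Callable[[str], str],
                        cache_dir: Optional[str] = None) -> str:
    """Reuse the owner's cached claim if it still exists, else provision a new one.

    claim_exists should return False only for "not found"; any other lookup
    failure should raise, mirroring the Go client, which returns in that case.
    """
    path = claim_cache_path(owner, cache_dir)
    if path.exists():
        cached = path.read_text().strip()
        if cached and claim_exists(cached):
            return cached  # reuse: history in that sandbox is preserved
        path.unlink(missing_ok=True)  # stale entry: the claim is gone
    name = create_claim(owner)
    path.write_text(name)
    return name
```

Keeping the cache keyed by owner is what lets a multi-turn session (such as the 3-turn reuse run mentioned earlier in this thread) land on the same sandbox and its in-memory history.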
Comment on lines +73 to +86
containers:
- name: router
  image: ${ROUTER_IMAGE}
  # imagePullPolicy: Never # Uncomment when loading a local image into kind/minikube.
  env:
  - name: PROXY_TIMEOUT_SECONDS
    value: "180"
  # DEMO ONLY. The router fronts a public LoadBalancer below; in any
  # real deployment, replace this with ROUTER_AUTH_TOKEN sourced from
  # a Secret and have callers send `Authorization: Bearer <token>`.
  # See clients/python/agentic-sandbox-client/sandbox-router/sandbox_router.yaml
  # for the production-shaped manifest.
  - name: ALLOW_UNAUTHENTICATED_ROUTER
    value: "true"
Comment on lines +108 to +110
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
Comment on lines +54 to +56
_HISTORY_TURNS = 20 # user+assistant message pairs per owner
_history: dict[str, deque] = defaultdict(lambda: deque(maxlen=_HISTORY_TURNS * 2))
_history_lock = Lock()
Comment on lines +86 to +87
with _history_lock:
    prior = list(_history[x_owner])
Comment on lines +96 to +99
with _history_lock:
    h = _history[x_owner]
    h.append({"role": "user", "content": body.prompt})
    h.append({"role": "assistant", "content": reply})
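Assembled into one runnable unit, the three agent.py excerpts above describe a bounded, lock-protected, per-owner history. In the sketch below, the module-level names match the excerpts; `snapshot`, `record_turn` and `reset` are hypothetical wrappers (the last modeled on the agent's /reset endpoint, whose body is not shown).

```python
from collections import defaultdict, deque
from threading import Lock
from typing import Dict, List

_HISTORY_TURNS = 20  # user+assistant message pairs per owner
_history: Dict[str, deque] = defaultdict(lambda: deque(maxlen=_HISTORY_TURNS * 2))
_history_lock = Lock()


def snapshot(owner: str) -> List[dict]:
    # Copy under the lock so the model call can run without holding it.
    with _history_lock:
        return list(_history[owner])


def record_turn(owner: str, prompt: str, reply: str) -> None:
    # Append both halves of the turn together; maxlen drops the oldest pair.
    with _history_lock:
        h = _history[owner]
        h.append({"role": "user", "content": prompt})
        h.append({"role": "assistant", "content": reply})


def reset(owner: str) -> None:
    with _history_lock:
        _history.pop(owner, None)
```

Because the lock is released during the model call, two concurrent requests from the same owner can both read the same snapshot and then append in either order; for a single-user demo sandbox that trade-off keeps slow LLM calls from serializing every request.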

@janetkuo janetkuo left a comment

Member


/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2026
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: janetkuo, ryanzhang-oss

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2026
@k8s-ci-robot k8s-ci-robot merged commit 8cff34d into kubernetes-sigs:main Jun 4, 2026
11 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Agent Sandbox Jun 4, 2026
khirotaka pushed a commit to khirotaka/agent-sandbox that referenced this pull request Jun 12, 2026
…etes-sigs#839)

* add an AKS example using kata container and sandbox warm pool

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>

* address the comment and move to v1beta1

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>

* address comments

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>

---------

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>
alexatakvelon pushed a commit to volatilemolotov/agent-sandbox that referenced this pull request Jun 24, 2026
…etes-sigs#839)

* add an AKS example using kata container and sandbox warm pool

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>

* address the comment and move to v1beta1

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>

* address comments

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>

---------

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ready-for-review size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

4 participants