add an AKS example using kata container and sandbox warm pool #839
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a complete AKS + Kata (Pod Sandboxing) example demonstrating per-user “owner-aware” agent sandboxes with warm pooling and header-routed access via a shared router.
Changes:
- Introduces Kubernetes manifests for `SandboxTemplate`, `SandboxWarmPool`, `SandboxClaim`, and a `sandbox-router` Service/Deployment setup.
- Adds a FastAPI-based agent container (Dockerfile + pinned Python deps) and a Go CLI client that provisions/uses sandboxes via the Go SDK.
- Adds extensive end-to-end documentation for deploying and validating the example on AKS.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| extensions/examples/kata-aks/sandboxwarmpool.yaml | Adds a warm pool resource to pre-provision Kata sandboxes for faster adoption. |
| extensions/examples/kata-aks/sandboxtemplate.yaml | Adds a Kata-pinned sandbox template with resource sizing and a managed NetworkPolicy. |
| extensions/examples/kata-aks/sandboxclaim.yaml | Adds a user-facing claim example that adopts from the warm pool. |
| extensions/examples/kata-aks/router.yaml | Adds router Services (ClusterIP + LoadBalancer) and a Deployment to proxy to sandboxes. |
| extensions/examples/kata-aks/client/main.go | Adds a Go CLI to create/reuse sandboxes and chat via the router with routing verification. |
| extensions/examples/kata-aks/agent/requirements.txt | Adds pinned Python dependencies for the demo agent. |
| extensions/examples/kata-aks/agent/agent.py | Adds the FastAPI agent implementing per-owner chat history and OpenAI-compatible calls. |
| extensions/examples/kata-aks/agent/Dockerfile | Adds container build steps for the agent. |
| extensions/examples/kata-aks/README.md | Adds end-to-end setup, deployment, usage, and cleanup instructions for the example. |
| .gitignore | Ignores .vscode workspace settings. |
The agent reads three values from a `Secret` named
`model-endpoint` in the same namespace as the template:

| Key | Example (Azure OpenAI) | Example (OpenAI) |
| --- | --- | --- |
| `MODEL_BASE_URL` | `https://<resource>.openai.azure.com/openai/v1/` | `https://api.openai.com/v1` |
| `MODEL_API_KEY` | Azure OpenAI key | OpenAI API key |
| `MODEL_NAME` | `gpt-4o` (deployment name) | `gpt-4o` (model name) |
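
As a rough illustration of how a process could consume the three keys above, here is a minimal sketch that assumes they reach the container as environment variables with the same names. The `ModelEndpoint` type and `load_model_endpoint` helper are hypothetical names for this example, not code from the PR.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEndpoint:
    """Hypothetical container for the three values read from the Secret."""
    base_url: str
    api_key: str
    name: str


def load_model_endpoint(env=None) -> ModelEndpoint:
    """Read MODEL_BASE_URL, MODEL_API_KEY and MODEL_NAME, failing fast if any is unset.

    Assumption: the Secret keys are injected as environment variables
    under the same names. `env` can be any mapping, which keeps the
    function easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    required = ("MODEL_BASE_URL", "MODEL_API_KEY", "MODEL_NAME")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("missing model endpoint settings: " + ", ".join(missing))
    return ModelEndpoint(
        base_url=env["MODEL_BASE_URL"],
        api_key=env["MODEL_API_KEY"],
        name=env["MODEL_NAME"],
    )
```

Failing at startup when a key is absent tends to surface a misnamed or missing Secret earlier than a failed model call would.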
Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>
Pushed
Why: sandbox-router #755 added a runtime check that rejects startup unless either
What changed:
Verified on AKS (
cc reviewers: only the two files in this commit are new since the last review pass.

Pushed
Before: Step 1 instructed
Resolution: aligned the README to the manifests + code (vs. the other direction) because the cleanup section already deletes
req, err := http.NewRequestWithContext(ctx, http.MethodPost, routerBaseURL+"/chat", bytes.NewReader(payload))
if err != nil {
	return nil, err
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Sandbox-ID", sandboxName)
req.Header.Set("X-Sandbox-Namespace", ns)
req.Header.Set("X-Sandbox-Port", agentPort)
req.Header.Set("X-Owner", owner)

resp, err := http.DefaultClient.Do(req)
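
For readers calling the router from Python rather than Go, the same header-routed request can be sketched with the standard library. This is an illustrative equivalent under stated assumptions, not part of the PR: `build_chat_request` is a hypothetical helper, and the JSON body shape (`{"prompt": ...}`) is inferred from the agent reading `body.prompt`.

```python
import json
import urllib.request


def build_chat_request(router_base_url, sandbox_name, namespace, agent_port, owner, prompt):
    """Build (but do not send) a POST to <router>/chat with the routing headers.

    The header names mirror the Go excerpt above. The payload shape is an
    assumption made for this sketch.
    """
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(router_base_url + "/chat", data=payload, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("X-Sandbox-ID", sandbox_name)        # which sandbox to route to
    req.add_header("X-Sandbox-Namespace", namespace)    # namespace of that sandbox
    req.add_header("X-Sandbox-Port", str(agent_port))   # port the agent listens on
    req.add_header("X-Owner", owner)                    # per-owner identity seen by the agent
    return req
```

Sending it is then a matter of `urllib.request.urlopen(req, timeout=...)`. Note that `urllib` normalizes header capitalization when storing them, which is harmless because HTTP header names are case-insensitive.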
// Two-user example for the kata-aks SandboxTemplate.
//
// Provisions one Kata-isolated agent sandbox per user in parallel via
// the Go SDK, then sends each user's prompt to their own sandbox through
// the shared sandbox-router. Each user lands on a separate Kata VM; the
// per-template NetworkPolicy (where the CNI enforces it) keeps the two
// pods from talking to each other.
Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>
Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>
func claimCachePath(owner string) string {
	return filepath.Join(os.TempDir(), fmt.Sprintf("kata-aks-client-%s.claim", owner))
}
if err != nil {
	if errors.Is(err, sandbox.ErrSandboxDeleted) || strings.Contains(err.Error(), "not found") {
		log.Printf("[%s] cached claim=%s is gone, provisioning a new one", owner, cached)
		clearCachedClaim(owner)
		sb = nil
	} else {
		log.Printf("[%s] get cached claim=%s failed: %v", owner, cached, err)
		return
	}
containers:
  - name: router
    image: ${ROUTER_IMAGE}
    # imagePullPolicy: Never  # Uncomment when loading a local image into kind/minikube.
    env:
      - name: PROXY_TIMEOUT_SECONDS
        value: "180"
      # DEMO ONLY. The router fronts a public LoadBalancer below; in any
      # real deployment, replace this with ROUTER_AUTH_TOKEN sourced from
      # a Secret and have callers send `Authorization: Bearer <token>`.
      # See clients/python/agentic-sandbox-client/sandbox-router/sandbox_router.yaml
      # for the production-shaped manifest.
      - name: ALLOW_UNAUTHENTICATED_ROUTER
        value: "true"
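
The comment above describes callers sending `Authorization: Bearer <token>` in a real deployment. As a generic illustration of what validating such a header can look like, here is a minimal sketch; it is not the sandbox-router's actual implementation, and `is_authorized` is a hypothetical helper written for this example.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True only for a well-formed `Bearer <token>` header matching expected_token.

    Hypothetical helper: expected_token would come from something like
    ROUTER_AUTH_TOKEN. An empty or missing value on either side is rejected.
    """
    if not authorization_header or not expected_token:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched via timing.
    return hmac.compare_digest(token.strip().encode("utf-8"), expected_token.encode("utf-8"))
```

Rejecting an empty expected token matters here: otherwise an unset environment variable could silently turn into "any empty bearer token is accepted".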
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
_HISTORY_TURNS = 20  # user+assistant message pairs per owner
_history: dict[str, deque] = defaultdict(lambda: deque(maxlen=_HISTORY_TURNS * 2))
_history_lock = Lock()

with _history_lock:
    prior = list(_history[x_owner])

with _history_lock:
    h = _history[x_owner]
    h.append({"role": "user", "content": body.prompt})
    h.append({"role": "assistant", "content": reply})
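
The three excerpts above can be combined into a small self-contained sketch of the bounded per-owner history pattern. The `snapshot` and `record_turn` function names and the lowered `HISTORY_TURNS` value are assumptions made for illustration; the PR's agent uses the same `defaultdict` of `deque(maxlen=...)` guarded by a `Lock`, but inline in its request handler.

```python
from collections import defaultdict, deque
from threading import Lock

HISTORY_TURNS = 2  # kept small for the sketch; the excerpt above uses 20

# One bounded deque per owner; each turn stores two messages (user + assistant).
_history = defaultdict(lambda: deque(maxlen=HISTORY_TURNS * 2))
_history_lock = Lock()


def snapshot(owner):
    """Copy the owner's history under the lock so callers can use it lock-free."""
    with _history_lock:
        return list(_history[owner])


def record_turn(owner, prompt, reply):
    """Append one user/assistant pair; the deque drops the oldest messages itself."""
    with _history_lock:
        h = _history[owner]
        h.append({"role": "user", "content": prompt})
        h.append({"role": "assistant", "content": reply})
```

Taking the snapshot and recording the turn in two separate critical sections, as the excerpts do, keeps the lock from being held during the slow model call. The trade-off is that two concurrent requests from the same owner may each see history that lacks the other's turn.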
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: janetkuo, ryanzhang-oss
…etes-sigs#839)

* add an AKS example using kata container and sandbox warm pool

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>

* address the comment and move to v1beta1

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>

* address comments

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>

---------

Signed-off-by: Ryan Zhang <yangzhangrice@hotmail.com>
What this PR does / why we need it:
This pull request introduces a complete example for running owner-aware AI agents inside AKS Pod Sandboxing (Kata Containers) micro-VMs, including all necessary Kubernetes manifests, a FastAPI-based agent, and a Go client for end-to-end testing. The design demonstrates strong per-user isolation, dynamic per-owner identity via HTTP headers, and secure network policies. The most significant changes are grouped below.
Which issue(s) this PR is related to:
Just add an example
Release Note
Kubernetes Manifests for AKS Pod Sandboxing:
- `SandboxTemplate` manifest for running the agent inside a Kata Containers (Hyper-V) micro-VM, with strict resource requests/limits, environment variable injection from secrets, and a managed `NetworkPolicy` that only allows ingress from the router and egress to DNS/HTTPS. (sandboxtemplate.yaml, extensions/examples/kata-aks/sandboxtemplate.yaml R1-R145)
- `SandboxClaim` manifest for users to claim isolated sandboxes from a warm pool, with optional lifecycle management. (sandboxclaim.yaml, extensions/examples/kata-aks/sandboxclaim.yaml R1-R25)

Sandbox Router and Networking:
- `sandbox-router`, a reverse proxy that routes requests to the correct per-user sandbox pod based on HTTP headers. Includes both an internal ClusterIP and an external LoadBalancer service, as well as deployment and resource settings. (router.yaml, extensions/examples/kata-aks/router.yaml R1-R103)

Agent Implementation and Packaging:
- A FastAPI-based agent (`agent.py`) that serves as a per-owner AI assistant, using Azure Foundry's OpenAI-compatible endpoint. The agent maintains in-memory, per-user chat history, enforces owner identity in responses, and exposes `/chat`, `/reset`, and `/healthz` endpoints. (agent/agent.py, extensions/examples/kata-aks/agent/agent.py R1-R83)
- `Dockerfile` and `requirements.txt` for packaging and deploying the agent in a minimal Python container. (agent/Dockerfile; agent/requirements.txt)

End-to-End Client Example:
- A Go client (`client/main.go`) that provisions sandboxes, sends prompts to the agent via the router, verifies correct per-owner routing in responses, and supports claim reuse, chat history reset, and cleanup. (client/main.go, extensions/examples/kata-aks/client/main.go R1-R315)