GKE Capacity Strategy Simulation & Optimizer

This project provides a suite of tools for simulating and optimizing GKE (Google Kubernetes Engine) capacity strategies. It focuses on balancing pod startup latency targets against infrastructure cost by combining active and standby buffers.

Tools

1. GKE Capacity Strategy Simulator (index.html)

The simulator allows you to model how different buffer configurations (active vs. standby) handle bursts of traffic. It provides detailed analytics on pod startup latencies and cost implications.

  • Active Buffer: Pods ready to serve traffic immediately (100% cost).
  • Standby Buffer: Pods that can be resumed quickly (e.g., 5% cost, ~30s startup).
  • Features:
    • Real-time simulation of pod allocation.
    • Latency distribution charts (P50, P90, P95, P99).
    • Infrastructure inventory tracking over time.
    • Persistent state via URL hash for easy sharing.
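The latency charts summarize per-request pod startup times as percentiles. The TypeScript sketch below shows one way to compute them, assuming the nearest-rank percentile definition; the simulator may use a different method (for example, linear interpolation), and `percentile` and `summarize` are illustrative names, not the project's API.

```typescript
// Nearest-rank percentile (an assumption: the simulator may instead use
// linear interpolation or another definition).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of empty sample");
  if (!(p > 0 && p <= 100)) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Smallest value such that at least p% of samples are <= it.
  // Multiply before dividing to avoid floating-point error in p / 100.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

type Summary = { p50: number; p90: number; p95: number; p99: number };

// The four percentiles shown in the latency distribution charts.
function summarize(latenciesSec: number[]): Summary {
  return {
    p50: percentile(latenciesSec, 50),
    p90: percentile(latenciesSec, 90),
    p95: percentile(latenciesSec, 95),
    p99: percentile(latenciesSec, 99),
  };
}

// Illustrative burst: 80 pods from the Active Buffer (1s), 15 resumed from
// the Standby Buffer (30s), and 5 that waited for a Scale-up (60s).
const burst = [
  ...new Array<number>(80).fill(1),
  ...new Array<number>(15).fill(30),
  ...new Array<number>(5).fill(60),
];
const summary = summarize(burst);
console.log(summary); // p50 = 1, p90 = 30, p95 = 30, p99 = 60
```

Even though only 5% of this burst waits for a Scale-up, that is enough to push P99 to the full 60s, which is why the tail percentiles usually drive buffer sizing.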

2. Capacity Optimizer (optimizer.html)

The optimizer helps you find the most cost-effective buffer configuration that meets your specific latency targets.

  • Inputs: Traffic patterns, standby resume latency, and scale-up latency.
  • Targets: Define maximum acceptable latencies for P50, P90, P95, and P99 percentiles.
  • Output: Identifies the "Best" configuration (the cheapest one that meets all targets) and provides a cost curve of all valid candidates.
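A candidate is valid only if it meets every percentile target, and "Best" is the cheapest valid one. A minimal sketch of that selection step follows; the `Candidate` and `Percentiles` shapes and the `meetsTargets`/`pickBest` helpers are hypothetical, and in the real tool the cost and latency values would come from the simulator.

```typescript
// Hypothetical shapes; the optimizer's real field names may differ.
type Percentiles = { p50: number; p90: number; p95: number; p99: number };

interface Candidate {
  active: number;       // Active Buffer size (pods)
  standby: number;      // Standby Buffer size (pods)
  cost: number;         // simulated cost, in active-pod equivalents
  latency: Percentiles; // simulated latency percentiles (seconds)
}

// Valid only if every percentile is at or below its target.
function meetsTargets(latency: Percentiles, targets: Percentiles): boolean {
  const keys: (keyof Percentiles)[] = ["p50", "p90", "p95", "p99"];
  return keys.every((k) => latency[k] <= targets[k]);
}

// "Best" = cheapest valid candidate; the sorted `valid` list is what a
// cost curve of valid candidates would be drawn from.
function pickBest(candidates: Candidate[], targets: Percentiles): Candidate | undefined {
  const valid = candidates
    .filter((c) => meetsTargets(c.latency, targets))
    .sort((a, b) => a.cost - b.cost);
  return valid[0];
}

const targets: Percentiles = { p50: 1, p90: 30, p95: 30, p99: 60 };
const best = pickBest(
  [
    { active: 10, standby: 0, cost: 10, latency: { p50: 1, p90: 60, p95: 60, p99: 60 } },
    { active: 2, standby: 8, cost: 2.4, latency: { p50: 1, p90: 30, p95: 30, p99: 60 } },
    { active: 20, standby: 0, cost: 20, latency: { p50: 1, p90: 1, p95: 1, p99: 1 } },
  ],
  targets,
);
console.log(best?.active, best?.standby); // 2 8
```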

GKE Buffer Technology

This simulator models a tiered capacity strategy designed for high-scale, bursty workloads (e.g., large-scale batch processing or rapid scaling events) in Google Kubernetes Engine.

The Tiered Capacity Model

Instead of relying solely on cold starts (which are slow) or over-provisioning (which is expensive), this approach uses three distinct tiers:

  1. Active Buffer (Hot): Fully provisioned and running pods. Requests hitting this buffer experience near-instant (~1s) startup latency. This is the most expensive tier (100% cost).
  2. Standby Buffer (Warm): Pods that are provisioned but in a "suspended" or "idled" state. When a request hits this buffer, the pod must be "resumed." This incurs the Standby Resume Latency (typically 30s) but at a significantly reduced cost (e.g., 5% of an active pod).
  3. Scale-up (Cold): When both buffers are exhausted, GKE must provision new infrastructure. This incurs the full Scale-up Latency (typically 60s+).
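To make the trade-off concrete, here is a deliberately simplified, static model of a single burst. It assumes every standby pod is already suspended and no backfill finishes during the burst (the simulator itself is dynamic). `TIERS`, `tierForRequest`, `burstLatencies`, and `idleCost` are illustrative names; the latencies and cost factors are the example values above.

```typescript
// Example tier parameters from the text (1s / 30s / 60s latency; 100% / 5% /
// 100% cost). These are inputs in the tools, not fixed values.
const TIERS = {
  active: { latencySec: 1, costFactor: 1.0 },
  standby: { latencySec: 30, costFactor: 0.05 }, // suspended standby pod
  scaleUp: { latencySec: 60, costFactor: 1.0 },
} as const;

type Tier = keyof typeof TIERS;

// Simplified model (assumption): one burst arrives at once, every standby pod
// is already suspended, and no backfill completes during the burst.
function tierForRequest(i: number, active: number, standby: number): Tier {
  if (i < active) return "active";
  if (i < active + standby) return "standby";
  return "scaleUp";
}

// Startup latency for each of `burst` simultaneous requests.
function burstLatencies(burst: number, active: number, standby: number): number[] {
  return Array.from({ length: burst }, (_, i) => TIERS[tierForRequest(i, active, standby)].latencySec);
}

// Idle cost while waiting for traffic, in active-pod equivalents.
function idleCost(active: number, standby: number): number {
  return active * TIERS.active.costFactor + standby * TIERS.standby.costFactor;
}

console.log(burstLatencies(10, 2, 5).join(",")); // 1,1,30,30,30,30,30,60,60,60
console.log(idleCost(2, 5), idleCost(7, 0)); // 2.25 7
```

In this toy case, 2 active plus 5 standby pods cost 2.25 active-pod equivalents while idle, versus 7 for covering the first seven requests with active pods alone; the price is a 30s start for requests 3 through 7.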

How the Simulation Works

The simulation uses a discrete-event model to track pod availability, latency, and cost over time based on defined target capacities for the Active Buffer and Standby Buffer.
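A discrete-event model processes timestamped events in time order, and each handler may schedule later events, such as a pod becoming ready once its Scale-up completes. The sketch below assumes two hypothetical event kinds, `request` and `podReady`, and uses a sorted array in place of a heap for brevity; it illustrates the technique and is not the simulator's actual event model.

```typescript
// Hypothetical discrete-event core. A sorted array stands in for a priority
// queue to keep the sketch short.
type SimEvent = { time: number; kind: "request" | "podReady" };

class EventQueue {
  private events: SimEvent[] = [];

  // Insert in time order; events with equal times keep FIFO order.
  schedule(e: SimEvent): void {
    let i = this.events.length;
    while (i > 0 && this.events[i - 1].time > e.time) i--;
    this.events.splice(i, 0, e);
  }

  // Pop events in time order up to `untilSec`; handlers may schedule more.
  run(handler: (e: SimEvent, eq: EventQueue) => void, untilSec: number): void {
    while (this.events.length > 0 && this.events[0].time <= untilSec) {
      handler(this.events.shift()!, this);
    }
  }
}

const eventLog: string[] = [];
const queue = new EventQueue();
queue.schedule({ time: 5, kind: "request" });
queue.schedule({ time: 0, kind: "request" });
queue.run((e, eq) => {
  eventLog.push(`${e.time}:${e.kind}`);
  // With both buffers empty, each request waits for a 60s Scale-up.
  if (e.kind === "request") eq.schedule({ time: e.time + 60, kind: "podReady" });
}, 600);
console.log(eventLog.join(" ")); // 0:request 5:request 60:podReady 65:podReady
```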

1. Request Handling & Latency

For every incoming pod request:

  • Active Pool: If an idle pod is available in the Active Buffer, it is used immediately (1s latency).
  • Standby Pool: If the Active Buffer is empty, a pod is pulled from the Standby Buffer (prioritizing "warm" resumed pods, then "suspended" pods). Using a suspended pod incurs Standby Resume Latency (~30s).
  • Scale-up: If both buffers are exhausted, a new pod is provisioned via Scale-up Latency (~60s).
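The allocation order above can be sketched as a single function over pool counters. The `Pools` shape, `allocate`, and the latency constants are hypothetical. The README does not give a startup latency for warm (not yet suspended) standby pods, so the sketch assumes they start as fast as active pods.

```typescript
// Hypothetical pool counters; the simulator's real state is richer.
interface Pools {
  activeIdle: number;       // idle pods in the Active Buffer
  standbyWarm: number;      // resumed standby pods (not suspended)
  standbySuspended: number; // standby pods past the idle timeout
}

type Source = "active" | "standby-warm" | "standby-suspended" | "scale-up";

// Default latencies from the text (seconds); configurable in the tools.
const ACTIVE_LATENCY_SEC = 1;
const RESUME_LATENCY_SEC = 30;
const SCALE_UP_LATENCY_SEC = 60;

// Take one pod for an incoming request, in the order described above.
// Assumption: a warm standby pod is already running, so it is treated as
// starting as fast as an active pod.
function allocate(p: Pools): { source: Source; latencySec: number } {
  if (p.activeIdle > 0) {
    p.activeIdle--;
    return { source: "active", latencySec: ACTIVE_LATENCY_SEC };
  }
  if (p.standbyWarm > 0) {
    p.standbyWarm--;
    return { source: "standby-warm", latencySec: ACTIVE_LATENCY_SEC };
  }
  if (p.standbySuspended > 0) {
    p.standbySuspended--;
    return { source: "standby-suspended", latencySec: RESUME_LATENCY_SEC };
  }
  return { source: "scale-up", latencySec: SCALE_UP_LATENCY_SEC };
}

const pools: Pools = { activeIdle: 1, standbyWarm: 0, standbySuspended: 1 };
const served = [allocate(pools), allocate(pools), allocate(pools)];
console.log(served.map((r) => r.latencySec).join(",")); // 1,30,60
```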

2. Lifecycle & Backfilling Logic

The simulation maintains the target amounts for Active and Standby buffers through a continuous backfilling process:

  • Active Buffer Backfilling: When a pod is taken from the Active Buffer to serve a request, the system immediately backfills the Active Buffer to its target capacity. The replacement is pulled from the Standby Buffer if available, or a Scale-up is triggered if the Standby Buffer is also empty.
  • Standby Buffer Backfilling: Whenever a pod is taken from the Standby Buffer (to serve a request or to backfill the Active Buffer), a Scale-up is triggered to replenish it to its target amount.
  • Suspension Logic:
    • Active Buffer Pods: These pods are always active and never become suspended.
    • Standby Buffer Pods: Once a Scale-up is complete and the pod enters the Standby Buffer, the Idle Timeout (e.g., 5 minutes) begins. If the pod remains unused in the Standby Buffer for the duration of this timeout, it becomes suspended (reducing its cost to the standby rate).
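The backfill and suspension rules can be sketched as a small reconciler that compares each buffer against its target while counting pods already in flight. Everything here (`Buffers`, `backfill`, `advance`, `takeActive`, and the 60s Scale-up / 30s resume / 300s idle timeout defaults) is a hypothetical illustration, not the simulator's code.

```typescript
// Hypothetical sketch of the backfill and suspension rules.
interface StandbyPod { idleSince: number } // when it entered the Standby Buffer
interface Pending { readyAt: number; dest: "active" | "standby" }

class Buffers {
  active = 0;                 // ready pods in the Active Buffer (never suspend)
  standby: StandbyPod[] = []; // pods in the Standby Buffer
  pending: Pending[] = [];    // Scale-ups and resumes in flight
  activeTarget: number;
  standbyTarget: number;
  scaleUpSec = 60;
  resumeSec = 30;
  idleTimeoutSec = 300;

  constructor(activeTarget: number, standbyTarget: number) {
    this.activeTarget = activeTarget;
    this.standbyTarget = standbyTarget;
  }

  // Standby pods suspend after sitting unused for the idle timeout.
  isSuspended(pod: StandbyPod, now: number): boolean {
    return now - pod.idleSince >= this.idleTimeoutSec;
  }

  inFlight(dest: Pending["dest"]): number {
    return this.pending.filter((p) => p.dest === dest).length;
  }

  // Restore both targets, counting pods already in flight.
  backfill(now: number): void {
    // Active deficit: pull from Standby first (resuming it if suspended),
    // otherwise trigger a Scale-up.
    while (this.active + this.inFlight("active") < this.activeTarget) {
      const pod = this.standby.shift();
      const delay = pod === undefined ? this.scaleUpSec
        : this.isSuspended(pod, now) ? this.resumeSec : 0;
      this.pending.push({ readyAt: now + delay, dest: "active" });
    }
    // Standby deficit (including pods just moved to Active): Scale-up.
    while (this.standby.length + this.inFlight("standby") < this.standbyTarget) {
      this.pending.push({ readyAt: now + this.scaleUpSec, dest: "standby" });
    }
  }

  // Land finished Scale-ups/resumes; new standby pods start their idle timer.
  advance(now: number): void {
    for (const p of this.pending.filter((q) => q.readyAt <= now)) {
      if (p.dest === "active") this.active++;
      else this.standby.push({ idleSince: p.readyAt });
    }
    this.pending = this.pending.filter((q) => q.readyAt > now);
  }

  // Serve a request from the Active Buffer, then backfill immediately.
  takeActive(now: number): boolean {
    if (this.active === 0) return false;
    this.active--;
    this.backfill(now);
    return true;
  }
}

const bufs = new Buffers(2, 2);
bufs.backfill(0);    // initial fill comes entirely from Scale-ups
bufs.advance(60);    // t=60: 2 active + 2 warm standby pods
bufs.takeActive(61); // a warm standby pod refills Active at once
bufs.advance(61);
console.log(bufs.active, bufs.standby.length, bufs.pending.length); // 2 1 1
```

Counting in-flight pods keeps the reconciler from triggering a second Scale-up for a deficit that is already being filled.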

3. Cost Calculation

Total cost is calculated as a time-weighted average based on the state of every pod in the system:

  • Active State: Pods in the Active Buffer or currently serving requests (100% cost).
  • Standby/Suspended State: Pods in the Standby Buffer that have passed the idle timeout (e.g., 5% cost).
  • Warming/Resuming State: Pods resuming from suspension or being scaled up (typically billed at 100% cost during the transition).
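The time-weighted cost can be computed by summing, for every pod, each state's cost factor multiplied by the time spent in that state, then dividing by the simulated duration. This sketch assumes a per-pod state log (`Segment` and `averageCost` are hypothetical) and treats a standby pod as full price until its idle timeout expires, following the suspension rule above.

```typescript
// Hypothetical per-pod state log. Cost factors follow the README's examples
// and are configurable in the tools.
type PodState = "active" | "warm" | "suspended" | "transition";

const COST_FACTOR: Record<PodState, number> = {
  active: 1.0,     // in the Active Buffer or serving a request
  warm: 1.0,       // standby pod not yet past the idle timeout
  suspended: 0.05, // standby pod past the idle timeout
  transition: 1.0, // resuming or scaling up
};

interface Segment { state: PodState; durationSec: number }

// Time-weighted average: total cost-seconds over all pods divided by the
// simulated horizon, in active-pod equivalents.
function averageCost(pods: Segment[][], horizonSec: number): number {
  let costSec = 0;
  for (const segments of pods) {
    for (const s of segments) costSec += COST_FACTOR[s.state] * s.durationSec;
  }
  return costSec / horizonSec;
}

// Over 10 minutes: one active pod, plus one standby pod that scales up (60s),
// stays warm for the 5-minute idle timeout (300s), then is suspended (240s).
const avg = averageCost(
  [
    [{ state: "active", durationSec: 600 }],
    [
      { state: "transition", durationSec: 60 },
      { state: "warm", durationSec: 300 },
      { state: "suspended", durationSec: 240 },
    ],
  ],
  600,
);
console.log(avg); // 1.62
```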

How the Optimizer Works

The Capacity Optimizer searches for the cheapest buffer configuration that meets your latency targets, using the contour-tracing algorithm described below.

Contour Tracing Algorithm

Since simulating every possible combination of Active and Standby buffers on a fine grid would be too slow, the optimizer uses a Contour Tracing (or Staircase) Algorithm:

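  1. Monotonicity: The algorithm relies on latency being monotonic in capacity: adding Active or Standby pods can only reduce or maintain latency. So if a configuration (Active, Standby) meets the targets, every configuration with at least as many pods of each kind does too, and the valid region has a well-defined, staircase-shaped boundary.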
  2. Binary Search Initialization: The algorithm starts at Active = 0 and uses binary search to find the minimum Standby size that satisfies all targets.
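  3. Boundary Tracing: From the initial valid point, the algorithm traces the boundary by finding the minimal Standby size for each Active count. By monotonicity, that minimum can only stay the same or shrink as Active grows, so each search can start where the previous one ended; the resulting points form the Pareto front.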
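The three steps can be sketched as follows, assuming a hypothetical `isValid(active, standby)` oracle that runs one simulation, checks all targets, and is monotone as described. The ranking function `idle` (Active plus 5% of Standby) and the toy oracle are illustrative stand-ins for the simulated cost and latency checks.

```typescript
// Assumes `isValid` is monotone: if (a, s) is valid, so is any (a', s')
// with a' >= a and s' >= s.
type Point = { active: number; standby: number };
type Oracle = (active: number, standby: number) => boolean;

// Binary search for the smallest valid Standby size at a fixed Active size.
// Returns maxStandby + 1 when nothing up to maxStandby is valid.
function minValidStandby(a: number, maxStandby: number, isValid: Oracle): number {
  if (!isValid(a, maxStandby)) return maxStandby + 1;
  let lo = 0;
  let hi = maxStandby; // invariant: isValid(a, hi)
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (isValid(a, mid)) hi = mid;
    else lo = mid + 1;
  }
  return hi;
}

// Walk the staircase: the minimal Standby for a + 1 is never larger than
// for a, so each step only moves Standby down from where the last one stopped.
function traceFrontier(maxActive: number, maxStandby: number, isValid: Oracle): Point[] {
  const front: Point[] = [];
  let s = maxStandby + 1; // "no valid point found yet"
  for (let a = 0; a <= maxActive; a++) {
    if (s > maxStandby) {
      s = minValidStandby(a, maxStandby, isValid);
      if (s > maxStandby) continue; // still infeasible at this Active size
    }
    while (s > 0 && isValid(a, s - 1)) s--;
    front.push({ active: a, standby: s });
    if (s === 0) break; // larger Active sizes are dominated by (a, 0)
  }
  return front;
}

// Hypothetical cost model for ranking: idle cost with standby at 5%.
const idle = (p: Point): number => p.active + 0.05 * p.standby;

// Toy oracle (assumption): a 10-pod burst must avoid Scale-ups and at least
// 2 pods must start instantly.
const front = traceFrontier(20, 20, (a, s) => a >= 2 && a + s >= 10);
const cheapest = front.reduce((x, y) => (idle(y) < idle(x) ? y : x));
console.log(front.length, cheapest.active, cheapest.standby); // 9 2 8
```

Because Standby only ever moves down, the walk evaluates on the order of maxActive + maxStandby configurations (plus a few binary-search probes) instead of every pair on the grid.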

Getting Started

Prerequisites

Installation

npm install

Development

npm run dev

Build

npm run build

Technology Stack

Contributing

We welcome contributions! Please see docs/contributing.md for more information.

We follow Google's Open Source Community Guidelines.

License

This project is licensed under the Apache 2.0 License.

Disclaimer

This is not an officially supported Google product.