Core Blog

Updates from the CoreOS Team

CoreOS Kubernetes Community Citizenship

November 7, 2016 · By Melissa Smolensky

The CoreOS team has been an active participant in the Kubernetes project since Google began the process of open-sourcing this successor to their internal Borg and Omega systems. We not only believe Kubernetes is the right architecture for modern application infrastructure, we see it as an agent of transformation for IT organizations. We coined the acronym GIFEE – “Google’s Infrastructure for Everyone Else” – to help summarize what Kubernetes means for businesses.

With KubeCon opening tomorrow in Seattle, we’re anticipating a great event by celebrating the vibrancy of the Kubernetes community, and in this blog post we'll take a look at efforts we’ve spearheaded across Kubernetes versions.

A vibrant and growing community

The community surrounding the Kubernetes project is one of its greatest strengths. Over the past few years, that community, organized by the Cloud Native Computing Foundation, has expanded to include close to 60 organizations.

Over 1,000 contributors have made over 35,000 commits to Kubernetes. More than 900 people made significant contributions during the Kubernetes v1.4 release cycle. Kubernetes is an exciting, active open source community, and we are proud to play a part in its continued growth and increasing momentum.

CoreOS focus in the Kubernetes Community

Kubernetes takes the best patterns and lessons from its direct ancestor Google Borg, which had years of evolution in an intense production environment. We joined the community to help extend Kubernetes and build on its stability to deliver a secure, reliable, and manageable platform that any enterprise can adopt.

As part of our active commitment to the Kubernetes community, we contribute code and help foster conversation around the direction of the project. Our developers play an active role and members of our team are the project leads for:

CoreOS engineers also co-lead 5 Kubernetes Special Interest Groups (SIGs) including:

In addition, CoreOS is active in such SIGs as SIG-Apps, SIG-Cluster Lifecycle, SIG-Cluster Ops, SIG-Network and SIG-Storage.

CoreOS Contributions in each Kubernetes release

With each increment of the Kubernetes version number, we’ve worked closely with community partners, contributing significant features and improvements to get Kubernetes where it is today.

Kubernetes 1.2

  • Improved scheduling: Major performance improvements, reducing time to schedule 30,000 pods onto 1,000 nodes from 8,780 seconds to 587 seconds.

Kubernetes 1.3

  • Improved scaling: Added etcd v3 as a primary data store option next to etcd v2. This new API is foundational for continued Kubernetes scaling. Improved the ease of use of network-layer security with the TLS bootstrap API group.
  • Made huge strides in proving out the self-hosted Kubernetes model for deployment and upgrades.
  • Introduced Standards Based Authentication: CoreOS developed an OpenID Connect (OIDC) AuthProvider plugin, allowing OIDC Identity Providers (IdPs) to authenticate kubectl and other clients on behalf of the API Server.
  • Introduced API Authorization APIs alongside the wider community with the Role-Based Access Control API Authorizer.
  • Built a widely-used tool called Kube-AWS which simplifies the installation of Kubernetes on AWS, influencing AWS deployment patterns in the wider community.

Kubernetes 1.4

  • Improved network security: Kubelet TLS bootstrap helps users create securely managed Kubernetes clusters with less work.
  • Improved scalability of the Kubernetes API primary datastore: etcd v3 support was finalized. In Kubernetes v1.5, the etcd v3 work done over the v1.3 and v1.4 releases will be enabled by default.
  • Improved abstractions for different Container Runtimes through our work introducing the rkt container engine as a node execution engine. This ongoing work has informed a number of important design decisions inside of Kubernetes to create a flexible and stable core.
  • Introduced Container Image Policies to prevent a container from being admitted for scheduling that does not conform to operational fitness requirements such as: using the correct base image, containing updated versions of critical libraries, or obtaining tags which verify the image has passed through a continuous integration and delivery pipeline. This work will enable tighter integration with Quay features like Container Security Scanning.

Right now we’re focused on the upcoming Kubernetes v1.5 release, and working to improve authentication, cluster lifecycle, cluster ops, instrumentation, and testing. Version 1.5 is scheduled for release in early December.

CoreOS Projects in the Kubernetes Ecosystem

In addition to features and fixes in each Kubernetes release, several projects originated at CoreOS are key components of Kubernetes, like etcd, or are essential support tooling for the orchestrator.

  • etcd - Anyone running Kubernetes is running etcd, a reliable distributed key-value store introduced by CoreOS. In Kubernetes, etcd provides the primary backing store for all cluster state and data, employing the Raft consensus algorithm to keep distributed cluster metadata correct and available.
  • Prometheus - Prometheus, the second project in the CNCF, is led by CoreOS developers Fabian Reinartz and Frederic Branczyk. Prometheus is a monitoring and alerting system that natively consumes Kubernetes metrics and APIs.
  • Bootkube - A Kubernetes Incubator project introduced by CoreOS that can deploy self-hosted Kubernetes clusters. Installation of Kubernetes as a self-hosted system is a critical component of our vision to make Kubernetes simple to install and manage anywhere. Bootkube is informing the improvements necessary with working code and real cluster installation to make self-hosted the best deployment option for Kubernetes.
  • Localkube - As a part of the Minikube project, this is an easy way for you to get started using Kubernetes on your laptop. If you want to experiment with Kubernetes, this helps you get up and running easily.
  • Operators - A class of Kubernetes agents that represents human operational knowledge in software. These application-specific controllers extend the Kubernetes API to create, configure, and manage instances of complex stateful applications on behalf of a Kubernetes user. They build upon the basic Kubernetes resource and controller concepts, but include domain or application-specific knowledge to automate common tasks. An etcd Operator and Prometheus Operator were introduced last week.

CoreOS staff nominated for CNCF Community Award

Our team actively contributes to the Kubernetes ecosystem. In fact, four of our teammates are nominated for the first-ever Cloud Native Computing Foundation Community Award. The winner will be announced in Chris Aniszczyk’s keynote this Wednesday at KubeCon. Xiang, Fabian, Hongchao and Euan will all be on site at KubeCon this week. Drop by the CoreOS booth and say hello.

  • Xiang Li has been instrumental to the development of etcd and Kubernetes. He is the author of a Raft implementation in Go, which is the key enabler of a number of modern distributed systems, like CockroachDB, TiDB, Dgraph, Docker Swarm, and Kubernetes. He created and is a maintainer of etcd - a distributed reliable key-value store for storing critical metadata and distributed coordination.

  • Fabian Reinartz’s work on Prometheus expands the reach of the cloud native ecosystem. He has contributed immensely to Prometheus’s general development, and added features like the highly available Alertmanager and integration with Kubernetes.

  • Hongchao Deng is a prolific committer to open source cloud native projects. He has contributed a diverse group of primary features for etcd v3, Kubernetes, and on scale and performance testing of apiserver and of the Kubernetes scheduler.

  • Euan Kemp’s work mainly focuses on the rktnetes project, making the rkt container runtime a first-class citizen in Kubernetes. He has also contributed to fixing networking issues, improving testing code for various Kubernetes components (especially relating to the kubelet), and patching any rough edges as he encounters them.

Delivering Kubernetes to the enterprise with CoreOS Tectonic

Last week, CoreOS celebrated the one-year anniversary of the general availability of [Tectonic](https://tectonic.com). In Tectonic, we deliver all the innovation from the upstream Kubernetes community, and extend this with tools and utilities to ease adoption for enterprise use cases. Businesses need simple installation, management, and monitoring capabilities, along with reliable security. Tectonic delivers these features atop pure open-source Kubernetes.

With Tectonic, our development philosophy is to work with the community to enable these capabilities, and allow anyone to extend Kubernetes to include them. Last week, we announced a new class of software in Kubernetes, called an Operator. This is a direct representation of our Kubernetes community philosophy. Operators are Kubernetes agents that represent human operational knowledge in software to reliably manage and scale complex applications atop Kubernetes. Tectonic works with the Operator framework to provide enterprise features to our customers.

We also align Tectonic releases with the Kubernetes release version, so that Tectonic 1.4 includes Kubernetes 1.4. We only include the pure upstream codebase, not a Kubernetes fork, so that our customers not only get all the latest innovations from the community, but also avoid lock-in to a specific vendor’s version.

Ultimately, we partner with our customers to help them succeed with Kubernetes, and with the community to help Kubernetes itself succeed. Our involvement allows us to deliver not just great tooling for the platform, but also the best customer support, deployment services, and maintenance for Kubernetes.

With Tectonic, that support is delivered by some of the Kubernetes community’s most active producers, not just packagers of the software.

KubeCon - Celebrate with the community

We will celebrate the amazing growth and success of the Kubernetes community at KubeCon this week. The first KubeCon in November 2015 had close to 500 attendees. In one year the conference has doubled in size to bring in 1,000+ attendees and an even better slate of speakers and experts. Come see our talks, visit the CoreOS booth, and join us for our evening events. We have six speakers giving 10+ talks, so you'll have plenty of opportunities to learn more from CoreOS Kubernetes experts.

Tectonic Summit: The enterprise Kubernetes event

If we don’t see you at KubeCon, join us at Tectonic Summit, CoreOS’s enterprise Kubernetes event happening in New York City on December 12 and 13.

Introducing Operators: Putting Operational Knowledge into Software

November 3, 2016 · By Brandon Philips

A Site Reliability Engineer (SRE) is a person that operates an application by writing software. They are an engineer, a developer, who knows how to develop software specifically for a particular application domain. The resulting piece of software has an application's operational domain knowledge programmed into it.

Our team has been busy in the Kubernetes community designing and implementing this concept to reliably create, configure, and manage complex application instances atop Kubernetes.

We call this new class of software Operators. An Operator is an application-specific controller that extends the Kubernetes API to create, configure, and manage instances of complex stateful applications on behalf of a Kubernetes user. It builds upon the basic Kubernetes resource and controller concepts but includes domain or application-specific knowledge to automate common tasks.

Stateless is Easy, Stateful is Hard

With Kubernetes, it is relatively easy to manage and scale web apps, mobile backends, and API services right out of the box. Why? Because these applications are generally stateless, so the basic Kubernetes APIs, like Deployments, can scale and recover from failures without additional knowledge.

A larger challenge is managing stateful applications, like databases, caches, and monitoring systems. These systems require application domain knowledge to correctly scale, upgrade, and reconfigure while protecting against data loss or unavailability. We want this application-specific operational knowledge encoded into software that leverages the powerful Kubernetes abstractions to run and manage the application correctly.

An Operator is software that encodes this domain knowledge and extends the Kubernetes API through the third party resources mechanism, enabling users to create, configure, and manage applications. Like Kubernetes's built-in resources, an Operator doesn't manage just a single instance of the application, but multiple instances across the cluster.

To demonstrate the Operator concept in running code, we have two concrete examples to announce as open source projects today:

  1. The etcd Operator creates, configures, and manages etcd clusters. etcd is a reliable, distributed key-value store introduced by CoreOS for sustaining the most critical data in a distributed system, and is the primary configuration datastore of Kubernetes itself.

  2. The Prometheus Operator creates, configures, and manages Prometheus monitoring instances. Prometheus is a powerful monitoring, metrics, and alerting tool, and a Cloud Native Computing Foundation (CNCF) project supported by the CoreOS team.

How is an Operator Built?

Operators build upon two central Kubernetes concepts: Resources and Controllers. As an example, the built-in ReplicaSet resource lets users set a desired number of Pods to run, and controllers inside Kubernetes ensure the desired state set in the ReplicaSet resource remains true by creating or removing running Pods. There are many fundamental controllers and resources in Kubernetes that work in this manner, including Services, Deployments, and Daemon Sets.

Example 1a: A single pod is running, and the user updates the desired Pod count to 3.
Example 1b: A few moments later and controllers inside of Kubernetes have created new Pods to meet the user's request.
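
The control loop behind Examples 1a and 1b can be sketched in a few lines of Python. This is only an illustration of the idea, not Kubernetes code: the reconcile function and the pod names are hypothetical, and real controllers work against the API server rather than an in-memory list.

```python
import itertools

_pod_ids = itertools.count(1)  # hypothetical source of unique pod names


def reconcile(desired_replicas, running_pods):
    """Return a new pod list that matches the desired replica count."""
    pods = list(running_pods)
    # Too few pods: create new ones until the count matches.
    while len(pods) < desired_replicas:
        pods.append(f"pod-{next(_pod_ids)}")
    # Too many pods: remove the surplus from the end.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods


pods = reconcile(1, [])    # Example 1a: a single pod is running
pods = reconcile(3, pods)  # the user raises the desired count to 3
print(pods)                # Example 1b: three pods now exist
```

The user only ever states the desired count; the loop decides whether that means creating or removing Pods.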

An Operator builds upon the basic Kubernetes resource and controller concepts and adds a set of knowledge or configuration that allows the Operator to execute common application tasks. For example, when scaling an etcd cluster manually, a user has to perform a number of steps: create a DNS name for the new etcd member, launch the new etcd instance, and then use the etcd administrative tools (etcdctl member add) to tell the existing cluster about this new member. Instead, with the etcd Operator, a user can simply increase the etcd cluster size field by 1.
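
To make the contrast concrete, here is a small Python sketch that spells out the manual steps for each added member and compares them with the single field change. The function name, the spec dictionary, and the member naming scheme are assumptions made for illustration; they are not the Operator's actual code or resource format.

```python
def manual_scale_up_steps(cluster_name, current_size, desired_size):
    """List the manual steps for growing an etcd cluster, one member at a time."""
    steps = []
    for index in range(current_size, desired_size):
        member = f"{cluster_name}-{index:04d}"  # hypothetical naming scheme
        steps.append(f"create a DNS name for {member}")
        steps.append(f"launch the etcd instance {member}")
        steps.append(f"run 'etcdctl member add' for {member}")
    return steps


# Growing a 3-member cluster to 4 by hand takes three steps ...
for step in manual_scale_up_steps("etcd-cluster", 3, 4):
    print(step)

# ... whereas with the Operator the user changes one field.
spec = {"size": 3}  # hypothetical stand-in for the cluster resource
spec["size"] += 1
```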

Example 2: A backup is triggered by a user with kubectl

Other examples of complex administrative tasks that an Operator might handle include safe coordination of application upgrades, configuration of backups to offsite storage, service discovery via native Kubernetes APIs, application TLS certificate configuration, and disaster recovery.

How can you create an Operator?

Operators, by their nature, are application-specific, so the hard work is going to be encoding all of the application operational domain knowledge into a reasonable configuration resource and control loop. There are some common patterns that we have found while building operators that we think are important for any application:

  1. Operators should install as a single deployment (e.g. kubectl create -f https://coreos.com/operators/etcd/latest/deployment.yaml) and require no additional action once installed.

  2. Operators should create a new third party type when installed into Kubernetes. A user will create new application instances using this type.

  3. Operators should leverage built-in Kubernetes primitives like Services and Replica Sets when possible to leverage well-tested and well-understood code.

  4. Operators should be backwards compatible and always understand previous versions of resources a user has created.

  5. Operators should be designed so application instances continue to run unaffected if the Operator is stopped or removed.

  6. Operators should give users the ability to declare a desired version and orchestrate application upgrades based on the desired version. Not upgrading software is a common source of operational bugs and security issues and Operators can help users more confidently address this burden.

  7. Operators should be tested against a "Chaos Monkey" test suite that simulates potential failures of Pods, configuration, and networking.
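
As an illustration of pattern 6, the sketch below shows one way an Operator could move an application toward a declared version. Upgrading one member per pass is an assumption of this sketch, and the member names and version strings are hypothetical.

```python
def next_upgrade(desired_version, members):
    """Pick the next member to upgrade, or None when all are up to date.

    `members` maps a member name to the version it currently runs.
    """
    for name in sorted(members):
        if members[name] != desired_version:
            return name
    return None


members = {"member-a": "3.0.0", "member-b": "3.0.0", "member-c": "3.0.0"}
desired = "3.1.0"  # the version the user declared

while (name := next_upgrade(desired, members)) is not None:
    # Stand-in for restarting the member on the new version.
    members[name] = desired
    print(f"upgraded {name} to {desired}")
```

Because the user declares only the target version, the ordering and pacing of the upgrade stay inside the Operator.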

The Future of Operators

The etcd Operator and Prometheus Operator introduced by CoreOS today showcase the power of the Kubernetes platform. For the last year, we have worked alongside the wider Kubernetes community, laser-focused on making Kubernetes stable, secure, easy to manage, and quick to install.

Now, as the foundation for Kubernetes has been laid, our new focus is the system to be built on top: software that extends Kubernetes with new capabilities. We envision a future where users install Postgres Operators, Cassandra Operators, or Redis Operators on their Kubernetes clusters, and operate scalable instances of these programs as easily as they deploy replicas of their stateless web applications today.

To learn more, dive into the GitHub repos, discuss on our community channels, or come talk with the CoreOS team at KubeCon on Tuesday, November 8. Don't miss my keynote on Tuesday, November 8 at 5:25 p.m. PT, where I'll cover Operators and other Kubernetes topics.

FAQ

Q: How is this different than StatefulSets (previously PetSets)?

A: StatefulSets are designed to enable support in Kubernetes for applications that require the cluster to give them "stateful resources" like static IPs and storage. Applications that need this more stateful deployment model still need Operator automation to alert and act on failure, backup, or reconfigure. So, an Operator for applications needing these deployment properties could use StatefulSets instead of leveraging ReplicaSets or Deployments.

Q: How is this different from configuration management like Puppet or Chef?

A: Containers and Kubernetes are the big differentiators that make Operators possible. With these two technologies, deploying new software, coordinating distributed configuration, and checking on multi-host system state is consistent and easy using Kubernetes APIs. Operators glue these primitives together in a useful way for application consumers; it isn't just about configuration but the entire, live, application state.

Q: How is this different than Helm?

A: Helm is a tool for packaging multiple Kubernetes resources into a single package. The concept of packaging up multiple applications together and using Operators that actively manage applications are complementary. For example, traefik is a load balancer that can use etcd as its backend database. You could create a Helm Chart that deploys a traefik Deployment and etcd cluster instance together. The etcd cluster would then be deployed and managed by the etcd Operator.

Q: What if someone is new to Kubernetes? What does this mean?

A: This shouldn't change anything for new users except make it easier for them to deploy complex applications like etcd, Prometheus, and others in the future. Our recommended onboarding path for Kubernetes is still minikube, kubectl run, and then maybe start playing with the Prometheus Operator to monitor the app you deployed with kubectl run.

Q: Is the code available for etcd Operator and Prometheus Operator today?

A: Yes! They can be found on GitHub at https://github.com/coreos/etcd-operator and https://github.com/coreos/prometheus-operator.

Q: Do you have plans for other Operators?

A: Yes, that is likely in the future. We would also love to see new Operators built by the community. Let us know what other Operators you would like to see built next.

Q: How do Operators help secure a cluster?

A: Not upgrading software is a common source of operational bugs and security issues and Operators can help users more confidently address the burden of doing a correct upgrade.

Q: Can Operators help with disaster recovery?

A: Operators can make it easy to periodically back up application state and recover previous state from the backup. A feature we hope will become common with Operators is easily enabling users to deploy new instances from backups.

Introducing the etcd Operator: Simplify etcd cluster configuration and management

November 3, 2016 · By Hongchao Deng

Today, CoreOS introduced a new class of software in the Kubernetes community called an Operator. An Operator builds upon the basic Kubernetes resource and controller concepts but includes application domain knowledge to take care of common tasks. They reduce the complexity of running distributed systems and help you focus on the desired configuration, not the details of manual deployment and lifecycle management.

etcd is a distributed key-value store. In fact, etcd is the primary datastore of Kubernetes, storing and replicating all Kubernetes cluster state. As a critical component of a Kubernetes cluster, it needs a reliable, automated approach to configuration and management.

As a distributed consensus-based system, the cluster configuration of etcd can be complicated. Bootstrapping, maintaining quorum, reconfiguring cluster membership, creating backups, handling disaster recovery, and monitoring critical events are tedious work, and require etcd-specific expertise.

Today we are introducing the etcd Operator and the Prometheus Operator showing how to make applications like these easier to run on Kubernetes. In this post, we'll outline the importance of an Operator for etcd. Let's dive in.

The etcd Operator: The best way to manage etcd clusters

The etcd Operator is simple to install with a single command line, and enables users to configure and manage the complexities of etcd using simple declarative configuration that will create, configure, and manage etcd clusters.

The etcd Operator provides the following features:

  • Create/Destroy: Instead of specifying tedious configuration settings for each etcd member, users need only specify, at a minimum, the size of the cluster.

  • Resize: Users need only to modify the size in spec, and the etcd Operator will take care of deploying, destroying, and/or re-configuring cluster members, e.g. from 3 to 5, or from 5 to 3.

  • Backup: The etcd Operator performs backups automatically and transparently. Users need only to specify the backup policy, for example, to back up every 30 minutes and keep the last 3 backups.

  • Upgrade: Upgrading etcd without downtime is a critical but difficult task. Doing it with the etcd Operator not only simplifies operations, but also avoids common upgrade pitfalls and errors.
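
The backup policy mentioned above (every 30 minutes, keep the last 3) can be sketched as follows. This is an illustrative simulation under our own assumptions, not the etcd Operator's implementation: the function names are hypothetical and backups are represented only by their timestamps.

```python
from datetime import datetime, timedelta


def backup_due(last_backup, now, interval):
    """Return True when at least `interval` has passed since the last backup."""
    return last_backup is None or now - last_backup >= interval


def prune(backups, max_backups):
    """Keep only the newest `max_backups` timestamps (max_backups must be > 0)."""
    return sorted(backups)[-max_backups:]


interval = timedelta(minutes=30)
start = datetime(2016, 11, 3, 12, 0)
backups = []

# Simulate two hours, checking the policy every ten minutes.
for minutes in range(0, 121, 10):
    now = start + timedelta(minutes=minutes)
    last = backups[-1] if backups else None
    if backup_due(last, now, interval):
        backups = prune(backups + [now], max_backups=3)

print([b.strftime("%H:%M") for b in backups])
```

Five backups are taken over the two simulated hours, and only the three most recent ones survive pruning.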

How it works

The etcd Operator simulates human operator behaviors in three steps: Observe, Analyze, and Act.

First, it observes the current cluster state by using the Kubernetes API. Second, it finds the differences between the desired state and current state. Last, it fixes the difference through the etcd cluster management API, the Kubernetes API, or both.

etcd Operator logic loop in action

For example, let's say we have an etcd cluster of 3 members. Unfortunately, one member is down. The etcd Operator observes that the current cluster has 2 running pods. It diffs against the desired state, which should have 3 members. The Operator then acts to recover one member, by removing the dead one and adding a new one. Now the etcd cluster is back to a healthy state.
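
The same scenario can be written down as a small Python sketch of the Observe, Analyze, and Act steps. It is a simplified model under stated assumptions: the pod dictionary, the status strings, and the member names are hypothetical, and the comments mark where a real Operator would call the etcd or Kubernetes APIs.

```python
def observe(pods):
    """Observe: collect the names of members whose pods are running."""
    return {name for name, status in pods.items() if status == "Running"}


def analyze(desired_size, pods, running):
    """Analyze: diff the current state against the desired state."""
    dead = sorted(set(pods) - running)
    missing = desired_size - len(running)
    return dead, missing


def act(pods, dead, missing, next_index):
    """Act: remove dead members, then add new ones to reach the desired size."""
    for name in dead:
        del pods[name]          # stand-in for removing the member from etcd
    for i in range(missing):
        name = f"etcd-cluster-{next_index + i:04d}"
        pods[name] = "Running"  # stand-in for creating a pod via the Kubernetes API
    return pods


pods = {
    "etcd-cluster-0000": "Running",
    "etcd-cluster-0001": "Failed",  # one member is down
    "etcd-cluster-0002": "Running",
}
running = observe(pods)
dead, missing = analyze(3, pods, running)
pods = act(pods, dead, missing, next_index=3)
print(sorted(pods))
```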

Testing the etcd Operator with "Chaos Monkey"

It is important to ensure the Operator is robust. We developed a tool similar to Netflix's Chaos Monkey that can kill pods randomly.

We use this tool to test the major Operator features: that is, to create, recover, and back up etcd clusters. In our continuous soak testing, the Chaos Monkey is enabled. It stresses the Operator by killing random etcd pods, so that we can see how the Operator reacts in real-time.
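
The shape of such a test can be sketched in Python. This is not the tool described above, only a toy model of the idea: the function names are hypothetical, the recover function stands in for the Operator, and a fixed random seed keeps the run repeatable.

```python
import random


def kill_random_pod(pods, rng):
    """Remove one randomly chosen pod and return its name."""
    victim = rng.choice(sorted(pods))
    pods.remove(victim)
    return victim


def recover(pods, desired_size, next_index):
    """Stand-in for the Operator: add pods until the desired size is met."""
    while len(pods) < desired_size:
        pods.add(f"etcd-cluster-{next_index:04d}")
        next_index += 1
    return next_index


rng = random.Random(42)  # fixed seed so the run is repeatable
pods = {"etcd-cluster-0000", "etcd-cluster-0001", "etcd-cluster-0002"}
next_index = 3

for round_number in range(5):
    victim = kill_random_pod(pods, rng)
    next_index = recover(pods, desired_size=3, next_index=next_index)
    assert len(pods) == 3, "cluster did not return to its desired size"
    print(f"round {round_number}: killed {victim}, cluster size {len(pods)}")
```

Each round checks the one property that matters here: after a failure, the cluster returns to its desired size.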

Try it out

Deploy the etcd Operator

Creating a new etcd Operator is simple on any Kubernetes cluster. There is an example deployment manifest in the etcd Operator source repo:

$ kubectl create -f https://coreos.com/operators/etcd/latest/deployment.yaml

This command creates an etcd Operator deployment on the Kubernetes cluster. The etcd Operator is now ready to manage etcd clusters.

Create a new etcd cluster with the Operator

Now, we'll create a 3-member etcd cluster with backup support. (Note that backup only works if your Kubernetes cluster supports Persistent Volumes). Once again, we'll use an example manifest from the etcd Operator repo:

$ kubectl create -f https://coreos.com/operators/etcd/latest/example-etcd-cluster.yaml
$ kubectl get pods
NAME                             READY     STATUS    RESTARTS   AGE
etcd-cluster-0000                1/1       Running   0          23s
etcd-cluster-0001                1/1       Running   0          16s
etcd-cluster-0002                1/1       Running   0          8s
etcd-cluster-backup-tool-rhygq   1/1       Running   0          18s

Interested? To experiment with more examples and explore more features, check out the etcd Operator documentation.

The etcd Operator is under active development. A lot of exciting features are planned and being developed. We'd love to see your feedback and contributions!

Join CoreOS at KubeCon

We're hosting a number of events at the Kubernetes conference, KubeCon in Seattle, November 8 and 9, 2016. Watch a keynote with Brandon Philips for more details on the etcd Operator on Wednesday, November 9 at 3:50 p.m. PT. Check out the full schedule of CoreOS KubeCon events, stop by and visit our engineers at the CoreOS booth with your Kubernetes and container questions, or request an on-site sales meeting with a specialist.

The Prometheus Operator: Managed Prometheus setups for Kubernetes

November 3, 2016 · By Fabian Reinartz

Today, CoreOS introduced a new class of software called Operators and is also introducing two Operators as open source projects, one for etcd and another for Prometheus. In this post, we'll outline the importance of an Operator for Prometheus, the monitoring system for Kubernetes.

An Operator builds upon the basic Kubernetes resource and controller concepts but includes application domain knowledge to take care of common tasks. They ultimately help you focus on a desired configuration, not the details of manual deployment and lifecycle management.

Prometheus is a close cousin of Kubernetes: Google introduced Kubernetes as an open source descendant of their Borg cluster system and Prometheus shares fundamental design concepts with Borgmon, the monitoring system paired with Borg. Today, both Prometheus and Kubernetes are governed by the Cloud Native Computing Foundation (CNCF). And at a technical level, Kubernetes exports all of its internal metrics in the native Prometheus format.

The Prometheus Operator: The best way to integrate Kubernetes and Prometheus

The Prometheus Operator is simple to install with a single command line, and enables users to configure and manage instances of Prometheus using simple declarative configuration that will, in response, create, configure, and manage Prometheus monitoring instances.

Once installed the Prometheus Operator provides the following features:

  • Create/Destroy: Easily launch a Prometheus instance for your Kubernetes namespace, a specific application, or a team using the Operator.

  • Simple Configuration: Configure the fundamentals of Prometheus like versions, persistence, retention policies, and replicas from a native Kubernetes resource.

  • Target Services via Labels: Automatically generate monitoring target configurations based on familiar Kubernetes label queries; no need to learn a Prometheus-specific configuration language.

How it Works

The core idea of the Operator is to decouple deployment of Prometheus instances from the configuration of which entities they are monitoring. For that purpose two third party resources (TPRs) are defined: Prometheus and ServiceMonitor.

The Operator ensures at all times that for each Prometheus resource in the cluster a set of Prometheus servers with the desired configuration is running. This entails aspects like the data retention time, persistent volume claims, number of replicas, the Prometheus version, and Alertmanager instances to send alerts to. Each Prometheus instance is paired with a respective configuration that specifies which monitoring targets to scrape for metrics and with which parameters.

The user can either manually specify this configuration or let the Operator generate it based on the second TPR, the ServiceMonitor. The ServiceMonitor resource specifies how metrics can be retrieved from a set of services exposing them in a common way. A Prometheus resource object can dynamically include ServiceMonitor objects by their labels. The Operator configures the Prometheus instance to monitor all services covered by included ServiceMonitors and keeps this configuration synchronized with any changes happening in the cluster.
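
The two-level selection described above, where a Prometheus resource picks ServiceMonitors by label and each ServiceMonitor picks services by label, can be sketched in Python. The dictionary keys below are hypothetical and are not the TPR field names, and the sketch assumes simple equality-based label matching.

```python
def matches(selector, labels):
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_targets(prometheus_selector, service_monitors, services):
    """Return names of services covered by the ServiceMonitors a Prometheus includes."""
    targets = set()
    for monitor in service_monitors:
        if not matches(prometheus_selector, monitor["labels"]):
            continue  # this Prometheus does not include the monitor
        for service in services:
            if matches(monitor["service_selector"], service["labels"]):
                targets.add(service["name"])
    return sorted(targets)


service_monitors = [
    {"labels": {"team": "frontend"}, "service_selector": {"app": "web"}},
    {"labels": {"team": "backend"}, "service_selector": {"app": "db"}},
]
services = [
    {"name": "web-1", "labels": {"app": "web"}},
    {"name": "db-1", "labels": {"app": "db"}},
]

print(select_targets({"team": "frontend"}, service_monitors, services))
```

Re-running the selection whenever services or ServiceMonitors change is what keeps the generated configuration in sync with the cluster.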

The Operator encapsulates a large part of the Prometheus domain knowledge and only surfaces aspects meaningful to the monitoring system's end user. It's a powerful approach that enables engineers across all teams of an organization to be autonomous and flexible in the way they run their monitoring.

Operator workflow and relationships

Prometheus Operator in Action

We are going to walk through a full demonstration of the Prometheus Operator by creating a Prometheus instance and some services to monitor. Let's start by deploying our first Prometheus instance.

First, you need a running Kubernetes cluster v1.3+ with alpha APIs enabled. If you don't have one, follow the minikube instructions to quickly get a local cluster up and running.

Note: minikube hides some components of Kubernetes, but it is the fastest way to set up a cluster to work with. For a more extensive and production-like environment, have a look into setting up a cluster using bootkube.

Managed Deployments

Let's start by deploying the Prometheus Operator in our cluster:

$ kubectl create -f https://coreos.com/operators/prometheus/latest/prometheus-operator.yaml
deployment "prometheus-operator" created

Verify that it is up and running and has registered the TPR types with the Kubernetes API server.

$ kubectl get pod
NAME                                   READY     STATUS    RESTARTS   AGE
prometheus-operator-1078305193-ca4vs   1/1       Running   0          5m
$ until kubectl get prometheus; do sleep 1; done
# … wait ...
# If no more errors are printed, the TPR types were registered successfully.

A simple definition of a Prometheus TPR that deploys a single Prometheus instance looks like this:

apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-k8s
  labels:
    prometheus: k8s
spec:
  version: v1.3.0

To create it in the cluster, run:

$ kubectl create -f https://coreos.com/operators/prometheus/latest/prometheus-k8s.yaml
prometheus "prometheus-k8s" created
service "prometheus-k8s" created

This also creates a service to make the Prometheus UI accessible for the user. For the purpose of this demo, a service exposing it on NodePort 30900 is created.

Immediately afterwards, observe the Operator deploying a Prometheus pod:

$ kubectl get pod -w
NAME                                   READY     STATUS    RESTARTS   AGE
prometheus-k8s-0                       3/3       Running   0          2m

We can now reach the Prometheus UI by going to http://<cluster node>:30900, or by running $ minikube service prometheus-k8s when using minikube.

In the same manner we can easily deploy further Prometheus servers and use advanced options in our Prometheus TPR to let the Operator handle version upgrades, persistent volume claims, and connecting Prometheus to Alertmanager instances.

You can read more on the full capabilities of the managed Prometheus deployments in the repository's documentation.

Cluster Monitoring

We successfully created a managed Prometheus server. However, it is not monitoring anything yet as we did not provide any configuration. Each Prometheus deployment mounts a Kubernetes ConfigMap named after itself, i.e. our Prometheus server mounts the configuration provided in the "prometheus-k8s" ConfigMap in its namespace.

We want our Prometheus server to monitor all aspects of our cluster itself like container resource usage, cluster nodes, and kubelets. Kubernetes chose the Prometheus metric format as the canonical way to expose metrics for all its components. So, we only need to point Prometheus to the right endpoints to retrieve those metrics. This works the same across virtually any cluster and we can use the predefined manifests in our kube-prometheus repository.

# Deploy exporters providing metrics on cluster nodes and Kubernetes business logic
$ kubectl create -f https://coreos.com/operators/prometheus/latest/exporters.yaml
deployment "kube-state-metrics" created
service "kube-state-metrics" created
daemonset "node-exporter" created
service "node-exporter" created
# Create the ConfigMap containing the Prometheus configuration
$ kubectl apply -f https://coreos.com/operators/prometheus/latest/prometheus-k8s-cm.yaml
configmap "prometheus-k8s" configured
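The applied ConfigMap is not reproduced in this post. A heavily simplified sketch of what such a ConfigMap might contain is shown here, with a single scrape job that discovers the kubelets through the Kubernetes node role. The data key name prometheus.yaml and the job name are assumptions made for illustration, and the real manifest contains more jobs than this.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-k8s   # must match the name of the Prometheus object
data:
  # Assumption: the key name the Prometheus pod expects for its config file.
  prometheus.yaml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    # Discover every cluster node and scrape its kubelet metrics endpoint.
    - job_name: kubelets
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
```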

Shortly afterwards, Kubernetes updates the configuration in the Prometheus pod, and we can see targets showing up on the "Targets" page. The Prometheus instance is now ingesting metrics and is ready to be queried in the UI or by dashboards, and to evaluate alerts.

"Targets" page of prometheus-k8s

Service Monitoring

On top of monitoring our cluster components, we also want to monitor our own services. Using the regular Prometheus configuration, we have to deal with the concept of relabeling to discover and configure monitoring targets properly. It is a powerful approach allowing Prometheus to integrate with a variety of service discovery mechanisms and arbitrary operational models. However, it is very verbose and repetitive and thus not generally suitable to be written manually.
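To give a sense of that verbosity, here is a sketch of a hand-written scrape job for Services labeled tier: frontend that expose a port named web. It is an illustration only, not the configuration the Operator generates. The job name and the target label names namespace and service are our own choices.

```yaml
scrape_configs:
- job_name: frontend
  scrape_interval: 10s
  metrics_path: /metrics
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Keep only endpoints whose Service carries the label tier=frontend.
  - source_labels: [__meta_kubernetes_service_label_tier]
    regex: frontend
    action: keep
  # Keep only the endpoint port named "web".
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: web
    action: keep
  # Attach the namespace and the Service name to every scraped series.
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: service
  # Copy all Service labels onto the scraped series.
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
```

Every additional group of services needs another job like this one, which is where the repetition comes from.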

The Prometheus Operator solves this problem by defining a second TPR to express how to monitor our custom services in a way that is fully idiomatic to Kubernetes.

Suppose all our services with the label tier = frontend serve metrics on the named port web under the standard /metrics path. The ServiceMonitor TPR allows us to declaratively express a monitoring configuration that applies to all those services, selecting them by the tier label.

apiVersion: monitoring.coreos.com/v1alpha1
kind: ServiceMonitor
metadata:
  name: frontend
  labels:
    tier: frontend
spec:
  selector:
    matchLabels:
      tier: frontend
  endpoints:
  - port: web             # works for different port numbers as long as the name matches
    interval: 10s        # scrape the endpoint every 10 seconds

This merely defines how a set of services should be monitored. We now need to define a Prometheus instance that includes this ServiceMonitor in its configuration. ServiceMonitors belonging to a Prometheus setup are selected, once again, based on labels. When deploying said Prometheus instance, the Operator configures it according to the matching ServiceMonitors.

apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-frontend
  labels:
    prometheus: frontend
spec:
  version: v1.3.0
  # Define that all ServiceMonitor TPRs with the label `tier = frontend` should be included
  # into the server's configuration.
  serviceMonitors:
  - selector:
      matchLabels:
        tier: frontend

We create the ServiceMonitor and the Prometheus object by running:

$ kubectl create -f https://coreos.com/operators/prometheus/latest/servicemonitor-frontend.yaml
servicemonitor "frontend" created
$ kubectl create -f https://coreos.com/operators/prometheus/latest/prometheus-frontend.yaml
prometheus "prometheus-frontend" created
service "prometheus-frontend" created

Visiting http://<cluster node>:30100 (run $ minikube service prometheus-frontend when using minikube) we can see the UI of our new Prometheus server. As there's no service the ServiceMonitor applies to, the "Targets" page is still empty.

The following command deploys four instances of an example application that exposes metrics as defined by our ServiceMonitor and whose service matches its tier = frontend label selector.

$ kubectl create -f https://coreos.com/operators/prometheus/latest/example-app.yaml
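The contents of example-app.yaml are not shown here. As a sketch, a Service that our ServiceMonitor would select needs only two things: the tier: frontend label and a port named web. The name example-app, the pod label app: example-app, and the port number 8080 are hypothetical values chosen for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-app          # hypothetical name
  labels:
    tier: frontend           # matched by the ServiceMonitor's selector
spec:
  selector:
    app: example-app         # hypothetical label on the application pods
  ports:
  - name: web                # the port name the ServiceMonitor refers to
    port: 8080               # hypothetical; any number works if the name matches
```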

Going back to the web UI, we can see the new pods immediately appearing on the "Targets" page, and we can query the metrics they expose. Service and pod labels of our example application, as well as the Kubernetes namespace, are automatically attached as labels to the scraped metrics. This allows us to aggregate and filter along them in our Prometheus queries and alerts.

"Targets" page of prometheus-frontend

Prometheus will automatically pick up new services having the tier = frontend label and adapt to their deployments scaling up and down. Additionally, the Operator will immediately reconfigure Prometheus appropriately if ServiceMonitors are added, removed, or modified.

The image below visualizes how the controller manages Prometheus deployments by watching the state of our Prometheus and ServiceMonitor resources. The relationships between the resources are expressed through labels, and any changes take immediate effect at runtime.

Future Directions

With the Operators introduced today, we showcase the power of the Kubernetes platform. The Prometheus Operator extends the Kubernetes API with new monitoring capabilities. We have seen how the Prometheus Operator helps us dynamically deploy Prometheus instances and manage their life cycle. Additionally, it provides a way to define custom service monitoring expressed purely in Kubernetes idioms. Monitoring truly becomes part of the cluster itself, and all implementation details of the distinct system being used are abstracted away.

While it's still in an early stage of development, the Operator already handles several aspects of a Prometheus setup that are beyond the scope of this blog post, such as persistent storage, replication, alerting, and version updates. Check out the Operator's documentation to find out more. The kube-prometheus repository contains a variety of essentials to get your cluster monitoring up and running in no time. It also provides out-of-the-box dashboarding and alerting for cluster components.

Stay tuned for more features of the Prometheus Operator and for additional operators that make it just as easy to run the Prometheus Alertmanager and Grafana inside your cluster.

Join CoreOS at KubeCon

We're hosting a number of events at the Kubernetes conference, KubeCon in Seattle, November 8 and 9, 2016. Join us, especially at the Prometheus keynote on Wednesday, November 9 at 3:30 p.m. PT, which will dive in deeper on the Prometheus Operator.

Be sure to check out the full schedule of CoreOS KubeCon events, then stop by and visit our engineers at the CoreOS booth with your Kubernetes and container questions, or request an on-site sales meeting with a specialist.

Older Posts

November community events: Meet us at KubeCon and other conferences

November 2, 2016 · By Johan Philippine

Linux kernel has been Updated (CVE-2016-5195)

October 20, 2016 · By Alex Crawford

CoreOS and Redspread Join to Extend Kubernetes

October 17, 2016 · By Alex Polvi

October community events - LinuxCon, OpenStack Summit, All Things Open, and more

October 3, 2016 · By Johan Philippine

Eliminating Delays From systemd-journald, Part 2

September 29, 2016 · By Vito Caputo

How to use pluggable isolation features in the rkt container engine

September 16, 2016 · By Derek Gonyeo

rkt Container Engine Reaches v1.14.0: Focus on Stability and Minimalism

September 9, 2016 · By Luca Bruno

September community events - meetups, recruitment, and conferences

September 2, 2016 · By Johan Philippine

Serializability and Distributed Software Transactional Memory with etcd3

August 31, 2016 · By Anthony Romano

Fetching and running docker container images with rkt

August 25, 2016 · By Derek Gonyeo

Developing Prometheus alerts for etcd

August 24, 2016 · By Frederic Branczyk

CoreOS Online Validator Now Supports Ignition

August 15, 2016 · By Andrew Jeddeloh

Announcing Public and Private Kubernetes and CoreOS Training

August 11, 2016 · By Jeff Gray

Intro to rkt signing and verification

August 10, 2016 · By Derek Gonyeo

Meet CoreOS in August: OpenStack, ContainerCon and more

August 8, 2016 · By Johan Philippine

Sharing Servers for International Friendship Day

August 7, 2016 · By Jason Luce, ScaleFT

Self-Hosted Kubernetes makes Kubernetes installs, scaleouts, upgrades easier

August 5, 2016 · By Josh Wood

August spotlight: Learn about rkt, the container engine by CoreOS

August 4, 2016 · By Derek Gonyeo

Hands on: Monitoring Kubernetes with Prometheus

August 3, 2016 · By Joe Bowers

Migrating applications, clusters, and Kubernetes to etcd v3

July 27, 2016 · By Hongchao Deng

GopherCon, ContainerCon and more! Meet CoreOS at a July event

July 11, 2016 · By Johan Philippine

Happy three years, CoreOS

July 1, 2016 · By Brandon Philips

etcd3: A new etcd

June 30, 2016 · By Anthony Romano and Xiang Li

Prometheus and Kubernetes up and running

June 27, 2016 · By Fabian Reinartz

CoreOS Linux available in China

June 16, 2016 · By Alex Crawford

Kubernetes v1.3 Preview - Auth, Scale, and Improved Install

June 7, 2016 · By Mike Saparov

June CoreOS Events

June 2, 2016 · By Johan Philippine

Presenting Torus: A modern distributed storage system by CoreOS

June 1, 2016 · By Barak Michener

Security brief: CoreOS Linux Alpha remote SSH issue

May 19, 2016 · By Matthew Garrett

Major Remote SSH Security Issue in CoreOS Linux Alpha, Subset of Users Affected

May 16, 2016 · By CoreOS Security Team

CoreOS Fest: CoreOS Works with Intel, Project Calico, Packet, and StackPointCloud to extend GIFEE

May 9, 2016 · By Alex Polvi

CoreOS closes $28M Series B to bring Google-like infrastructure to all

May 9, 2016 · By Alex Polvi

CoreOS brings open source distributed systems components to the next level

May 9, 2016 · By Brandon Philips

What to know before you go to CoreOS Fest, and other events this May

May 5, 2016 · By Johan Philippine

CoreOS and Prometheus: Building monitoring for the next generation of cluster infrastructure

April 29, 2016 · By Fabian Reinartz

Celebrating the Open Container Initiative Image Specification

April 14, 2016 · By Jonathan Boulle

Introducing Ignition: The new CoreOS machine provisioning utility

April 12, 2016 · By Alex Crawford

rkt 1.3.0: Tighter security; easier container debugging, development, and integration

April 6, 2016 · By Derek Gonyeo

Meet us for our April 2016 events

April 5, 2016 · By Johan Philippine and Kelly Tenn

CoreOS Fest Berlin and San Francisco: Join us this May

March 28, 2016 · By Melissa Smolensky

CoreOS Linux Hits Day 1000

March 28, 2016 · By Brandon Philips

CoreOS Delivers etcd v2.3.0 with Increased Stability and v3 API Preview

March 21, 2016 · By Xiang Li

CoreOS Delivers on Security with v1.0 of Clair Container Image Analyzer

March 18, 2016 · By Quentin Machu

Eliminating Delays From systemd-journald, Part 1

March 10, 2016 · By Vito Caputo

March CoreOS Events

March 7, 2016 · By Elsie Phillips

LDAP Support in CoreOS dex: An Open Source Journey

March 3, 2016 · By Frode Nordahl

Take a REST with HTTP/2, Protobufs, and Swagger

February 24, 2016 · By Brandon Philips

Improving Kubernetes Scheduler Performance

February 22, 2016 · By Hongchao Deng

rkt Network Modes and Default CNI Configurations

February 9, 2016 · By Stefan Junker

February Community Events

February 8, 2016 · By Elsie Phillips

The Security-minded Container Engine by CoreOS: rkt Hits 1.0

February 4, 2016 · By Alex Polvi

Get Started with rkt Containers in Three Minutes

February 4, 2016 · By Derek Gonyeo

OpenSSL patched in CoreOS Alpha, Beta and Stable

February 1, 2016 · By George Tankersley

NTP has been Updated

January 22, 2016 · By Alex Crawford

A Bare Metal Configuration Service for CoreOS Linux

January 22, 2016 · By Dalton Hubble

Get Ready for CoreOS Fest 2016: Berlin

January 20, 2016 · By Melissa Smolensky

Meet CoreOS In Your Neck of the Woods

January 20, 2016 · By Kelly Tenn

Linux Kernel has been Updated (CVE-2016-0728)

January 20, 2016 · By Alex Crawford

CoreOS rkt 0.15.0 Introduces rkt fly, Go 1.5 Build Support

January 19, 2016 · By Josh Wood

Go 1.5.3 Security Vulnerability Patch

January 13, 2016 · By George Tankersley

What Trusted Computing Means to Users of CoreOS and Beyond

December 10, 2015 · By Matthew Garrett

Making Sense of Container Standards and Foundations: OCI, CNCF, appc and rkt

December 8, 2015 · By Alex Polvi

Meet CoreOS in New York This Week

December 1, 2015 · By Kelly Tenn

CoreOS Introduces Clair: Open Source Vulnerability Analysis for your Containers

November 13, 2015 · By Quentin Machu

Tectonic, by CoreOS, Is GA

November 3, 2015 · By Brandon Philips

November Events for CoreOS

November 2, 2015 · By Alex Avritch

rkt v0.10.0: With a New API Service and a Better Image Build Tool

October 27, 2015 · By Alban Crequy

October Events for CoreOS

October 5, 2015 · By Alex Avritch

Official CloudFormation and kube-aws tool for installing Kubernetes on AWS

October 2, 2015 · By Brian Waldon

Container Security with SELinux and CoreOS

September 29, 2015 · By Matthew Garrett

Cross-host Container Communication with rkt and flannel

September 21, 2015 · By Eugene Yakubovich

Official Kubernetes on CoreOS Guides and Tools

September 17, 2015 · By Aaron Levy

Where systemd and Containers Meet: Q&A with Lennart Poettering

September 16, 2015 · By Jonathan Boulle

etcd 2.2 – Improving the Developer Experience and Setting the Path for the v3 API

September 10, 2015 · By Xiang Li

September Events for CoreOS: Conferences, Trainings and More

September 8, 2015 · By Alex Avritch

Announcing dex, an Open Source OpenID Connect Identity Provider from CoreOS

September 3, 2015 · By Bobby Rullo

Flocker on CoreOS Linux

September 1, 2015 · By Brandon Philips

Containers on the Autobahn: Q&A with Giant Swarm

August 24, 2015 · By Kelly Tenn

What it’s like to Intern with CoreOS

August 21, 2015 · By Mary O’Brien

Using Virtual Machines to Improve Container Security with rkt v0.8.0

August 18, 2015 · By Brandon Philips

Introducing the Kubernetes kubelet in CoreOS Linux

August 14, 2015 · By Kelsey Hightower

Meet the CoreOS team around the world in August

August 4, 2015 · By Kelly Tenn

Introducing etcd 2.1

July 24, 2015 · By Yicheng Qin

CoreOS and Kubernetes 1.0

July 21, 2015 · By Brandon Philips

Meet CoreOS at OSCON and more

July 17, 2015 · By Kelly Tenn

Announcing rkt v0.7.0, featuring a new build system, SELinux and more

July 15, 2015 · By Iago López Galeiras

Q&A with Sysdig on containers, monitoring and CoreOS

July 14, 2015 · By Kelsey Hightower

How to get involved with CoreOS projects

July 10, 2015 · By Jed Smith

OpenSSL has been Updated (CVE-2015-1793)

July 10, 2015 · By Alex Crawford

Happy 2nd Epoch CoreOS Linux

July 7, 2015 · By Brandon Philips

Upcoming CoreOS Events in July

July 6, 2015 · By Alex Avritch

Introducing flannel 0.5.0 with AWS and GCE

June 30, 2015 · By Mohammad Ahmad

App Container and the Open Container Project

June 22, 2015 · By Alex Polvi

Technology Preview: CoreOS Linux and xhyve

June 11, 2015 · By Brian Akins

etcd2 in the CoreOS Linux Stable channel

June 9, 2015 · By Alex Crawford

Building and deploying minimal containers on Kubernetes with Quay.io and wercker

June 3, 2015 · By Micha "mies" Hernandez van Leuffen

Oh, the places we’ll be in June

June 2, 2015 · By Kelly Tenn

CoreOS Linux is in the OpenStack App Marketplace

May 19, 2015 · By Brian Harrington

CoreOS at OpenStack Summit 2015

May 18, 2015 · By Alex Avritch

New Functional Testing in etcd

May 14, 2015 · By Yicheng Qin

Upcoming CoreOS Events in May

May 12, 2015 · By Alex Avritch

CoreOS State of the Union at CoreOS Fest

May 5, 2015 · By Brandon Philips

App Container spec gains new support as a community-led effort

May 4, 2015 · By Alex Polvi

CoreOS Fest 2015 Guide

April 29, 2015 · By Alex Avritch

Announcing GovCloud support on AWS

April 27, 2015 · By Mike Marineau

rkt 0.5.4, featuring repository authentication, port forwarding and more

April 24, 2015 · By Jonathan Boulle

VMware Ships rkt and Supports App Container Spec

April 20, 2015 · By Alex Polvi

etcd 2.0 in CoreOS Alpha Image

April 16, 2015 · By Alex Crawford

CoreOS on ARM64

April 14, 2015 · By Geoff Levand

Counting Down to CoreOS Fest on May 4 and 5

April 13, 2015 · By Kelly Tenn

Upcoming CoreOS Events in April

April 7, 2015 · By Alex Avritch

Announcing Tectonic: The Commercial Kubernetes Platform

April 6, 2015 · By Alex Polvi

Announcing rkt v0.5, featuring pods, overlayfs, and more

April 1, 2015 · By Jonathan Boulle

CoreOS Fest 2015 First Round of Speakers Announced

March 27, 2015 · By Alex Avritch

What makes a cluster a cluster?

March 20, 2015 · By Barak Michener

Announcing rkt and App Container 0.4.1

March 13, 2015 · By Brandon Philips

rkt Now Available in CoreOS Alpha Channel

March 12, 2015 · By Michael Marineau

The First CoreOS Fest

March 11, 2015 · By Melissa Smolensky

CoreOS on VMware vSphere and VMware vCloud Air

March 9, 2015 · By Kelsey Hightower

Managing CoreOS Logs with Logentries

March 5, 2015 · By Melissa Smolensky

Upcoming CoreOS Events in March

March 3, 2015 · By Kelly Tenn

App Container and Docker

February 13, 2015 · By Jonathan Boulle

Announcing rkt and App Container v0.3.1

February 6, 2015 · By Jonathan Boulle

Upcoming CoreOS Events in February

February 3, 2015 · By Kelly Tenn

etcd 2.0 Release - First Major Stable Release

January 28, 2015 · By Brandon Philips

Update on CVE-2015-0235, GHOST

January 28, 2015 · By Alex Crawford

rkt and App Container 0.2.0 Release

January 23, 2015 · By Jonathan Boulle

Meet us for our January 2015 events

January 20, 2015 · By Kelly Tenn

Quay.io New Features

January 7, 2015 · By Jacob Moshenko

Announcing the etcd 2.0 Release Candidate

December 18, 2014 · By Xiang Li

App Container Spec One Week In

December 9, 2014 · By Brandon Philips

Docker 1.3.2 in Stable Channel

December 3, 2014 · By Alex Crawford

CoreOS is building a container runtime, rkt

December 1, 2014 · By Alex Polvi

Docker 1.3.2 Rolled Out Today

November 24, 2014 · By Alex Crawford

CoreOS Brings Kubernetes to Any Cloud Platform

November 10, 2014 · By Kelsey Hightower

Weekend Enjoyment: CoreOS Deployment Videos

November 7, 2014 · By Rob Szumski

Announcing CoreOS Enterprise Registry, a secure Docker registry behind your firewall

October 30, 2014 · By Joey Schorr

A Meetup Ride to San Mateo

October 29, 2014 · By Melissa Smolensky

CoreOS Now Available On Microsoft Azure

October 20, 2014 · By Alex Crawford

Godep for End User Go Projects

October 15, 2014 · By Brandon Philips

Managing CoreOS with Ansible

October 13, 2014 · By Roman Shtylman

CoreOS Machines Secured from Shellshock

September 26, 2014 · By Alex Polvi

Security Update on CVE-2014-6371 Shellshock

September 25, 2014 · By Brandon Philips

Congrats to Interactive Markdown at the TechCrunch Disrupt Hackathon

September 8, 2014 · By Melissa Smolensky

CoreOS Image Now Available On DigitalOcean

September 5, 2014 · By Alex Crawford

Introducing flannel: An etcd backed overlay network for containers

August 28, 2014 · By Eugene Yakubovich

CoreOS Just Got Easier to Try With Panamax

August 21, 2014 · By Lucas Carlson

CoreOS Certification and Training

August 20, 2014 · By Melissa Smolensky

Quay.io joins CoreOS, Introducing the CoreOS Enterprise Registry

August 13, 2014 · By Alex Polvi

Running Kubernetes Example on CoreOS, Part 2

July 30, 2014 · By Kelsey Hightower

CoreOS Stable Release

July 25, 2014 · By Alex Polvi

Running Kubernetes Example on CoreOS, Part 1

July 10, 2014 · By Kelsey Hightower

The CoreOS Epoch

June 30, 2014 · By Brandon Philips

CoreOS Officially on Rackspace OnMetal Cloud Servers

June 19, 2014 · By Alex Crawford

The CoreOS Update Philosophy

June 18, 2014 · By Kelsey Hightower

CoreOS Videos From Our Inaugural Meetup

June 17, 2014 · By Melissa Smolensky

Docker 1.0 released to Alpha

June 16, 2014 · By Melissa Smolensky

Official CoreOS Meetup in San Francisco June 3rd, 2014

May 28, 2014 · By Brian 'redbeard' Harrington

Official CoreOS Images on Google Compute Engine

May 23, 2014 · By Brandon Philips

etcd 0.4.0 with Standby Mode

May 20, 2014 · By Yicheng Qin

Zero Downtime Frontend Deploys with Vulcand on CoreOS

May 19, 2014 · By Rob Szumski

CoreOS Beta Release

May 9, 2014 · By Alex Polvi

Clustering CoreOS with Vagrant

April 24, 2014 · By Brandon Philips

etcd - The Road to 1.0

April 14, 2014 · By Blake Mizerany

Major Update: btrfs, docker 0.9, add users, writable /etc, and more!

March 27, 2014 · By Alex Polvi

Dynamic Docker links with an ambassador powered by etcd

February 27, 2014 · By Alex Polvi

Introduction to networkd, network management from systemd

February 25, 2014 · By Tom Gundersen

Cluster-Level Container Deployment with fleet

February 18, 2014 · By Brian Waldon

etcd 0.3.0 - Improved Cluster Discovery, API Enhancements and Windows Support

February 7, 2014 · By Brandon Philips

Brandon's etcd presentation at GoSF

January 16, 2014 · By Brandon Philips

Jumpers and the Software Defined Localhost

January 13, 2014 · By Alex Polvi

etcd 0.2.0 - new API, new modules and tons of improvements

December 27, 2013 · By Brandon Philips

Running etcd in Docker Containers

December 13, 2013 · By Rob Szumski

CoreOS alpha updates

December 9, 2013 · By Alex Polvi

Running a Utility Cluster on CoreOS

December 4, 2013 · By Rob Szumski

CoreOS on Google Compute Engine

December 2, 2013 · By Alex Polvi

etcd v0.1.2 with a new dashboard and bugfixes

October 10, 2013 · By Brandon Philips

Boot on Bare Metal with PXE

September 11, 2013 · By Brandon Philips

OpenStack, VMware and KVM images available

August 28, 2013 · By Brandon Philips

etcd v0.1.0 release

August 11, 2013 · By Brandon Philips

CoreOS Vagrant Images

August 2, 2013 · By Alex Polvi

Distributed configuration data with etcd

July 23, 2013 · By Brandon Philips

Recoverable System Upgrades

July 16, 2013 · By Brandon Philips