Expose cgroup v2 memory.events as Prometheus metrics #3870

Merged
dims merged 1 commit into google:master from sohankunkerkar:add-memory-events-upstream
May 13, 2026

Conversation

@sohankunkerkar

Contributor

Kubernetes KEP-2570 (MemoryQoS) uses cgroup v2 memory.high for throttling and memory.min/memory.low for memory protection. To observe the effect of these settings, operators need visibility into memory pressure events. cadvisor currently does not read the memory.events cgroup file. The existing container_oom_events_total metric comes from kernel log parsing, not cgroup counters.

Read memory.events on cgroup v2 and expose two new Prometheus counter metrics:

  • container_memory_events_high_total: times the container was throttled for breaching memory.high
  • container_memory_events_max_total: times the container's usage hit memory.max

xref: kubernetes/enhancements#2570

@sohankunkerkar

Contributor Author
Comment thread container/libcontainer/handler.go
@sohankunkerkar sohankunkerkar force-pushed the add-memory-events-upstream branch from 921423a to 1a98acf Compare May 7, 2026 17:29
@sohankunkerkar

Contributor Author

@dims could you take a look at it?

@dims

dims commented May 12, 2026

Collaborator

@sohankunkerkar no tests at all? :(

  • add a unit test in container/libcontainer/handler_test.go?
  • integration/tests/metrics/prometheus_test.go::TestCoreMemoryMetricsExist (add to memoryMetrics array)
  • integration/tests/api/event_test.go::TestOomKillEventConstraint (same setup as the OOM test, but poll /metrics until container_memory_events_max_total{name~containerID} > 0.)
@sohankunkerkar sohankunkerkar force-pushed the add-memory-events-upstream branch from 1a98acf to 9d965e7 Compare May 13, 2026 14:18
@sohankunkerkar

Contributor Author

> @sohankunkerkar no tests at all? :(
>
>   • add a unit test in container/libcontainer/handler_test.go?
>   • integration/tests/metrics/prometheus_test.go::TestCoreMemoryMetricsExist (add to memoryMetrics array)
>   • integration/tests/api/event_test.go::TestOomKillEventConstraint (same setup as the OOM test, but poll /metrics until container_memory_events_max_total{name~containerID} > 0.)

@dims I addressed your comments. Could you take a look at it again? Thanks!

Kubernetes KEP-2570 (MemoryQoS) uses cgroup v2 memory.high for
throttling and memory.min/memory.low for memory protection. To observe
the effect of these settings, operators need visibility into memory
pressure events. cadvisor currently does not read the memory.events
cgroup file — the existing container_oom_events_total metric comes from
kernel log parsing, not cgroup counters.

Read memory.events on cgroup v2 and expose two new Prometheus counter
metrics:

- container_memory_events_high_total: times the container was throttled
  for breaching memory.high
- container_memory_events_max_total: times the container's usage hit
  memory.max

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
@sohankunkerkar sohankunkerkar force-pushed the add-memory-events-upstream branch from 9d965e7 to cc66235 Compare May 13, 2026 19:20
@dims dims merged commit e3eecca into google:master May 13, 2026
10 checks passed
@dims

dims commented May 13, 2026

Collaborator
@sohankunkerkar

Contributor Author

@dims Thanks for reviewing this PR. I had one quick question: do we have any plans to cut a new release of cAdvisor anytime soon? We might need it for testing the MemoryQoS feature in Kubernetes.

@dims

dims commented May 13, 2026

Collaborator

@sohankunkerkar I try to do at least one release of cadvisor to support k8s. Will take stock soon-ish. Do you need this in short order? (weeks? days?)

@sohankunkerkar

Contributor Author

> @sohankunkerkar I try to do at least one release of cadvisor to support k8s. Will take stock soon-ish. Do you need this in short order? (weeks? days?)

Thanks for the update! It would be ideal if we could get that once the v1.37 branch opens, or before the feature freeze maybe?

QiWang19 added a commit to QiWang19/kubernetes that referenced this pull request May 27, 2026
PR [cadvisor#3870](google/cadvisor#3870) exposes `memory.events` (high/max) through cadvisor's Prometheus metrics and REST API. This follow-up surfaces them through kubelet's Summary Stats API.

Signed-off-by: Qi Wang <qiwan@redhat.com>
