Skip to content

Fix stale heartbeat process PIDs cleanup issue#902

Merged
k8s-ci-robot merged 3 commits into
kubernetes-sigs:mainfrom
aditya-shantanu:fix-pid-in-tools
Jun 2, 2026
Merged

Fix stale heartbeat process PIDs cleanup issue#902
k8s-ci-robot merged 3 commits into
kubernetes-sigs:mainfrom
aditya-shantanu:fix-pid-in-tools

Conversation

@aditya-shantanu

Copy link
Copy Markdown
Collaborator

What this PR does / why we need it:

Fixes a logic vulnerability in resourcectl cleanup where stale heartbeat process PIDs remained permanently trapped in the local state file (~/.local/resourcectl/state.json) if the heartbeat process exited prematurely. When the OS eventually recycled that PID and assigned it to a new innocent process, the subsequent execution of resourcectl cleanup would violently terminate that unrelated process via SIGKILL.

To fix this, we:

  • Safely verify the heartbeat process identity on Linux by reading and parsing /proc/<pid>/cmdline to ensure it contains the "heartbeat" and "--name" <resource> arguments before attempting to kill it.
  • Guarantee that stale PIDs are zeroed in state.json even if the kill operation fails.
  • Prune fully released resources completely from the state file rather than leaving empty structs, preventing unbounded file growth.
  • Ignore ESRCH errors if a heartbeat process is already dead.
  • Added comprehensive unit tests in main_test.go to reproduce the conditions and verify safety and cleanup behavior.

Which issue(s) this PR is related to:

Fixes b/511322601

Release Note

Fixed a logic issue in `resourcectl` where stale heartbeat PIDs could lead to arbitrary process termination due to PID recycling.
@netlify

netlify Bot commented Jun 1, 2026

Copy link
Copy Markdown

Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit 13ca5cf
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/6a1f18112f29420008f19375
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 1, 2026
@janetkuo janetkuo requested a review from Copilot June 1, 2026 16:00

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a PID-recycling hazard in resourcectl cleanup where stale heartbeat PIDs persisted in ~/.local/resourcectl/state.json could cause SIGKILL to be delivered to an unrelated, recycled process. The fix verifies the target process's identity via /proc/<pid>/cmdline on Linux before killing, always zeroes stale PIDs, prunes fully-released resources from the state file, and tolerates ESRCH. Unit tests are added covering matching, innocent (non-heartbeat), and dead-PID scenarios.

Changes:

  • Verify /proc/<pid>/cmdline contains heartbeat and --name <resource> before killing (Linux only); ignore ESRCH.
  • Unconditionally clear HeartbeatPID after the kill attempt and remove fully-released resources from state instead of leaving empty entries.
  • Add main_test.go with a helper-process pattern and a runCleanup integration test using a mock Boskos server.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
dev/tools/resourcectl/main.go Adds isHeartbeatProcess PID verification, ESRCH tolerance, and prunes released resources from state.
dev/tools/resourcectl/main_test.go New tests covering isHeartbeatProcess and the full runCleanup flow with matching, innocent, and dead PIDs.
Comment thread dev/tools/resourcectl/main_test.go
Comment thread dev/tools/resourcectl/main.go Outdated
Comment thread dev/tools/resourcectl/main.go
Comment thread dev/tools/resourcectl/main_test.go Outdated
@barney-s barney-s self-assigned this Jun 2, 2026
…ary process termination

#### What this PR does / why we need it:
Fixes a logic vulnerability in `resourcectl cleanup` where stale heartbeat process PIDs remained permanently trapped in the local state file (`~/.local/resourcectl/state.json`) if the heartbeat process exited prematurely. When the OS eventually recycled that PID and assigned it to a new innocent process, the subsequent execution of `resourcectl cleanup` would violently terminate that unrelated process via `SIGKILL`.

To fix this, we:
- Safely verify the heartbeat process identity on Linux by reading and parsing `/proc/<pid>/cmdline` to ensure it contains the `"heartbeat"` and `"--name" <resource>` arguments before attempting to kill it.
- Guarantee that stale PIDs are zeroed in `state.json` even if the kill operation fails.
- Prune fully released resources completely from the state file rather than leaving empty structs, preventing unbounded file growth.
- Ignore `ESRCH` errors if a heartbeat process is already dead.
- Added comprehensive unit tests in `main_test.go` to reproduce the conditions and verify safety and cleanup behavior.

#### Which issue(s) this PR is related to:
Fixes b/511322601

#### Release Note
```release-note
Fixed a logic issue in `resourcectl` where stale heartbeat PIDs could lead to arbitrary process termination due to PID recycling.
```
@barney-s

barney-s commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2026
@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aditya-shantanu, barney-s

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2026
@k8s-ci-robot k8s-ci-robot merged commit e442ec1 into kubernetes-sigs:main Jun 2, 2026
11 checks passed
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Agent Sandbox Jun 2, 2026
@aditya-shantanu aditya-shantanu deleted the fix-pid-in-tools branch June 3, 2026 18:56
carlossg pushed a commit to carlossg/agent-sandbox that referenced this pull request Jun 5, 2026
* dev/tools/resourcectl: fix persistent PID landmine and prevent arbitrary process termination

#### What this PR does / why we need it:
Fixes a logic vulnerability in `resourcectl cleanup` where stale heartbeat process PIDs remained permanently trapped in the local state file (`~/.local/resourcectl/state.json`) if the heartbeat process exited prematurely. When the OS eventually recycled that PID and assigned it to a new innocent process, the subsequent execution of `resourcectl cleanup` would violently terminate that unrelated process via `SIGKILL`.

To fix this, we:
- Safely verify the heartbeat process identity on Linux by reading and parsing `/proc/<pid>/cmdline` to ensure it contains the `"heartbeat"` and `"--name" <resource>` arguments before attempting to kill it.
- Guarantee that stale PIDs are zeroed in `state.json` even if the kill operation fails.
- Prune fully released resources completely from the state file rather than leaving empty structs, preventing unbounded file growth.
- Ignore `ESRCH` errors if a heartbeat process is already dead.
- Added comprehensive unit tests in `main_test.go` to reproduce the conditions and verify safety and cleanup behavior.

#### Which issue(s) this PR is related to:
Fixes b/511322601

#### Release Note
```release-note
Fixed a logic issue in `resourcectl` where stale heartbeat PIDs could lead to arbitrary process termination due to PID recycling.
```

* address feedback

* pr feedback
khirotaka pushed a commit to khirotaka/agent-sandbox that referenced this pull request Jun 12, 2026
* dev/tools/resourcectl: fix persistent PID landmine and prevent arbitrary process termination

#### What this PR does / why we need it:
Fixes a logic vulnerability in `resourcectl cleanup` where stale heartbeat process PIDs remained permanently trapped in the local state file (`~/.local/resourcectl/state.json`) if the heartbeat process exited prematurely. When the OS eventually recycled that PID and assigned it to a new innocent process, the subsequent execution of `resourcectl cleanup` would violently terminate that unrelated process via `SIGKILL`.

To fix this, we:
- Safely verify the heartbeat process identity on Linux by reading and parsing `/proc/<pid>/cmdline` to ensure it contains the `"heartbeat"` and `"--name" <resource>` arguments before attempting to kill it.
- Guarantee that stale PIDs are zeroed in `state.json` even if the kill operation fails.
- Prune fully released resources completely from the state file rather than leaving empty structs, preventing unbounded file growth.
- Ignore `ESRCH` errors if a heartbeat process is already dead.
- Added comprehensive unit tests in `main_test.go` to reproduce the conditions and verify safety and cleanup behavior.

#### Which issue(s) this PR is related to:
Fixes b/511322601

#### Release Note
```release-note
Fixed a logic issue in `resourcectl` where stale heartbeat PIDs could lead to arbitrary process termination due to PID recycling.
```

* address feedback

* pr feedback
alexatakvelon pushed a commit to volatilemolotov/agent-sandbox that referenced this pull request Jun 24, 2026
* dev/tools/resourcectl: fix persistent PID landmine and prevent arbitrary process termination

#### What this PR does / why we need it:
Fixes a logic vulnerability in `resourcectl cleanup` where stale heartbeat process PIDs remained permanently trapped in the local state file (`~/.local/resourcectl/state.json`) if the heartbeat process exited prematurely. When the OS eventually recycled that PID and assigned it to a new innocent process, the subsequent execution of `resourcectl cleanup` would violently terminate that unrelated process via `SIGKILL`.

To fix this, we:
- Safely verify the heartbeat process identity on Linux by reading and parsing `/proc/<pid>/cmdline` to ensure it contains the `"heartbeat"` and `"--name" <resource>` arguments before attempting to kill it.
- Guarantee that stale PIDs are zeroed in `state.json` even if the kill operation fails.
- Prune fully released resources completely from the state file rather than leaving empty structs, preventing unbounded file growth.
- Ignore `ESRCH` errors if a heartbeat process is already dead.
- Added comprehensive unit tests in `main_test.go` to reproduce the conditions and verify safety and cleanup behavior.

#### Which issue(s) this PR is related to:
Fixes b/511322601

#### Release Note
```release-note
Fixed a logic issue in `resourcectl` where stale heartbeat PIDs could lead to arbitrary process termination due to PID recycling.
```

* address feedback

* pr feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ready-for-review size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

5 participants