Node lifecycle
Node lifecycle
Rakkr can optionally manage the host side of recorder nodes — installing dependencies, deploying the agent binary, managing the systemd service, rotating CA trust, and running smoke checks — over SSH, driven from the operator console. The controller records RBAC/audit context; Ansible owns the host work.
This subsystem is optional and currently a scaffold under active development. If you don’t need remote host management, you can run agents entirely by hand.
Architecture
flowchart LR ui["Nodes page\n(node:manage)"] --> api["Controller API\nlifecycle routes"] api -->|"POST /runs"| runner["Ansible runner\n(Docker, :8790)"] runner -->|"ansible-playbook over SSH"| host["Recorder host"]
The controller never SSHes anywhere itself. It calls the Ansible runner — a
small Dockerized HTTP service (deploy/ansible/runner.py) — which runs an Ansible
playbook against the target host.
Supported actions
Five allowlisted actions are supported end to end:
| Action | What it does |
|---|---|
install_dependencies | Install the recorder packages (alsa-utils, ffmpeg, PipeWire/JACK, etc.) and create the rakkr user/group and directories. |
update_binary | Download the recorder-agent from a GitHub release (verified by .sha256), install it, and (re)deploy the env file + systemd unit. |
restart_service | Restart the rakkr-recorder-agent systemd service. |
rotate_trust | Install/refresh the controller CA in the host trust store. |
smoke_check | Run a node smoke command (default --print-inventory) and report its output. |
The same allowlist is enforced at the controller route, the runner, and the Ansible role, so an unknown action is rejected at every layer.
Running an action from the console
On the Nodes page, the per-node lifecycle menu (visible with node:manage)
calls POST /api/v1/nodes/:nodeId/lifecycle/:action. The controller validates the
action, confirms the node is in your scope, runs it via the runner, and audits the
result with the runner run ID, exit code, target host, status, and output. Recent
runs are listed via GET /api/v1/nodes/:nodeId/lifecycle-jobs.
The Ansible runner
The runner exposes two endpoints:
GET /healthz— liveness.POST /runs— run an allowlisted action against a target. It builds a single-host inventory, runsansible-playbook, and returns{ exitCode, runId, stdout, stderr, targetHost }.
The role (deploy/ansible/roles/recorder_node) is idempotent (package/user/file/
systemd modules, no ad-hoc shell orchestration), branches on distro vars
(Debian/RedHat), escalates privilege via become, and rolls out serially
across hosts.
Binary deployment
update_binary deploys from a published GitHub release by default. Each target
node downloads the static musl artifact for its architecture
(x86_64-unknown-linux-musl / aarch64-unknown-linux-musl), verifies it against
the release .sha256, and installs it — so nodes need outbound access to GitHub.
| Variable | Purpose |
|---|---|
RAKKR_ANSIBLE_AGENT_SOURCE | release (default) pulls a GitHub release; local copies a staged file (offline/smoke). |
RAKKR_ANSIBLE_AGENT_REPO | owner/repo releases are pulled from (defaults to yashau/Rakkr). |
RAKKR_ANSIBLE_GITHUB_TOKEN | Optional token for private repos or higher GitHub API rate limits. |
Without a pinned version the role resolves the newest release; the console’s
Update Binary action does this. To deploy a specific build, forward
agentVersion (a full release tag such as agent-v2026.06.28-1) through the
lifecycle API. Releases are built by the Release recorder agent workflow — see
Releases & versioning and the
recorder-agent versioning notes.
Security model
Lifecycle credentials are kept out of node metadata. SSH users, keys, and become passwords live only in the runner’s environment:
| Variable | Purpose |
|---|---|
RAKKR_ANSIBLE_TARGETS | Per-node JSON: host, sshUser, sshKeyFile, sshPassword, becomePassword, smokeCommand. |
RAKKR_ANSIBLE_SSH_DIR | Host directory mounted read-only into the runner for key files. |
RAKKR_ANSIBLE_DEFAULT_SSH_USER | Default SSH user when a target doesn’t specify one. |
RAKKR_ANSIBLE_HOST_OVERRIDES | Map node IDs to hosts without changing node metadata. |
RAKKR_ANSIBLE_ROLLOUT_SERIAL | Serial rollout batch size. |
A mounted private key is copied into a per-run temp dir and chmod 0600 before
use; password auth is suppressed when a key is present. The controller only knows
RAKKR_ANSIBLE_RUNNER_URL (and an optional token/timeout) — never the SSH
secrets.
Smoke validation
Two mise tasks exercise the path (both call
scripts/ansible-lifecycle-smoke.mjs):
# Local: the compose runner defaults to AGENT_SOURCE=local and deploys the baked# test artifact into the Compose test rig (no network), then smokesdocker compose up -d --build ansible-runner recorder-test-rigmise run ansible:runner-smoke
# Physical X32 rig: safe smoke_check only (no binary deploy)$env:RAKKR_ANSIBLE_SSH_DIR = "$env:USERPROFILE\.ssh"$env:RAKKR_ANSIBLE_TARGETS = '{"node_x32_test":{"host":"172.22.145.152","sshUser":"root","sshKeyFile":"/run/rakkr-ssh/id_ed25519","smokeCommand":"/tmp/rakkr-recorder-agent --print-inventory"}}'docker compose up -d --build ansible-runnermise run ansible:x32-smoke
update_binarypulls the latest release from GitHub by default. For an air-gapped rig, setRAKKR_ANSIBLE_AGENT_SOURCE=localand pointRAKKR_ANSIBLE_BINARY_SRCat a staged Linux recorder-agent artifact.
Full environment details and the X32 example are in
Deployment and deploy/ansible/README.md.