Node lifecycle

Rakkr can optionally manage the host side of recorder nodes — installing dependencies, deploying the agent binary, managing the systemd service, rotating CA trust, and running smoke checks — over SSH, driven from the operator console. The controller records RBAC/audit context; Ansible owns the host work.

This subsystem is optional and currently a scaffold under active development. If you don’t need remote host management, you can run agents entirely by hand.

Architecture

flowchart LR
  ui["Nodes page\n(node:manage)"] --> api["Controller API\nlifecycle routes"]
  api -->|"POST /runs"| runner["Ansible runner\n(Docker, :8790)"]
  runner -->|"ansible-playbook over SSH"| host["Recorder host"]

The controller never SSHes anywhere itself. It calls the Ansible runner — a small Dockerized HTTP service (deploy/ansible/runner.py) — which runs an Ansible playbook against the target host.

Supported actions

Five allowlisted actions are supported end to end:

| Action | What it does |
| --- | --- |
| install_dependencies | Install the recorder packages (alsa-utils, ffmpeg, PipeWire/JACK, etc.) and create the rakkr user/group and directories. |
| update_binary | Download the recorder-agent from a GitHub release (verified by .sha256), install it, and (re)deploy the env file + systemd unit. |
| restart_service | Restart the rakkr-recorder-agent systemd service. |
| rotate_trust | Install/refresh the controller CA in the host trust store. |
| smoke_check | Run a node smoke command (default --print-inventory) and report its output. |

The same allowlist is enforced at the controller route, the runner, and the Ansible role, so an unknown action is rejected at every layer.
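
As a rough illustration of what one layer's allowlist check could look like, here is a minimal Python sketch. The five action names come from the table above; the names `ALLOWED_ACTIONS`, `UnknownActionError`, and `require_allowed` are hypothetical and not taken from the Rakkr codebase.

```python
# Hypothetical sketch of a per-layer allowlist check (not Rakkr's actual code).
ALLOWED_ACTIONS = frozenset({
    "install_dependencies",
    "update_binary",
    "restart_service",
    "rotate_trust",
    "smoke_check",
})


class UnknownActionError(ValueError):
    """Raised when a requested action is not on the allowlist."""


def require_allowed(action: str) -> str:
    """Return the action unchanged if it is allowlisted, otherwise raise."""
    if action not in ALLOWED_ACTIONS:
        raise UnknownActionError(f"unsupported lifecycle action: {action!r}")
    return action
```

Repeating the same check in each layer means a request that bypasses one layer, for example by calling the runner directly, is still rejected by the next.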

Running an action from the console

On the Nodes page, the per-node lifecycle menu (visible with node:manage) calls POST /api/v1/nodes/:nodeId/lifecycle/:action. The controller validates the action, confirms the node is in your scope, runs it via the runner, and audits the result with the runner run ID, exit code, target host, status, and output. Recent runs are listed via GET /api/v1/nodes/:nodeId/lifecycle-jobs.
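
For scripting outside the console, the request could be assembled roughly as follows. This is a sketch only: the route is the one named above, but the helper name `build_lifecycle_request`, the empty JSON body, and the `extra_headers` parameter are assumptions, and this page does not describe how the controller authenticates API calls.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_lifecycle_request(
    base_url: str,
    node_id: str,
    action: str,
    body: Optional[dict] = None,
    extra_headers: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the lifecycle route.

    Hypothetical helper: the JSON body shape and any auth headers are
    assumptions; supply whatever your controller deployment requires.
    """
    path = f"/api/v1/nodes/{quote(node_id, safe='')}/lifecycle/{quote(action, safe='')}"
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body or {}).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Path segments are percent-encoded so an unexpected node ID cannot alter the route.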

The Ansible runner

The runner exposes two endpoints:

  • GET /healthz — liveness.
  • POST /runs — run an allowlisted action against a target. It builds a single-host inventory, runs ansible-playbook, and returns { exitCode, runId, stdout, stderr, targetHost }.
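
The POST /runs flow described above can be sketched in Python as three small steps. This is not the code in deploy/ansible/runner.py: the group name `recorder_nodes`, the extra-var name `lifecycle_action`, and all function names are hypothetical. The `ansible_host`, `ansible_user`, and `ansible_ssh_private_key_file` inventory variables and the `-i` / `-e` flags are standard Ansible.

```python
from typing import List, Optional


def build_inventory(node_id: str, host: str, ssh_user: str,
                    key_path: Optional[str] = None) -> str:
    """Render a single-host INI inventory for one run."""
    fields = [node_id, f"ansible_host={host}", f"ansible_user={ssh_user}"]
    if key_path:
        fields.append(f"ansible_ssh_private_key_file={key_path}")
    # "recorder_nodes" is an assumed group name, not confirmed by the source.
    return "[recorder_nodes]\n" + " ".join(fields) + "\n"


def build_command(inventory_path: str, playbook_path: str, action: str) -> List[str]:
    """Build the ansible-playbook argv; the extra-var name is an assumption."""
    return [
        "ansible-playbook",
        "-i", inventory_path,
        "-e", f"lifecycle_action={action}",
        playbook_path,
    ]


def shape_response(run_id: str, target_host: str, exit_code: int,
                   stdout: str, stderr: str) -> dict:
    """Shape the result using the response keys listed for POST /runs."""
    return {
        "exitCode": exit_code,
        "runId": run_id,
        "stdout": stdout,
        "stderr": stderr,
        "targetHost": target_host,
    }
```

Building the command as an argument list rather than a shell string avoids quoting problems if the list is later handed to `subprocess.run`.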

The role (deploy/ansible/roles/recorder_node) is idempotent (package/user/file/systemd modules, no ad-hoc shell orchestration), branches on distro vars (Debian/RedHat), escalates privilege via become, and rolls out serially across hosts.

Binary deployment

update_binary deploys from a published GitHub release by default. Each target node downloads the static musl artifact for its architecture (x86_64-unknown-linux-musl / aarch64-unknown-linux-musl), verifies it against the release .sha256, and installs it — so nodes need outbound access to GitHub.
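
The role performs this check with Ansible modules, but the logic is easy to show in Python. The sketch below maps a machine architecture to the two target triples named above and verifies a downloaded artifact against a checksum file. It assumes the .sha256 file follows the usual `sha256sum` layout (hex digest first, optionally followed by a filename); that layout and the function names are assumptions.

```python
import hashlib
import hmac

# Target triples taken from the text above; keys are `uname -m` style names.
TARGET_TRIPLES = {
    "x86_64": "x86_64-unknown-linux-musl",
    "aarch64": "aarch64-unknown-linux-musl",
}


def target_triple(machine: str) -> str:
    """Return the release target triple for a machine architecture."""
    try:
        return TARGET_TRIPLES[machine]
    except KeyError:
        raise ValueError(f"no published artifact for architecture {machine!r}") from None


def expected_digest(sha256_text: str) -> str:
    """Extract the hex digest, assuming sha256sum-style '<hex>  <name>' content."""
    fields = sha256_text.split()
    if not fields:
        raise ValueError("empty .sha256 file")
    return fields[0].lower()


def verify_artifact(data: bytes, sha256_text: str) -> bool:
    """Compare the artifact's SHA-256 with the published digest."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_digest(sha256_text))
```

An install step would refuse to proceed when `verify_artifact` returns `False`, which is what keeps a truncated or tampered download from replacing the running agent.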

| Variable | Purpose |
| --- | --- |
| RAKKR_ANSIBLE_AGENT_SOURCE | release (default) pulls a GitHub release; local copies a staged file (offline/smoke). |
| RAKKR_ANSIBLE_AGENT_REPO | owner/repo releases are pulled from (defaults to yashau/Rakkr). |
| RAKKR_ANSIBLE_GITHUB_TOKEN | Optional token for private repos or higher GitHub API rate limits. |

Without a pinned version the role resolves the newest release; the console’s Update Binary action does this. To deploy a specific build, forward agentVersion (a full release tag such as agent-v2026.06.28-1) through the lifecycle API. Releases are built by the Release recorder agent workflow — see Releases & versioning and the recorder-agent versioning notes.

Security model

Lifecycle credentials are kept out of node metadata. SSH users, keys, and become passwords live only in the runner’s environment:

| Variable | Purpose |
| --- | --- |
| RAKKR_ANSIBLE_TARGETS | Per-node JSON: host, sshUser, sshKeyFile, sshPassword, becomePassword, smokeCommand. |
| RAKKR_ANSIBLE_SSH_DIR | Host directory mounted read-only into the runner for key files. |
| RAKKR_ANSIBLE_DEFAULT_SSH_USER | Default SSH user when a target doesn’t specify one. |
| RAKKR_ANSIBLE_HOST_OVERRIDES | Map node IDs to hosts without changing node metadata. |
| RAKKR_ANSIBLE_ROLLOUT_SERIAL | Serial rollout batch size. |
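
To make the interplay of these variables concrete, here is one way a runner might resolve a node's connection settings. It is a sketch under several assumptions: that RAKKR_ANSIBLE_HOST_OVERRIDES is a JSON object keyed by node ID, that an override takes precedence over the host in RAKKR_ANSIBLE_TARGETS, and that a node with no resolvable host is an error. The function name `resolve_target` is hypothetical.

```python
import json
from typing import Mapping


def resolve_target(node_id: str, env: Mapping[str, str]) -> dict:
    """Merge per-node target settings, host overrides, and the default SSH user."""
    targets = json.loads(env.get("RAKKR_ANSIBLE_TARGETS") or "{}")
    # Assumption: overrides are a JSON object of {nodeId: host}.
    overrides = json.loads(env.get("RAKKR_ANSIBLE_HOST_OVERRIDES") or "{}")

    target = dict(targets.get(node_id) or {})
    if node_id in overrides:
        # Assumption: an override wins over the host in the targets map.
        target["host"] = overrides[node_id]
    if not target.get("host"):
        raise LookupError(f"no lifecycle target configured for node {node_id!r}")

    if not target.get("sshUser"):
        default_user = env.get("RAKKR_ANSIBLE_DEFAULT_SSH_USER")
        if default_user:
            target["sshUser"] = default_user
    return target
```

In a running service the `env` argument would be `os.environ`; taking it as a parameter keeps the function easy to exercise with a plain dictionary.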

A mounted private key is copied into a per-run temp dir and chmod 0600 before use; password auth is suppressed when a key is present. The controller only knows RAKKR_ANSIBLE_RUNNER_URL (and an optional token/timeout) — never the SSH secrets.
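
The key-staging step can be pictured with a short Python sketch. It is illustrative rather than a copy of the runner: the names `staged_key` and `choose_auth`, the temp-dir prefix, and the staged filename are hypothetical. Copying is needed because the mount is read-only and SSH refuses private keys whose permissions are too open.

```python
import contextlib
import os
import shutil
import stat
import tempfile
from typing import Iterator, Optional


@contextlib.contextmanager
def staged_key(mounted_key: str) -> Iterator[str]:
    """Copy a mounted key into a per-run temp dir with mode 0600.

    The temp dir (and the copy) is removed when the run finishes.
    """
    with tempfile.TemporaryDirectory(prefix="rakkr-run-") as run_dir:
        dest = os.path.join(run_dir, "id_key")  # hypothetical filename
        shutil.copyfile(mounted_key, dest)
        os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)  # 0o600
        yield dest


def choose_auth(target: dict, key_path: Optional[str]) -> dict:
    """Prefer key auth; drop the password whenever a key is present."""
    if key_path:
        return {"keyFile": key_path, "password": None}
    return {"keyFile": None, "password": target.get("sshPassword")}
```

A run would wrap the playbook invocation in `with staged_key(...) as key_path:` so the private copy never outlives the run.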

Smoke validation

Two mise tasks exercise the path (both call scripts/ansible-lifecycle-smoke.mjs):

# Local: the compose runner defaults to AGENT_SOURCE=local and deploys the baked
# test artifact into the Compose test rig (no network), then runs the smoke check.
docker compose up -d --build ansible-runner recorder-test-rig
mise run ansible:runner-smoke

# Physical X32 rig (PowerShell syntax): safe smoke_check only (no binary deploy).
$env:RAKKR_ANSIBLE_SSH_DIR = "$env:USERPROFILE\.ssh"
$env:RAKKR_ANSIBLE_TARGETS = '{"node_x32_test":{"host":"172.22.145.152","sshUser":"root","sshKeyFile":"/run/rakkr-ssh/id_ed25519","smokeCommand":"/tmp/rakkr-recorder-agent --print-inventory"}}'
docker compose up -d --build ansible-runner
mise run ansible:x32-smoke

update_binary pulls the latest release from GitHub by default. For an air-gapped rig, set RAKKR_ANSIBLE_AGENT_SOURCE=local and point RAKKR_ANSIBLE_BINARY_SRC at a staged Linux recorder-agent artifact.

Full environment details and the X32 example are in Deployment and deploy/ansible/README.md.