Agent Reconciliation Loop

Overview

Reconciliation is the core loop that drives infrastructure management. Each agent continuously compares the desired state (what should exist) with the actual state (what does exist) and takes action to close any gaps.

The Five Phases

Phase 1: Observe

The agent reads both desired and actual state from etcd:

Desired state: Set by user actions (CLI, API, IaC) and stored under /desired//
Actual state: Reported by provider polling and stored under /actual//

Desired: Instance "web-01" with type=cx31, image=ubuntu-22.04
Actual:  Instance "web-01" does not exist
Delta:   Instance "web-01" needs to be created
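The Phase 1 comparison can be sketched in Python as a diff between two maps keyed by resource name. This is a minimal illustration, not the agent's actual code. The `State` shape, `Delta`, and `compute_deltas` are hypothetical. The sketch also assumes that actual state contains only resources this agent manages; otherwise the delete branch would target unmanaged resources.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical shape: resource name -> attribute dict (e.g. {"type": "cx31"}).
State = dict[str, dict[str, Any]]

@dataclass(frozen=True)
class Delta:
    action: str                        # "create", "update", or "delete"
    name: str
    desired: Optional[dict[str, Any]]  # None when the resource should not exist
    actual: Optional[dict[str, Any]]   # None when the resource does not exist yet

def compute_deltas(desired: State, actual: State) -> list[Delta]:
    """Compare desired and actual state and list the gaps to close."""
    deltas: list[Delta] = []
    for name, spec in desired.items():
        if name not in actual:
            deltas.append(Delta("create", name, spec, None))
        elif any(actual[name].get(k) != v for k, v in spec.items()):
            # Only attributes named in the desired spec are compared, so
            # provider-added fields (IDs, IPs) do not cause spurious updates.
            deltas.append(Delta("update", name, spec, actual[name]))
    for name, observed in actual.items():
        if name not in desired:
            deltas.append(Delta("delete", name, None, observed))
    return deltas

desired = {"web-01": {"type": "cx31", "image": "ubuntu-22.04"}}
for d in compute_deltas(desired, {}):
    print(d.action, d.name)  # prints: create web-01
```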

Phase 2: Reason

The agent sends the state delta to the LLM together with the set of tools it is allowed to call. The LLM replies with its reasoning and an execution plan:

Reasoning: Instance "web-01" exists in desired state but not in actual state.
           Need to create it using the provision_instance tool with the
           specified type and image.
Plan:
  1. provision_instance(name="web-01", type="cx31", image="ubuntu-22.04")
  2. wait_for_ready(instance="web-01", timeout=300)
  3. update_actual_state(instance="web-01", state="running")
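The document does not say how the LLM's reply is encoded. Below is a minimal sketch that assumes the model is asked to answer with JSON containing `reasoning` and `plan` fields. It shows how an agent could parse that reply and reject any step that calls a tool the agent does not have or omits a required argument. The `TOOLS` registry, its required-argument sets, and `parse_plan` are hypothetical, and the LLM request itself is omitted.

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Hypothetical registry for the instance agent: tool name -> required arguments.
TOOLS: dict[str, set[str]] = {
    "provision_instance": {"name", "type", "image"},
    "wait_for_ready": {"instance", "timeout"},
    "update_actual_state": {"instance", "state"},
}

@dataclass
class Step:
    tool: str
    args: dict[str, Any] = field(default_factory=dict)

@dataclass
class Plan:
    reasoning: str
    steps: list[Step]

def parse_plan(raw: str) -> Plan:
    """Parse the model's JSON reply, validating every step against TOOLS."""
    data = json.loads(raw)
    steps = []
    for i, s in enumerate(data["plan"], start=1):
        tool, args = s["tool"], s.get("args", {})
        if tool not in TOOLS:
            raise ValueError(f"step {i}: unknown tool {tool!r}")
        missing = TOOLS[tool] - set(args)
        if missing:
            raise ValueError(f"step {i}: {tool} missing {sorted(missing)}")
        steps.append(Step(tool, args))
    return Plan(data["reasoning"], steps)

raw = json.dumps({
    "reasoning": "web-01 exists in desired state but not in actual state",
    "plan": [
        {"tool": "provision_instance",
         "args": {"name": "web-01", "type": "cx31", "image": "ubuntu-22.04"}},
        {"tool": "wait_for_ready", "args": {"instance": "web-01", "timeout": 300}},
        {"tool": "update_actual_state", "args": {"instance": "web-01", "state": "running"}},
    ],
})
plan = parse_plan(raw)
print([s.tool for s in plan.steps])
```

Validating before the risk phase means a hallucinated tool name fails early instead of reaching execution.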

Phase 3: Classify Risk

Each step in the plan is classified by risk level:

Risk Level	Criteria	Action
Safe	Read-only, non-destructive	Auto-execute
Moderate	Creates or modifies resources	Auto-execute + notify
Dangerous	Deletes or significantly alters resources	Require human approval

The overall plan risk is the maximum risk of any individual step.
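The classification rule and the max-of-steps aggregation fit in a few lines. This sketch uses an ordered `IntEnum` so that `max()` picks the most severe level. The per-tool mapping in `TOOL_RISK` (including the `delete_instance` tool) is hypothetical. Treating unknown tools as Dangerous is a design assumption: it ensures an unrecognized tool is never auto-executed.

```python
from enum import IntEnum

class Risk(IntEnum):
    # Ordered by severity so max() returns the most dangerous level.
    SAFE = 0
    MODERATE = 1
    DANGEROUS = 2

# Hypothetical per-tool classification following the criteria in the table.
TOOL_RISK = {
    "wait_for_ready": Risk.SAFE,            # read-only
    "provision_instance": Risk.MODERATE,    # creates a resource
    "update_actual_state": Risk.MODERATE,   # writes a state record
    "delete_instance": Risk.DANGEROUS,      # destructive
}

ACTIONS = {
    Risk.SAFE: "auto-execute",
    Risk.MODERATE: "auto-execute + notify",
    Risk.DANGEROUS: "require human approval",
}

def classify_step(tool: str) -> Risk:
    # Unknown tools default to DANGEROUS so they always need approval.
    return TOOL_RISK.get(tool, Risk.DANGEROUS)

def plan_risk(tools: list[str]) -> Risk:
    """Overall plan risk is the maximum risk of any individual step."""
    return max((classify_step(t) for t in tools), default=Risk.SAFE)

risk = plan_risk(["provision_instance", "wait_for_ready", "update_actual_state"])
print(risk.name, "->", ACTIONS[risk])  # MODERATE -> auto-execute + notify
```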

Phase 4: Execute

Once the plan is approved (automatically for Safe and Moderate plans, manually by a human for Dangerous ones), the agent executes the tool calls one at a time, in order:

[1/3] provision_instance(name="web-01", type="cx31") ... success (inst-abc123)
[2/3] wait_for_ready(instance="inst-abc123", timeout=300) ... success (running)
[3/3] update_actual_state(instance="inst-abc123") ... success

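In this run, steps 2 and 3 reference the provider-assigned ID returned by step 1 (inst-abc123) instead of the name web-01. If a step fails, the agent retries it, skips it, or aborts the rest of the plan, depending on the error type and the configured policy.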
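The following is a minimal sketch of sequential execution with a retry/skip/abort failure policy, producing log lines in the style shown above. `TransientError`, `OnError`, `default_policy`, and `execute` are hypothetical names. The default policy is also an assumption, since the document leaves the policy configurable: it retries transient errors up to three times and aborts on anything else.

```python
from enum import Enum
from typing import Any, Callable

class OnError(Enum):
    RETRY = "retry"
    SKIP = "skip"
    ABORT = "abort"

class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. API timeouts)."""

def default_policy(err: Exception) -> OnError:
    # Hypothetical policy: retry transient errors, abort on anything else.
    return OnError.RETRY if isinstance(err, TransientError) else OnError.ABORT

Step = tuple[str, Callable[[], Any]]  # (label for the log, zero-arg tool call)

def execute(steps: list[Step],
            policy: Callable[[Exception], OnError] = default_policy,
            max_retries: int = 3) -> list[str]:
    """Run steps in order, consulting the policy whenever a step raises."""
    log: list[str] = []
    total = len(steps)
    for i, (label, fn) in enumerate(steps, start=1):
        attempts = 0
        while True:
            try:
                result = fn()
                log.append(f"[{i}/{total}] {label} ... success ({result})")
                break
            except Exception as err:
                decision = policy(err)
                attempts += 1
                if decision is OnError.RETRY and attempts <= max_retries:
                    continue  # try the same step again
                if decision is OnError.SKIP:
                    log.append(f"[{i}/{total}] {label} ... skipped ({err})")
                    break     # move on to the next step
                # ABORT, or RETRY with the retry budget exhausted.
                log.append(f"[{i}/{total}] {label} ... failed ({err}), aborting")
                return log
    return log

calls = {"n": 0}

def flaky_provision() -> str:
    # Fails once with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientError("API timeout")
    return "inst-abc123"

for line in execute([("provision_instance", flaky_provision),
                     ("wait_for_ready", lambda: "running")]):
    print(line)
```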

Phase 5: Record

Every decision and action is recorded in the audit log:

{
  "agent": "instance-agent",
  "action": "reconcile",
  "resource": "web-01",
  "reasoning": "Instance exists in desired state but not actual state",
  "plan_steps": 3,
  "risk_level": "moderate",
  "outcome": "success",
  "duration_ms": 45000,
  "timestamp": "2025-01-15T10:30:00Z"
}
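Here is a minimal sketch of how an agent might assemble the Phase 5 record. The field names are copied from the example above. The function names, the append-only JSON Lines file, and taking `timestamp` when the record is built (rather than when reconciliation started) are assumptions.

```python
import json
import time
from datetime import datetime, timezone

def audit_record(agent: str, resource: str, reasoning: str, plan_steps: int,
                 risk_level: str, outcome: str, started: float) -> dict:
    """Build an audit entry; `started` is a time.monotonic() reading."""
    return {
        "agent": agent,
        "action": "reconcile",
        "resource": resource,
        "reasoning": reasoning,
        "plan_steps": plan_steps,
        "risk_level": risk_level,
        "outcome": outcome,
        # Monotonic clock, so wall-clock adjustments cannot skew durations.
        "duration_ms": int((time.monotonic() - started) * 1000),
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

def append_audit(path: str, record: dict) -> None:
    # Hypothetical storage: one JSON object per line, opened in append mode.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```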

Continuous Reconciliation

Agents do not run on a fixed schedule. Instead, they react to state changes via etcd watchers:

1. A user creates an instance via the CLI.
2. The CLI writes the desired state to etcd.
3. The etcd watch fires, notifying the instance agent.
4. The agent runs the reconciliation loop.
5. The actual state is updated when provisioning completes.

Because an agent wakes only when a watched key changes, this event-driven approach reacts to new desired state with minimal latency and avoids polling etcd on a timer.
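etcd's v3 watch API supports watching a key prefix. To keep the example runnable without an etcd server, this sketch replaces etcd with a toy in-memory store whose `put()` notifies prefix watchers. `InMemoryStore`, `InstanceAgent`, and the key names are hypothetical and illustrative only. The document's actual key layout is not reproduced here. The agent watches only desired-state keys, so its own writes to actual state do not re-trigger it.

```python
from collections import defaultdict
from typing import Callable

Callback = Callable[[str, str], None]

class InMemoryStore:
    """Toy stand-in for etcd: put() notifies callbacks watching a key prefix."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.watchers: dict[str, list[Callback]] = defaultdict(list)

    def watch_prefix(self, prefix: str, callback: Callback) -> None:
        self.watchers[prefix].append(callback)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value
        for prefix, callbacks in self.watchers.items():
            if key.startswith(prefix):
                for cb in callbacks:
                    cb(key, value)

class InstanceAgent:
    def __init__(self, store: InMemoryStore) -> None:
        self.store = store
        self.reconciled: list[str] = []
        # React to desired-state writes instead of running on a schedule.
        store.watch_prefix("/desired/", self.on_change)

    def on_change(self, key: str, value: str) -> None:
        name = key.rsplit("/", 1)[-1]
        self.reconciled.append(name)
        # Placeholder for observe, reason, classify, execute, and record.
        self.store.put(f"/actual/{name}", "running")

store = InMemoryStore()
agent = InstanceAgent(store)
store.put("/desired/web-01", '{"type": "cx31", "image": "ubuntu-22.04"}')
print(agent.reconciled, store.data["/actual/web-01"])
```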