Agent Reconciliation Loop

How agents observe state, reason about changes, and execute reconciliation plans.


Overview

Reconciliation is the core loop that drives infrastructure management. Each agent continuously compares the desired state (what should exist) with the actual state (what does exist) and takes action to close any gaps.

The Five Phases

Phase 1: Observe

The agent reads both desired and actual state from etcd:

  • Desired state: Set by user actions (CLI, API, IaC) and stored under /desired//
  • Actual state: Reported by provider polling and stored under /actual//

For example:

Desired: Instance "web-01" with type=cx31, image=ubuntu-22.04
Actual:  Instance "web-01" does not exist
Delta:   Instance "web-01" needs to be created
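The comparison in this phase amounts to a diff between two maps. Below is a minimal sketch that assumes both states have already been read from etcd into plain dicts keyed by resource name. compute_delta is a hypothetical helper. Treating a resource that appears only in actual state as a deletion is also an assumption; this page does not say how such resources are handled.

```python
from typing import Any

# Assumed shape: resource name -> spec fields (e.g. {"type": "cx31"}).
State = dict[str, dict[str, Any]]


def compute_delta(desired: State, actual: State) -> list[tuple[str, str]]:
    """Return (action, name) pairs that would move actual state toward desired state."""
    delta: list[tuple[str, str]] = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            delta.append(("create", name))
        # Compare only the fields the user declared; actual state may carry
        # extra fields such as a provider ID or status.
        elif any(actual[name].get(key) != value for key, value in spec.items()):
            delta.append(("update", name))
    # Assumption: anything present only in actual state is slated for deletion.
    for name in sorted(actual.keys() - desired.keys()):
        delta.append(("delete", name))
    return delta


desired = {"web-01": {"type": "cx31", "image": "ubuntu-22.04"}}
actual: State = {}
print(compute_delta(desired, actual))  # [('create', 'web-01')]
```

Under this sketch, any "delete" entry would fall into the Dangerous risk class described in Phase 3.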

Phase 2: Reason

The agent sends the state delta to the LLM along with its available tools. The LLM generates an execution plan:

Reasoning: Instance "web-01" exists in desired state but not in actual state.
           Need to create it using the provision_instance tool with the
           specified type and image.

Plan:
  1. provision_instance(name="web-01", type="cx31", image="ubuntu-22.04")
  2. wait_for_ready(instance="web-01", timeout=300)
  3. update_actual_state(instance="web-01", state="running")
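This page does not specify the format the LLM uses to return its plan. Below is a minimal sketch that assumes the agent asks for JSON of the hypothetical form {"steps": [{"tool": ..., "args": {...}}]}. It turns that response into typed steps and rejects any step naming a tool the agent does not have. PlanStep and parse_plan are illustrative names, not part of the documented system.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PlanStep:
    tool: str  # must name one of the agent's available tools
    args: dict[str, Any] = field(default_factory=dict)


def parse_plan(raw: str, available_tools: set[str]) -> list[PlanStep]:
    """Parse an LLM-produced JSON plan, rejecting steps that call unknown tools."""
    data = json.loads(raw)
    steps = [PlanStep(tool=s["tool"], args=s.get("args", {})) for s in data["steps"]]
    unknown = [s.tool for s in steps if s.tool not in available_tools]
    if unknown:
        raise ValueError(f"plan uses unavailable tools: {unknown}")
    return steps


TOOLS = {"provision_instance", "wait_for_ready", "update_actual_state"}
raw_plan = json.dumps({"steps": [
    {"tool": "provision_instance",
     "args": {"name": "web-01", "type": "cx31", "image": "ubuntu-22.04"}},
    {"tool": "wait_for_ready", "args": {"instance": "web-01", "timeout": 300}},
    {"tool": "update_actual_state", "args": {"instance": "web-01", "state": "running"}},
]})
plan = parse_plan(raw_plan, TOOLS)
```

Checking tool names before any step runs means a hallucinated tool fails the whole plan up front rather than partway through execution.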

Phase 3: Classify Risk

Each step in the plan is classified by risk level:

Risk Level   Criteria                                    Action
Safe         Read-only, non-destructive                  Auto-execute
Moderate     Creates or modifies resources               Auto-execute + notify
Dangerous    Deletes or significantly alters resources   Require human approval

The overall plan risk is the maximum risk of any individual step.
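Taking the maximum works naturally if risk levels are an ordered enum. The following is a minimal sketch. The per-tool mapping TOOL_RISK is hypothetical, and so is delete_instance. Treating unknown tools as Dangerous is a conservative assumption, not documented behavior.

```python
from enum import IntEnum


class Risk(IntEnum):
    # Ordered so that max() picks the most severe level.
    SAFE = 0
    MODERATE = 1
    DANGEROUS = 2


# Action for each risk level, following the risk table.
ACTIONS = {
    Risk.SAFE: "auto-execute",
    Risk.MODERATE: "auto-execute + notify",
    Risk.DANGEROUS: "require human approval",
}

# Hypothetical per-tool classification; each agent would define its own.
TOOL_RISK = {
    "wait_for_ready": Risk.SAFE,              # read-only polling
    "provision_instance": Risk.MODERATE,      # creates a resource
    "update_actual_state": Risk.MODERATE,     # writes to etcd
    "delete_instance": Risk.DANGEROUS,        # hypothetical destructive tool
}


def plan_risk(tools: list[str]) -> Risk:
    """Overall risk is the maximum step risk; unknown tools count as dangerous."""
    return max((TOOL_RISK.get(t, Risk.DANGEROUS) for t in tools), default=Risk.SAFE)


risk = plan_risk(["provision_instance", "wait_for_ready", "update_actual_state"])
print(risk.name, "->", ACTIONS[risk])  # MODERATE -> auto-execute + notify
```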

Phase 4: Execute

If the plan is approved (automatically for Safe/Moderate, manually for Dangerous), tools are executed sequentially:

[1/3] provision_instance(name="web-01", type="cx31", image="ubuntu-22.04") ... success (inst-abc123)
[2/3] wait_for_ready(instance="inst-abc123", timeout=300) ... success (running)
[3/3] update_actual_state(instance="inst-abc123", state="running") ... success

If a step fails, the agent can retry, skip, or abort based on the error type and configured policy.
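The page leaves the error types and policy format open. Below is a minimal sketch of sequential execution with retry, skip, and abort. It assumes "error type" means transient versus non-transient, and it assumes the policy reduces to a retry limit plus a set of skippable step names. TransientError, execute_plan, max_retries, and skippable are all hypothetical. The approval gate from Phase 3 is assumed to have passed already.

```python
from typing import Any, Callable


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. a provider timeout."""


def execute_plan(
    steps: list[tuple[str, Callable[[], Any]]],
    max_retries: int = 2,
    skippable: frozenset[str] = frozenset(),
) -> list[tuple[str, str]]:
    """Run steps in order, returning (step_name, outcome) for each step attempted.

    Transient errors are retried up to max_retries times. A step that still fails,
    or fails with any other error, is skipped if listed in `skippable`; otherwise
    the remaining steps are abandoned.
    """
    results: list[tuple[str, str]] = []
    for name, run in steps:
        for attempt in range(max_retries + 1):
            try:
                run()
                results.append((name, "success"))
                break
            except TransientError:
                if attempt < max_retries:
                    continue  # retry the same step
            except Exception:
                pass  # non-transient errors are not retried
            # Retries exhausted or non-retryable error: apply the skip/abort policy.
            if name in skippable:
                results.append((name, "skipped"))
                break
            results.append((name, "aborted"))
            return results
    return results
```

Returning the per-step outcomes gives Phase 5 what it needs to record the result of the run.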

Phase 5: Record

Every decision and action is recorded in the audit log:

{
  "agent": "instance-agent",
  "action": "reconcile",
  "resource": "web-01",
  "reasoning": "Instance exists in desired state but not actual state",
  "plan_steps": 3,
  "risk_level": "moderate",
  "outcome": "success",
  "duration_ms": 45000,
  "timestamp": "2025-01-15T10:30:00Z"
}
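Below is a minimal sketch of producing one such record with the same field names and timestamp format as the example. It assumes the agent serializes each entry to a JSON string. audit_record is a hypothetical helper. Where the string is then written (a file, etcd, or a log pipeline) is not specified on this page.

```python
import json
from datetime import datetime, timezone


def audit_record(
    agent: str,
    resource: str,
    reasoning: str,
    plan_steps: int,
    risk_level: str,
    outcome: str,
    duration_ms: int,
    timestamp: datetime,
) -> str:
    """Serialize one reconciliation decision; `timestamp` must be timezone-aware."""
    record = {
        "agent": agent,
        "action": "reconcile",
        "resource": resource,
        "reasoning": reasoning,
        "plan_steps": plan_steps,
        "risk_level": risk_level,
        "outcome": outcome,
        "duration_ms": duration_ms,
        # UTC, ISO 8601 with a trailing "Z", matching the example's format.
        "timestamp": timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return json.dumps(record)


line = audit_record(
    "instance-agent", "web-01",
    "Instance exists in desired state but not actual state",
    3, "moderate", "success", 45000,
    datetime(2025, 1, 15, 10, 30, tzinfo=timezone.utc),
)
print(line)
```

Emitting one JSON object per line keeps the log easy to append to and to filter with line-oriented tools.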

Continuous Reconciliation

Agents do not run on a fixed schedule. Instead, they react to state changes via etcd watchers:

  1. User creates an instance via CLI
  2. CLI writes desired state to etcd
  3. etcd watch fires, notifying the instance agent
  4. Agent runs the reconciliation loop
  5. Actual state is updated when provisioning completes

This event-driven approach minimizes latency, because reconciliation starts as soon as a watched key changes, and it spares agents from repeatedly polling etcd for changes.
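The shape of steps 3 and 4 can be sketched without a real etcd client. In this sketch, a queue.Queue stands in for the watch stream, and each event is simply the name of the resource that changed. run_agent and the None shutdown sentinel are hypothetical. A production agent would consume etcd's watch API instead.

```python
import queue
import threading
from typing import Callable, Optional

# Stand-in for an etcd watch stream: each event is the name of a resource whose
# state changed. None is used here as a shutdown sentinel.
Event = Optional[str]


def run_agent(events: "queue.Queue[Event]", reconcile: Callable[[str], None]) -> None:
    """Sleep until a change event arrives, then reconcile that resource."""
    while True:
        name = events.get()  # blocks without busy-waiting or periodic polling
        if name is None:
            break
        reconcile(name)


events: "queue.Queue[Event]" = queue.Queue()
handled: list[str] = []
agent = threading.Thread(target=run_agent, args=(events, handled.append))
agent.start()
events.put("web-01")  # e.g. the CLI has just written desired state for web-01
events.put(None)      # stop the agent for this demo
agent.join()
print(handled)  # ['web-01']
```

The blocking get() is what makes the loop event-driven: an idle agent consumes no CPU and wakes only when a change arrives.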