Concepts

This page explains the core concepts behind AgentMetal's architecture.

Resources

Everything in AgentMetal is a resource. Every resource has a standard metadata envelope called ResourceMeta:

Field	Type	Description
`ID`	string	Unique identifier (UUID)
`Name`	string	Human-readable name
`Kind`	string	Resource type (Instance, Database, VPC, etc.)
`Version`	int	Monotonically increasing version for optimistic concurrency
`Labels`	map	Key-value pairs for organizing and filtering resources
`CreatedAt`	timestamp	When the resource was created
`UpdatedAt`	timestamp	When the resource was last modified

Each resource also has a Spec (desired state) and a Status (observed state). Agents work to make the actual state match the desired state.

Status Model

Every resource follows a standard lifecycle:

Pending → Creating → Running → Updating → Deleting → Deleted
                        ↓
                      Error

Pending — the resource has been accepted but no agent has started work
Creating — an agent is actively provisioning the resource
Running — the resource is healthy and operational
Updating — an agent is applying a configuration change
Deleting — an agent is tearing down the resource
Deleted — the resource has been fully removed
Error — something went wrong; the agent will attempt to heal automatically
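
The lifecycle can be sketched as a small state machine. The status names come from the list above, but the transition table is an assumption: the diagram does not spell out, for example, whether Updating returns to Running or which states can move to Error, so treat the table as illustrative only.

```go
package main

import "fmt"

// Status is one of the lifecycle states listed above.
type Status string

const (
	Pending  Status = "Pending"
	Creating Status = "Creating"
	Running  Status = "Running"
	Updating Status = "Updating"
	Deleting Status = "Deleting"
	Deleted  Status = "Deleted"
	Error    Status = "Error"
)

// transitions is an ASSUMED table of allowed moves, not taken from AgentMetal.
var transitions = map[Status][]Status{
	Pending:  {Creating},
	Creating: {Running, Error},
	Running:  {Updating, Deleting, Error},
	Updating: {Running, Error},
	Deleting: {Deleted, Error},
	Error:    {Running, Deleting}, // assumed: healing recovers, or the resource is removed
	Deleted:  {},                  // terminal
}

// CanTransition reports whether the assumed table allows from -> to.
func CanTransition(from, to Status) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Pending, Creating)) // allowed by the table
	fmt.Println(CanTransition(Deleted, Running))  // Deleted is terminal
}
```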

Agents

AgentMetal has 13 specialized agents: one for each service domain, plus shared infrastructure agents. Every agent implements four core methods:

Reconcile() — compare desired state with actual state and produce a plan to converge them
Heal() — detect and remediate failures (restart processes, failover replicas, replace nodes)
Plan() — generate a human-readable execution plan for review before applying changes
Tools() — return the set of domain-specific tools the agent can use
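
In Go terms, the four methods might be expressed as an interface like the one sketched here. Only the method names come from the list above; the parameters, return types, and the `ExecutionPlan`, `Tool`, and `noopAgent` types are hypothetical and chosen for illustration.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// ExecutionPlan and Tool are hypothetical placeholder types.
type ExecutionPlan struct{ Steps []string }
type Tool struct{ Name string }

// Agent lists the four core methods. All signatures are assumptions.
type Agent interface {
	Reconcile(ctx context.Context, desired, actual map[string]string) (ExecutionPlan, error)
	Heal(ctx context.Context, resourceID string) error
	Plan(p ExecutionPlan) string
	Tools() []Tool
}

// noopAgent is a stand-in implementation that never changes anything.
type noopAgent struct{}

func (noopAgent) Reconcile(context.Context, map[string]string, map[string]string) (ExecutionPlan, error) {
	return ExecutionPlan{}, nil
}

func (noopAgent) Heal(context.Context, string) error { return nil }

// Plan renders the steps as a numbered, human-readable list.
func (noopAgent) Plan(p ExecutionPlan) string {
	var b strings.Builder
	for i, s := range p.Steps {
		fmt.Fprintf(&b, "%d. %s\n", i+1, s)
	}
	return b.String()
}

func (noopAgent) Tools() []Tool { return []Tool{{Name: "SSH"}} }

// Compile-time check that noopAgent satisfies the interface.
var _ Agent = noopAgent{}

func main() {
	var a Agent = noopAgent{}
	fmt.Print(a.Plan(ExecutionPlan{Steps: []string{"stop service", "apply config", "start service"}}))
}
```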

Reconciliation Loop

The reconciliation loop is the heart of AgentMetal:

Observe — read desired state from etcd and actual state from the infrastructure
Diff — compare the two and identify what needs to change
Plan — generate a sequence of operations to converge
Approve — check the risk level; auto-approve safe operations or request human approval
Execute — run the planned operations using domain-specific tools
Record — write the outcome to the audit log with full reasoning
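
The six steps above can be sketched as a single pass over two key-value maps. This is a simplified model under stated assumptions: the `Op` type, the `approve` callback, and the string audit log are hypothetical, state is passed in rather than read from etcd, and keys present only in the actual state are ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// Op is a hypothetical planned operation: set one key to a value.
type Op struct{ Key, Value string }

// diffAndPlan covers the Diff and Plan steps: it finds keys whose actual
// value differs from the desired one and orders the operations by key.
func diffAndPlan(desired, actual map[string]string) []Op {
	var ops []Op
	for k, v := range desired {
		if cur, ok := actual[k]; !ok || cur != v {
			ops = append(ops, Op{Key: k, Value: v})
		}
	}
	sort.Slice(ops, func(i, j int) bool { return ops[i].Key < ops[j].Key })
	return ops
}

// ReconcileOnce runs one pass. The caller has already observed both states.
// It returns the audit entries recorded during the pass.
func ReconcileOnce(desired, actual map[string]string, approve func(Op) bool) []string {
	var audit []string
	for _, op := range diffAndPlan(desired, actual) {
		if !approve(op) { // Approve
			audit = append(audit, "awaiting approval: set "+op.Key)
			continue
		}
		actual[op.Key] = op.Value                     // Execute
		audit = append(audit, "applied: set "+op.Key) // Record
	}
	return audit
}

func main() {
	desired := map[string]string{"replicas": "3", "version": "16"}
	actual := map[string]string{"replicas": "2", "version": "15"}
	// Pretend that version changes need a human; everything else is safe.
	approve := func(op Op) bool { return op.Key != "version" }
	for _, line := range ReconcileOnce(desired, actual, approve) {
		fmt.Println(line)
	}
}
```

Running a second pass with nothing left to change produces no operations, which is the property that makes the loop safe to repeat.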

Risk Levels

Every operation is classified by risk:

Level	Behavior	Example
Safe	Auto-executed immediately	Adding a DNS record, scaling up replicas
Moderate	Auto-executed with notification	Restarting a service, applying config changes
Dangerous	Requires human approval	Deleting data, major version upgrades, failover
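
The table maps naturally onto a small policy function. The following is a minimal sketch; the `RiskLevel` and `Decision` types are hypothetical, and treating unknown levels like Dangerous is an assumption made so that the sketch fails closed.

```go
package main

import "fmt"

// RiskLevel corresponds to the three levels in the table above.
type RiskLevel int

const (
	Safe RiskLevel = iota
	Moderate
	Dangerous
)

// Decision is a hypothetical result type describing how to proceed.
type Decision struct {
	AutoExecute   bool
	Notify        bool
	NeedsApproval bool
}

// Decide applies the behavior column of the table.
func Decide(level RiskLevel) Decision {
	switch level {
	case Safe:
		return Decision{AutoExecute: true}
	case Moderate:
		return Decision{AutoExecute: true, Notify: true}
	default:
		// Dangerous, plus any unknown level (assumed to fail closed).
		return Decision{NeedsApproval: true}
	}
}

func main() {
	fmt.Printf("%+v\n", Decide(Safe))
	fmt.Printf("%+v\n", Decide(Moderate))
	fmt.Printf("%+v\n", Decide(Dangerous))
}
```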

Tools

Agents use domain-specific tools to interact with infrastructure. Tools are strongly typed functions that agents call during execution:

SSH — execute commands on remote hosts
libvirt — manage VMs (create, destroy, migrate)
PostgreSQL — run SQL, manage replication, configure parameters
Redis — manage instances, configure sentinel, form clusters
nftables — manage firewall rules for security groups
WireGuard — configure VPN tunnels for VPC networking
HAProxy — manage load balancer configuration and hot-reload
MinIO — manage object storage buckets and policies

Providers

AgentMetal abstracts bare-metal providers behind a BareMetalProvider interface. Supported providers:

Hetzner — dedicated servers and cloud instances
OVH — dedicated servers across European data centers
Equinix Metal — global bare-metal infrastructure

The provider interface handles server provisioning, network configuration, and OS installation. Once a server is provisioned, AgentMetal agents take over all service management.

Operations

Long-running tasks are tracked as operations. When you create a resource, the API returns an operation ID that you can poll:

agentmetal operation get op-abc123

Or via the API:

GET /v1/operations/{id}

Each operation reports its status, a progress percentage, and the list of steps completed so far.