AI Agents

Overview of the 13 AI agents that manage infrastructure resources in AgentMetal.

Overview

AgentMetal uses 13 specialized AI agents to manage infrastructure resources. Each agent is responsible for a specific domain (instances, databases, DNS, and so on) and combines large language model (LLM) reasoning with a tool-use system that executes actions.

Agent Architecture

The system follows an agent-per-domain pattern where each agent encapsulates deep knowledge about its resource type. An Orchestrator agent coordinates multi-service deployments that span multiple domains.
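
The agent-per-domain pattern can be illustrated with a minimal sketch. Everything named here (`Agent`, `DatabaseAgent`, `DNSAgent`, `Orchestrator`, `reconcile`, `deploy`) is hypothetical and chosen for illustration; it is not AgentMetal's actual API. The sketch only shows the shape of the idea: each agent owns one domain, and an orchestrator dispatches the parts of a multi-service deployment to the agent that owns them.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Agent(Protocol):
    """Hypothetical interface: one agent per resource domain."""

    domain: str

    def reconcile(self, desired: dict) -> str: ...


@dataclass
class DatabaseAgent:
    domain: str = "database"

    def reconcile(self, desired: dict) -> str:
        # A real agent would reason about the spec and call provider tools.
        return f"database: ensure {desired['engine']} instance '{desired['name']}'"


@dataclass
class DNSAgent:
    domain: str = "dns"

    def reconcile(self, desired: dict) -> str:
        return f"dns: ensure record '{desired['name']}'"


@dataclass
class Orchestrator:
    """Routes each part of a multi-domain deployment to its owning agent."""

    agents: dict[str, Agent] = field(default_factory=dict)

    def register(self, agent: Agent) -> None:
        self.agents[agent.domain] = agent

    def deploy(self, stack: list[tuple[str, dict]]) -> list[str]:
        # Dispatch each (domain, spec) pair in the order given.
        results = []
        for domain, spec in stack:
            if domain not in self.agents:
                raise KeyError(f"no agent registered for domain '{domain}'")
            results.append(self.agents[domain].reconcile(spec))
        return results
```

Keeping domain knowledge inside each agent means the orchestrator needs to know only which domain a spec belongs to, not how that resource is provisioned.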

Agent Roster

| Agent | Domain | Description |
| --- | --- | --- |
| Orchestrator | Multi-service | Coordinates complex deployments across agents |
| Instance Agent | Compute | Provisions and manages bare-metal/virtual instances |
| Database Agent | Data | Manages PostgreSQL and MySQL databases |
| VPC Agent | Networking | Manages VPCs, subnets, and security groups |
| LoadBalancer Agent | Traffic | Manages HTTP and TCP load balancers |
| DNS Agent | DNS | Manages zones and records |
| K3s Agent | Kubernetes | Manages K3s cluster lifecycle |
| Redis Agent | Caching | Manages Redis clusters |
| Queue Agent | Messaging | Manages RabbitMQ and NATS queues |
| Function Agent | Serverless | Manages serverless functions |
| Bucket Agent | Storage | Manages object storage buckets |
| IaC Agent | Declarative | Processes Infrastructure as Code stacks |
| Healing Agent | Reliability | Monitors and auto-heals infrastructure issues |

How Agents Work

  1. State Change Detection: etcd watchers notify agents when desired or actual state changes
  2. Reasoning: The agent uses an LLM to analyze the state delta and generate an execution plan
  3. Risk Classification: Each planned action is classified as Safe, Moderate, or Dangerous
  4. Execution: Tools are invoked sequentially to carry out the plan
  5. Audit: Every decision and action is recorded in the audit log
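
The five steps above can be sketched as a single function. This is a simplified illustration under several assumptions, not the actual implementation: the etcd watcher is replaced by plain `desired`/`actual` dictionaries, the LLM call is replaced by a caller-supplied `plan` function, the risk table `RISK_BY_TOOL` and all tool names are invented, and halting the plan at the first Dangerous action is one possible policy rather than documented behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Risk(Enum):
    SAFE = "safe"
    MODERATE = "moderate"
    DANGEROUS = "dangerous"


@dataclass
class Action:
    tool: str
    args: dict


# Hypothetical risk table, for illustration only.
RISK_BY_TOOL = {
    "create_record": Risk.SAFE,
    "resize_instance": Risk.MODERATE,
    "delete_database": Risk.DANGEROUS,
}


def classify(action: Action) -> Risk:
    # Unknown tools default to the most restrictive level.
    return RISK_BY_TOOL.get(action.tool, Risk.DANGEROUS)


def handle_state_change(
    desired: dict,
    actual: dict,
    plan: Callable[[dict], list[Action]],
    tools: dict[str, Callable[..., str]],
    audit_log: list[dict],
) -> None:
    # 1. State change: compute the delta a watcher callback would act on.
    delta = {k: v for k, v in desired.items() if actual.get(k) != v}
    if not delta:
        return
    # 2. Reasoning: `plan` stands in for the LLM that produces the plan.
    actions = plan(delta)
    for action in actions:
        # 3. Risk classification.
        risk = classify(action)
        entry = {"tool": action.tool, "risk": risk.value}
        if risk is Risk.DANGEROUS:
            # Assumed policy: record the action as awaiting approval and
            # stop, since later steps may depend on it.
            entry["status"] = "pending_approval"
            audit_log.append(entry)
            break
        # 4. Execution: tools run one at a time, in plan order.
        entry["result"] = tools[action.tool](**action.args)
        entry["status"] = "executed"
        # 5. Audit: every outcome is appended to the log.
        audit_log.append(entry)
```

Defaulting unknown tools to `Risk.DANGEROUS` is a deliberate fail-closed choice in this sketch: a tool missing from the table is held for approval rather than run unreviewed.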

Key Capabilities

  • Reconciliation: Continuously converge actual state to desired state
  • Self-Healing: Automatically detect and resolve infrastructure issues
  • Approval Workflows: Dangerous operations require human approval
  • Tool Use: Each agent has domain-specific tools for interacting with providers
  • LLM Routing: Tasks are routed to appropriate model sizes based on complexity
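
As an illustration of LLM routing, the sketch below picks a model tier from a task's estimated complexity. The tier names, thresholds, task fields (`resources`, `depends_on`), and the scoring heuristic are all assumptions made for this example; the actual routing criteria are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    max_complexity: int  # highest score this tier is assumed to handle


# Hypothetical tiers, ordered from smallest to largest model.
TIERS = (
    ModelTier("small", 3),
    ModelTier("medium", 7),
    ModelTier("large", 10),
)


def estimate_complexity(task: dict) -> int:
    """Toy heuristic: more resources and dependencies score higher (capped at 10)."""
    score = len(task.get("resources", [])) + 2 * len(task.get("depends_on", []))
    return min(score, 10)


def route(task: dict) -> str:
    """Return the name of the smallest tier whose limit covers the task."""
    score = estimate_complexity(task)
    for tier in TIERS:
        if score <= tier.max_complexity:
            return tier.name
    return TIERS[-1].name
```

For example, under these assumed thresholds a task touching a single resource would go to the `small` tier, while a task with five resources and three dependencies would go to `large`.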