Database

Managed PostgreSQL and MySQL databases with streaming replication, automated failover, WAL archiving, and point-in-time recovery.

The Database service provides managed PostgreSQL and MySQL instances with automated replication, failover, backup, and performance tuning. The Database Agent manages the full lifecycle of each instance: installation, configuration, authentication, replication, backup, and monitoring.

Database Types

| Type | vCPUs | Memory | Storage | Description |
|---|---|---|---|---|
| db.small | 1 | 2 GB | 20 GB | Development and testing |
| db.medium | 2 | 4 GB | 50 GB | Small production workloads |
| db.large | 4 | 8 GB | 100 GB | General production use |
| db.xlarge | 8 | 16 GB | 200 GB | High-throughput applications |

Features

  • Streaming replication — asynchronous or synchronous replication to one or more replicas
  • Automated failover — the agent detects primary failure and promotes a replica within seconds
  • WAL archiving — continuous archiving of write-ahead logs to MinIO for durability
  • Point-in-time recovery — restore to any second within the retention window
  • Automated tuning — the agent configures postgresql.conf and pg_hba.conf based on instance size and workload

Create a Database

Via CLI

agentmetal database create \
  --name mydb \
  --engine postgresql \
  --type db.medium

With a Replica

agentmetal database create \
  --name mydb \
  --engine postgresql \
  --type db.medium \
  --replicas 1

What the Agent Manages

The Database Agent performs these operations:

  1. Installation — installs PostgreSQL or MySQL on a provisioned VM, applies OS-level tuning (sysctl, ulimits)
  2. Configuration — generates optimized postgresql.conf based on available memory and CPU: shared_buffers, effective_cache_size, work_mem, wal_buffers
  3. Authentication — configures pg_hba.conf to allow connections from the VPC CIDR and generates credentials
  4. Replication setup — creates replication slots, configures streaming replication to replicas, and verifies WAL shipping
  5. Backup — schedules base backups to MinIO and continuous WAL archiving using pg_basebackup and archive_command
  6. Monitoring — tracks replication lag, connection count, query performance, and disk usage
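
The configuration step derives settings from instance memory. As a rough illustration, the sketch below applies widely used community rules of thumb to the memory sizes from the Database Types table. The ratios, the `suggest_tuning` and `render_conf` names, and the default of 100 connections are assumptions made for this example. The documentation does not specify the agent's actual formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningParams:
    shared_buffers_mb: int
    effective_cache_size_mb: int
    work_mem_mb: int
    wal_buffers_mb: int


def suggest_tuning(memory_gb: int, max_connections: int = 100) -> TuningParams:
    """Suggest starting values from total RAM (hypothetical heuristics).

    These are common rules of thumb, not the Database Agent's real logic.
    """
    total_mb = memory_gb * 1024
    shared_buffers = total_mb // 4        # roughly 25% of RAM
    effective_cache = total_mb * 3 // 4   # roughly 75% of RAM
    # Spread a quarter of RAM across connections, never below 4 MB
    # (PostgreSQL's default work_mem).
    work_mem = max(4, total_mb // 4 // max_connections)
    # 1/32 of shared_buffers, capped at 16 MB (one WAL segment), which
    # mirrors PostgreSQL's own auto-tuning when wal_buffers = -1.
    wal_buffers = max(1, min(16, shared_buffers // 32))
    return TuningParams(shared_buffers, effective_cache, work_mem, wal_buffers)


def render_conf(params: TuningParams) -> str:
    """Render the values as postgresql.conf lines."""
    return "\n".join([
        f"shared_buffers = {params.shared_buffers_mb}MB",
        f"effective_cache_size = {params.effective_cache_size_mb}MB",
        f"work_mem = {params.work_mem_mb}MB",
        f"wal_buffers = {params.wal_buffers_mb}MB",
    ])


if __name__ == "__main__":
    # db.medium has 4 GB of memory according to the Database Types table.
    print(render_conf(suggest_tuning(memory_gb=4)))
```

Under these assumptions, a db.medium instance would start with 1024 MB of shared_buffers and 3072 MB of effective_cache_size. Values like these are only a starting point, since the agent also takes workload into account.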

Automated Failover

When the agent detects a primary failure, it:

  1. Verifies the primary is truly unreachable (multiple checks from different vantage points)
  2. Selects the replica with the least replication lag
  3. Promotes the replica to primary via pg_ctl promote
  4. Updates DNS records and connection strings to point to the new primary
  5. Reconfigures remaining replicas to follow the new primary
  6. Records the entire failover decision in the audit log with reasoning
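
The first two steps above (confirming the failure from several vantage points, then choosing the replica with the least lag) can be sketched as follows. This is a minimal illustration, not the agent's implementation: the `Replica` type, the majority-quorum rule, and measuring lag in bytes are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    name: str
    lag_bytes: int        # how far the replica trails the primary's WAL
    healthy: bool = True


def primary_is_down(probe_results: List[bool]) -> bool:
    """Return True only when a majority of vantage points failed.

    probe_results[i] is True when vantage point i reached the primary.
    The majority rule is an assumption; the docs only say "multiple checks".
    """
    if not probe_results:
        return False  # no evidence, so do not trigger a failover
    failures = sum(1 for reachable in probe_results if not reachable)
    return failures >= len(probe_results) // 2 + 1


def pick_promotion_candidate(replicas: List[Replica]) -> Optional[Replica]:
    """Choose the healthy replica with the least replication lag."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    # Break ties on name so the choice is deterministic.
    return min(candidates, key=lambda r: (r.lag_bytes, r.name))


if __name__ == "__main__":
    replicas = [
        Replica("replica-a", lag_bytes=4096),
        Replica("replica-b", lag_bytes=512),
        Replica("replica-c", lag_bytes=0, healthy=False),
    ]
    if primary_is_down([False, False, True]):
        print(pick_promotion_candidate(replicas))
```

Requiring agreement between vantage points guards against promoting a replica when only one network path to the primary is broken, which would otherwise risk two primaries accepting writes.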

Connecting to Your Database

After creation, retrieve the connection string:

agentmetal database get mydb --connection-string

Output:

postgresql://mydb_user:**@mydb.internal:5432/mydb?sslmode=require
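
Applications often need the individual parts of this URI rather than the whole string. The sketch below splits a PostgreSQL connection string using only the Python standard library. The `parse_connection_string` helper and the placeholder password are hypothetical, chosen for this example.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_connection_string(dsn: str) -> dict:
    """Split a postgresql:// URI into its components."""
    parts = urlsplit(dsn)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Keep the last value when a query option is repeated.
    options = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return {
        # Credentials may be percent-encoded inside a URI.
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is the PostgreSQL default port
        "dbname": parts.path.lstrip("/"),
        "sslmode": options.get("sslmode"),
    }


if __name__ == "__main__":
    # The password here is a placeholder, not a real credential.
    dsn = "postgresql://mydb_user:example-password@mydb.internal:5432/mydb?sslmode=require"
    print(parse_connection_string(dsn))
```

Checking that `sslmode` is `require` before connecting is one way to make sure a client does not silently fall back to an unencrypted connection.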

Backups and Recovery

List available backups:

agentmetal database backups mydb

Restore to a point in time:

agentmetal database restore mydb --target-time "2024-01-15T10:30:00Z"
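
Point-in-time recovery only works for targets inside the retention window, so it can help to validate a timestamp before requesting a restore. The sketch below is illustrative: the 7-day retention value and both helper names are assumptions, since the actual retention window is not stated here.

```python
from datetime import datetime, timedelta, timezone


def parse_target_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-01-15T10:30:00Z into UTC."""
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("target time must include a UTC offset")
    return parsed.astimezone(timezone.utc)


def is_restorable(target: datetime, now: datetime, retention_days: int) -> bool:
    """Return True when target lies within the retention window ending at now."""
    oldest = now - timedelta(days=retention_days)
    return oldest <= target <= now


if __name__ == "__main__":
    target = parse_target_time("2024-01-15T10:30:00Z")
    now = datetime(2024, 1, 20, tzinfo=timezone.utc)
    # retention_days=7 is a hypothetical value for this example.
    print(is_restorable(target, now, retention_days=7))
```

Rejecting timestamps without an offset avoids ambiguity about which time zone a restore target refers to.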