Proxmox VE Advanced: Cluster, HA, Ceph & API
When your environment grows from one machine to multiple machines, operations shift from "as long as it runs" to "availability, scalability, and automation." This article takes you from soloing a dungeon to raiding as a group.
Building a Cluster
Use case: for centralized management, cross-node scheduling, and migration — start with a cluster.
# Create a cluster on the first node
pvecm create my-cluster
# Join other nodes: run this on each node being added, pointing at a node that is already in the cluster
pvecm add <existing-cluster-node-ip>
# Check status
pvecm status
Once the cluster is set up, avoid renaming node hostnames. Make sure DNS, time sync, and the network are all working before a node joins; otherwise you'll get strange issues that are hard to trace back to the cause.
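Before adding more nodes or enabling HA, it can help to confirm from a script that the cluster currently has quorum. The sketch below is a minimal example that assumes `pvecm status` prints a line of the form `Quorate: Yes`. The function name `is_quorate` is hypothetical, and the pattern may need adjusting for your release.

```shell
#!/bin/sh
# is_quorate: read `pvecm status` output on stdin and succeed (exit 0)
# only if it reports quorum.
# Assumption: the output contains a line like "Quorate:          Yes".
is_quorate() {
    grep -Eq '^[[:space:]]*Quorate:[[:space:]]*Yes[[:space:]]*$'
}
```

On a cluster node this could be used as `pvecm status | is_quorate && echo "quorum OK"`.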
HA (High Availability) Configuration
Use case: if a node goes down, its VMs are automatically restarted on another node, so the outage is limited to the time it takes to fence the failed node and boot the guests.
Prerequisites:
- An existing cluster
- VM disks on shared storage
- A working fencing mechanism to avoid split-brain (two sides of a partitioned cluster both believing they own the same VM)
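The split-brain risk comes down to vote counting: a partition may only keep running HA services if it holds a strict majority of the votes. The arithmetic below is a small sketch that assumes one vote per node; the function names are illustrative.

```shell
#!/bin/sh
# quorum_votes N: votes needed for a majority in an N-node cluster
# (assumes one vote per node).
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: nodes that can fail while the rest keep quorum.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 5; do
    echo "nodes=$n quorum=$(quorum_votes "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With two nodes, losing either one loses quorum, which is why three nodes is the practical minimum for HA unless an extra external vote is added.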
# Put VM 100 under HA management
ha-manager add vm:100 --max_restart 3 --max_relocate 5
When HA is configured correctly, users are almost unaware of single-node failures, like a cat that always has a backup sleeping spot.
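When several guests need the same policy, the command can be generated in a loop. The sketch below only prints the commands so they can be reviewed first. `ha_add_cmds` is a hypothetical helper, and the restart and relocate limits are example values, not recommendations.

```shell
#!/bin/sh
# ha_add_cmds VMID...: print one `ha-manager add` command per VM ID.
# Nothing is executed here; pipe the output to `sh` once it looks right.
ha_add_cmds() {
    for vmid in "$@"; do
        case "$vmid" in
            ''|*[!0-9]*)
                # Reject anything that is not a plain number.
                echo "skipping invalid VM ID: $vmid" >&2
                continue
                ;;
        esac
        printf 'ha-manager add vm:%s --max_restart 3 --max_relocate 5\n' "$vmid"
    done
}

ha_add_cmds 100 101 102
```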
Ceph Hyper-Converged Storage
Use case: multiple nodes need shared storage with high availability — Ceph can handle both.
Recommended prerequisites:
- At least 3 nodes (Ceph doesn't like single points of failure)
- Multiple dedicated disks per node
- A dedicated storage network is even better
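# Install the Ceph packages (run on every node)
pveceph install
# Initialize the Ceph configuration (run once, on one node)
pveceph init --network 10.0.0.0/24
# Create a monitor (repeat on at least 3 nodes) before adding OSDs
pveceph mon create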
# Create an OSD for each data disk
pveceph osd create /dev/sdX
# Create a Pool and attach it to PVE
pveceph pool create vm-data --add_storages
Ceph works best with direct access to raw disks, so don't put a hardware RAID layer underneath the OSDs. Doing so weakens both the data protection and the observability that Ceph provides.
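For sizing, a rough rule is that a replicated pool stores every object once per replica, so usable space is about the raw capacity divided by the replica count. The sketch below assumes equal-sized disks and defaults to 3 replicas (check your pool's actual setting), and it ignores Ceph's own overhead and the free space needed for recovery. The function name is hypothetical.

```shell
#!/bin/sh
# ceph_usable_gib NODES DISKS_PER_NODE DISK_GIB [REPLICAS]
# Rough usable capacity of a replicated pool, in GiB (integer division).
ceph_usable_gib() {
    nodes=$1
    disks=$2
    disk_gib=$3
    replicas=${4:-3}   # assumption: 3 replicas unless told otherwise
    echo $(( nodes * disks * disk_gib / replicas ))
}

# Example: 3 nodes, 4 disks each, 1000 GiB per disk, 3 replicas.
ceph_usable_gib 3 4 1000 3   # prints 4000
```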
Storage Replication
Use case: cross-node redundancy for fast failover.
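# Replicate VM 100 to another node every day at 02:00. The job ID is <vmid>-<job number>, the target is a node name, and both nodes need ZFS storage with the same name.
pvesr create-local-job 100-0 <target-node> --schedule "02:00"
API and Automation Integration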
Use case: operate via scripts, CI/CD, or integrate with other platforms. Use API tokens rather than the root password.
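For scripts and CI jobs, one option is to read the token from environment variables so it never lands in the script itself. The sketch below assumes hypothetical variables `PVE_HOST`, `PVE_TOKEN_ID` (in the form `user@realm!tokenid`), and `PVE_TOKEN_SECRET`; the helper names are also illustrative. It verifies TLS against the CA file named in `PVE_CACERT` when that is set, and otherwise relies on curl's default trust store.

```shell
#!/bin/sh
# pve_auth_header TOKEN_ID SECRET: build the Authorization header line.
pve_auth_header() {
    printf 'Authorization: PVEAPIToken=%s=%s' "$1" "$2"
}

# pve_url HOST PATH: build an API URL under /api2/json on port 8006.
# A leading slash on PATH is tolerated.
pve_url() {
    printf 'https://%s:8006/api2/json/%s' "$1" "${2#/}"
}

# pve_get PATH: GET an API path using PVE_HOST, PVE_TOKEN_ID and
# PVE_TOKEN_SECRET (all hypothetical names). Aborts if one is unset.
pve_get() {
    : "${PVE_HOST:?set PVE_HOST}" "${PVE_TOKEN_ID:?set PVE_TOKEN_ID}" \
      "${PVE_TOKEN_SECRET:?set PVE_TOKEN_SECRET}"
    if [ -n "${PVE_CACERT:-}" ]; then
        curl -fsS --cacert "$PVE_CACERT" \
            -H "$(pve_auth_header "$PVE_TOKEN_ID" "$PVE_TOKEN_SECRET")" \
            "$(pve_url "$PVE_HOST" "$1")"
    else
        curl -fsS \
            -H "$(pve_auth_header "$PVE_TOKEN_ID" "$PVE_TOKEN_SECRET")" \
            "$(pve_url "$PVE_HOST" "$1")"
    fi
}
```

A call such as `pve_get nodes/<node>/qemu` would then return the VM list for that node as JSON.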
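# List VMs using an API token (single quotes stop interactive bash from expanding the "!"; -k skips TLS verification and is only suitable for testing)
curl -k \
  -H 'Authorization: PVEAPIToken=user@realm!tokenid=secret' \
  "https://<PVE-IP>:8006/api2/json/nodes/<node>/qemu"
Once automation is set up, repetitive work goes to scripts, and you handle the coffee and the cat.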
Next Steps
With advanced capabilities mastered, use a set of operations guidelines to reduce risk: 👉 Best Practices