Proxmox VE Best Practices: Stability & Security Checklist

In practice, systems tend to break not because of "insufficient features" but because of process and habits. This checklist can be used as daily guidelines — every pitfall avoided is another time you leave work on time.

Recommended Practices

1) Backup Must Be Paired with Restore Drills

Having a backup file doesn't mean you can actually restore from it. Test a restore on a regular schedule: restore into a scratch VM, then verify that it boots and that its services are healthy. A backup you've never tested is like a new pair of shoes you've never tried on: you only find out they hurt when you have to run in them.
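A restore drill could look roughly like the sketch below. It assumes the standard `qmrestore` and `qm` tools on a Proxmox VE host; the function name `restore_drill`, the archive path, the scratch VMID, and the storage name are all hypothetical placeholders.

```shell
# Restore-drill sketch. All arguments are placeholders for your own values.
#
# restore_drill ARCHIVE SCRATCH_VMID TARGET_STORAGE
# Restores a backup archive into a scratch VM, starts it, and prints its status.
restore_drill() {
    archive=$1
    scratch_vmid=$2
    target_storage=$3

    # Restore into an unused VMID so the production VM is left untouched.
    qmrestore "$archive" "$scratch_vmid" --storage "$target_storage" || return 1

    # Boot the restored copy; service health still has to be checked separately.
    qm start "$scratch_vmid" || return 1

    qm status "$scratch_vmid"
}

# Example call (hypothetical values):
#   restore_drill /path/to/backup-archive 9100 local-lvm
```

The restored copy keeps the original guest's network settings, so it is worth running the drill on an isolated bridge or with the NIC disconnected to avoid address conflicts with the live VM.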

2) Snapshot Before Major Changes

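Before updating packages, changing the network, or touching hardware, take a snapshot first. If things go wrong, roll back and skip the reinstall spiral. A snapshot lives on the same storage as the VM, so it complements backups rather than replacing them.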

qm snapshot 100 before-maintenance
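The snapshot command above can be combined with an automatic rollback. The following is a minimal sketch, assuming `qm snapshot` and `qm rollback` behave as documented; the wrapper `safe_change` and the change command you pass to it are hypothetical.

```shell
# safe_change VMID SNAPNAME COMMAND [ARGS...]
# Takes a snapshot, runs the given change command, and rolls back if it fails.
safe_change() {
    vmid=$1
    snap=$2
    shift 2

    # Abort early if the snapshot cannot be taken.
    qm snapshot "$vmid" "$snap" || return 1

    if "$@"; then
        echo "change succeeded; snapshot $snap kept for manual cleanup"
    else
        echo "change failed; rolling back to $snap" >&2
        qm rollback "$vmid" "$snap"
        return 1
    fi
}

# Example call (the upgrade script is a hypothetical placeholder):
#   safe_change 100 before-maintenance ./run-upgrade.sh
```

Once the change has proven stable, remove the snapshot with `qm delsnapshot` so it does not keep consuming storage.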

3) Plan Your Update and Subscription Strategy

For production environments, decide the rollout order: which environment gets updates first and which follows once they have proven stable. Know which package repository your subscription gives you access to, and don't add third-party repos casually.

# Check subscription status
pvesubscription get
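To audit which package sources are actually active, something like the sketch below may help. The function name `list_apt_sources` is hypothetical. It assumes repositories are defined either in one-line `*.list` files or in deb822 `*.sources` files under `/etc/apt`, and it only lists entries without judging them.

```shell
# list_apt_sources [DIR]
# Prints every active (uncommented) repository line found under DIR.
# DIR defaults to /etc/apt. Commented-out entries are skipped because
# they start with '#' and so do not match the pattern.
list_apt_sources() {
    apt_dir=${1:-/etc/apt}
    find "$apt_dir" -type f \( -name '*.list' -o -name '*.sources' \) \
        -exec grep -HE '^[[:space:]]*(deb|deb-src|URIs:)[[:space:]]' {} +
}

# Example call:
#   list_apt_sources
```

Any line that points somewhere other than the Debian or Proxmox mirrors is a candidate for review.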

4) Network Segmentation and Minimal Exposure

Management, storage, and business traffic: separate when you can to avoid mutual interference and lateral spread. Save the "one thing breaks everything" scenario for the movies.

5) Role-Based Access and Firewall Baseline

Don't have everyone sharing a single root account. Create role-based accounts, enable 2FA where possible, and open only the necessary ports in the firewall. Weak password + high privilege = walking naked on the internet.
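Per-person accounts with scoped roles can be created with `pveum`. This is a sketch only: the function name `create_operator`, the user ID, the ACL path, and the role choice are illustrative assumptions, and the `pveum` subcommands should be checked against your Proxmox VE version.

```shell
# create_operator USERID ACL_PATH ROLE
# Creates a user and grants it a single role on a single path.
create_operator() {
    userid=$1     # e.g. alice@pve (hypothetical user in the built-in pve realm)
    acl_path=$2   # e.g. /vms/100 to limit the grant to one VM
    role=$3       # e.g. PVEVMAdmin or PVEAuditor

    pveum user add "$userid" || return 1

    # Grant only the role needed, only on the path needed.
    pveum acl modify "$acl_path" --users "$userid" --roles "$role"
}

# Example call (hypothetical values):
#   create_operator alice@pve /vms/100 PVEVMAdmin
```

The new account still needs a password (for example via `pveum passwd`) and, ideally, a second factor before it is handed over.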

Common Pitfalls

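Pitfall 1: Running ZFS/Ceph on Top of Hardware RAID

ZFS and Ceph need to manage disks directly. Putting a hardware RAID layer underneath them halves their superpowers: the controller hides individual disks and their errors, so both data protection and observability suffer. Use a plain HBA or put the controller in passthrough (IT) mode instead.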

Pitfall 2: Expecting HA from a Single Node

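HA requires multiple nodes that can take over from each other, plus a quorum to decide who is in charge; in practice that means at least three nodes, or two nodes with an external vote. With only one machine, don't dream of high availability: that's a single point of failure waiting to happen.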
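A quick pre-flight check before enabling HA might look like the sketch below. It assumes `pvecm status` prints `Nodes:` and `Quorate:` lines in its quorum section; the function name `check_ha_prereqs` and the three-node threshold are choices made for this example.

```shell
# check_ha_prereqs
# Prints the node count and quorum state reported by pvecm, and returns
# success only when there are at least three nodes and the cluster is quorate.
check_ha_prereqs() {
    pvecm status | awk '
        $1 == "Nodes:"   { nodes = $2 }
        $1 == "Quorate:" { quorate = $2 }
        END {
            printf "nodes=%d quorate=%s\n", nodes, quorate
            exit !(nodes >= 3 && quorate == "Yes")
        }'
}

# Example call:
#   check_ha_prereqs && echo "cluster looks ready for HA"
```

A two-node cluster with an external vote would fail this particular check even though it can be a valid setup, so adjust the threshold to match your design.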

Pitfall 3: Deleting the Base Image of a Linked Clone

Linked clones share the read-only base disk of the template they were cloned from. Delete that base and every linked clone built on it breaks, like pulling out the bottom block in a stack. If a VM must outlive its template, make it a full clone.
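When independence from the template matters more than disk space, a full clone avoids the dependency altogether. A minimal sketch, assuming the `--full` option of `qm clone`; the wrapper name `full_clone` and the VMIDs are hypothetical.

```shell
# full_clone TEMPLATE_VMID NEW_VMID NAME
# Clones a template into a VM that owns complete copies of its disks.
full_clone() {
    template_vmid=$1
    new_vmid=$2
    name=$3

    # --full 1 copies every disk instead of referencing the template's base disk.
    qm clone "$template_vmid" "$new_vmid" --name "$name" --full 1
}

# Example call (hypothetical values):
#   full_clone 9000 101 web01
```

The trade-off is that a full clone takes longer to create and uses as much storage as the original disks.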

Pitfall 4: No Capacity Monitoring

When local-lvm or backup storage fills up, writes fail and jobs break. Monitor early, clean up early — don't wait for an alert at 2 AM before getting up to fight the fire.
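A small check like the one below can run from cron and feed whatever alerting you already use. It is a sketch that assumes `pvesm status` prints a header row followed by columns in the order name, type, status, total, used, available, percentage; the function name `storage_over_threshold` and the default limit of 80 are arbitrary choices.

```shell
# storage_over_threshold [PERCENT]
# Prints the name and usage of every active storage at or above PERCENT.
storage_over_threshold() {
    limit=${1:-80}
    pvesm status | awk -v limit="$limit" '
        # Skip the header row and any storage that is not active.
        NR > 1 && $3 == "active" {
            pct = $7
            sub(/%$/, "", pct)          # strip the trailing percent sign
            if (pct + 0 >= limit + 0) print $1, $7
        }'
}

# Example call:
#   storage_over_threshold 85
```

Empty output means every active storage is below the limit.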

Pitfall 5: Weak Passwords and Excessive Permissions

If the management interface is exposed to the internet, weak passwords and wide-open permissions are an open invitation. Change what needs changing, restrict what needs restricting.

Following this checklist won't guarantee zero failures, but it'll save you a lot of unnecessary detours. When in doubt, check the official docs and forums — most pitfalls have been stumbled into by others before you. Best of luck with your operations — may your VMs run smoothly and your backups always restore! 🦦