Last winter a Beelink EQ12 power supply died at 3 a.m. and the box took its NVMe with it. The drive survived (barely). The data survived (completely). The panic was real for about four minutes until I remembered the restic repository on Backblaze B2. The cold restore took ninety-three minutes, most of which was spent staring at a progress bar while my wife asked why I was awake. That repository is the only reason I am not still apologizing to her. This article is about the backup strategy that saved me that Tuesday and why every single piece of it matters more than you think it does when you are setting up your homelab on a Saturday afternoon.
The 3-2-1 Rule in Case You Forgot
Three copies of your data. Two different media types. One copy off-site. You have heard this for the better part of two decades and it still applies in April 2026 because the failure modes have not changed. Hard drives still fail. Houses still burn down. Ransomware still encrypts everything it can reach. The rule works because it forces you to acknowledge that any single point of failure will eventually fail and you need to survive that moment without losing anything that matters.
Most homelab setups fail the “two different media” rule on day one. You have your production data on an NVMe SSD in a Proxmox node. You have your backup on a second NVMe in the same node. That is one medium (flash storage, same failure domain, same power supply, same filesystem corruption vector). That is not the rule. The second medium needs to be actually different: spinning rust, tape, a cloud object store, something that does not share the same physics or the same software stack.
The off-site requirement is the one that separates people who have backups from people who have an expensive pile of drives in the same building as the original data. If your house burns down or floods or gets broken into, your NAS and your server rack go with it. Off-site means a different physical location. A friend’s basement. A storage unit. A cloud bucket in a different state. Somewhere that will not be destroyed by the same event that destroys your primary site.
Why “I Have a NAS” Is Not a Backup
I see this constantly. Someone sets up a TrueNAS box with four drives in RAID-Z1, configures an SMB share, mounts it on their Proxmox nodes, and announces they now have backups. They do not. They have a single storage device that is marginally more resilient to single-drive failure than a lone SSD. That is not the same thing as a backup and it will not save you when the failure mode is anything other than one disk dying cleanly.
A NAS that is on the same UPS as your compute nodes, plugged into the same network switch, sitting in the same room, is one fire away from being the same disk. It is one power surge away. It is one ransomware attack away if you have SMB shares mounted read-write on a compromised machine. A real backup is isolated in time (snapshots), isolated in failure domain (different hardware), and isolated in location (off-site). A NAS can be part of that strategy but it cannot be the only part.
The other mistake I see is treating the NAS as the backup target and never backing up the NAS itself. Your production data lives on Proxmox ZFS pools. You rsync it nightly to the NAS. Great. Now the NAS fails (controller card, ZFS pool corruption, somebody accidentally runs a recursive delete). Where is the backup of the backup? If the answer is “nowhere” then you do not have three copies. You have two, and two is not the rule.
Tier One: Snapshots Inside the Host
The first tier of defense is snapshots on the production filesystem itself. If you are running ZFS or btrfs (and on Proxmox you almost certainly are), you get atomic snapshots for free. They are fast, they are space-efficient as long as you do not let them pile up forever, and they let you roll back individual datasets without touching the rest of the system. I run ZFS on all three Proxmox nodes and I have zfs-auto-snapshot configured to keep hourly snapshots for 24 hours, daily for 7 days, and weekly for 4 weeks.
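For reference, that retention schedule maps onto three zfs-auto-snapshot invocations. This is a minimal sketch, assuming the stock zfs-auto-snapshot script with its --label and --keep options and the // argument that selects every dataset that has not opted out. The run_tier helper and the ZFS_AUTO_SNAPSHOT override are hypothetical names added for illustration; the packaged cron jobs call the script directly.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: run one retention tier.
# ZFS_AUTO_SNAPSHOT can be overridden (for example with "echo") to dry-run.
run_tier() {
  local label="$1" keep="$2"
  "${ZFS_AUTO_SNAPSHOT:-zfs-auto-snapshot}" --quiet --syslog \
    --label="$label" --keep="$keep" //
}

# Intended schedule (driven by cron or systemd timers), not executed here:
#   every hour:  run_tier hourly 24
#   every day:   run_tier daily 7
#   every week:  run_tier weekly 4
```

Calling run_tier with ZFS_AUTO_SNAPSHOT=echo prints the arguments without touching the pool, which is a cheap way to check the schedule before wiring it into cron.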
This is not a backup in the full sense. Snapshots live on the same pool as the original data. If the pool dies (controller failure, multiple-drive failure in a non-redundant setup, filesystem bug that corrupts everything), the snapshots die with it. But snapshots save you from the much more common failure mode: user error. I deleted the wrong LXC container. I misconfigured a service and it trashed its own database. I upgraded something and it broke. With hourly snapshots I can rewind to the top of the hour or to yesterday morning without needing to pull from the off-site backup.
I no longer use rsnapshot for production. It worked fine for years, but in 2024 its hard-link trees ran one of my btrfs nodes out of metadata space, and I spent an evening debugging why I could not create new files even though I had 400 GB free. ZFS snapshots do not have that problem. If you are on ext4 or xfs, rsnapshot is still a reasonable choice, but if you are able to move to a CoW filesystem you should seriously consider it.
Tier Two: External Repo on a Different Filesystem
The second tier is a real backup: data copied to a different machine, ideally on a different storage medium, ideally off the same power circuit. For my homelab this is restic pushing to Backblaze B2. Restic is a single Go binary, it does client-side encryption, it does content-addressed deduplication, and it works with every object store and SFTP target you care about. I have been using it since 2021 (version 0.12.x at the time) and it has never lost data on me.
The alternative is Borg. Borg is older, more mature in some ways, and faster on deduplicated workloads if you are backing up to a local or SSH target. Borg does not natively support object storage; you need rclone mount or a third-party wrapper. Restic does. That was the deciding factor for me. I wanted a single tool that could push directly to B2 without an intermediate FUSE mount or a cron job daisy chain.
Deduplication matters if you have a lot of similar data (VM disk images, container layers, daily database dumps that are 98% identical). Restic chunks files into variable-length blocks and only stores each unique block once. The trade-off is restore speed. A heavily deduplicated backup takes longer to restore because restic has to reassemble all those chunks. On a 220 GB repository with about 40% deduplication, a full restore from B2 to a local NVMe takes about ninety minutes as of April 2026. That is acceptable for disaster recovery. It is not fast enough for “oops I deleted a file five minutes ago” recovery, which is why you have snapshots on the production filesystem.
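To make the "store each unique block once" idea concrete, here is a toy content-addressed store in shell. It is a sketch under loud assumptions: it cuts files into fixed-size chunks with split, whereas restic uses variable-length, content-defined chunks, and the store_chunks name and on-disk layout are hypothetical. It only illustrates why identical data costs nothing extra to store.

```shell
#!/bin/bash
set -eu

# store_chunks FILE STORE_DIR [CHUNK_SIZE]
# Splits FILE into fixed-size chunks and saves each chunk under its SHA-256.
# A chunk whose hash already exists in STORE_DIR is not written again.
# (Assumption: fixed-size chunks, not restic's content-defined chunking.)
store_chunks() {
  local file="$1" store="$2" size="${3:-4096}"
  local tmp chunk hash
  tmp="$(mktemp -d)"
  mkdir -p "$store"
  split -b "$size" -- "$file" "$tmp/chunk."
  for chunk in "$tmp"/chunk.*; do
    [ -e "$chunk" ] || continue          # empty input produces no chunks
    hash="$(sha256sum "$chunk" | cut -d' ' -f1)"
    [ -e "$store/$hash" ] || cp -- "$chunk" "$store/$hash"
  done
  rm -rf "$tmp"
}
```

Feeding the same file in twice, or a file made of repeated blocks, leaves the store no bigger than the set of distinct chunks. The flip side is the restore cost described above: every file has to be reassembled from its chunks.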
Here is the actual restic command I run every night via a systemd timer on my primary Proxmox node. This assumes you have already run restic init to create the repository and you have the B2 credentials in environment variables.
#!/bin/bash
set -euo pipefail

export RESTIC_REPOSITORY="b2:my-homelab-backup-bucket:/restic-repo"
export RESTIC_PASSWORD_FILE="/root/.restic-password"
export B2_ACCOUNT_ID="your_b2_account_id"
export B2_ACCOUNT_KEY="your_b2_application_key"

# Backup critical paths from this Proxmox node
restic backup \
    /var/lib/vz/dump \
    /etc/pve \
    /root/scripts \
    /home \
    --exclude="*.tmp" \
    --exclude="/var/lib/vz/dump/*.log" \
    --tag "proxmox-node1" \
    --tag "$(date +%Y-%m)"

# Forget old snapshots: keep last 7 daily, 4 weekly, 6 monthly
restic forget \
    --keep-daily 7 \
    --keep-weekly 4 \
    --keep-monthly 6 \
    --prune \
    --tag "proxmox-node1"

# Verify repository integrity once a week (check on Sundays)
if [[ $(date +%u) -eq 7 ]]; then
    restic check --read-data-subset=5%
fi

# Ping Healthchecks.io to confirm success (only reached if every step above succeeded)
curl -fsS -m 10 --retry 5 https://hc-ping.com/your-check-uuid
The --exclude patterns keep log files and temporary files out of the backup. The --tag flags let you filter snapshots later (useful if you are backing up multiple machines to the same repository). The forget and prune commands run immediately after backup to clean up old snapshots according to the retention policy. This keeps the repository size from growing forever.
I run restic check --read-data-subset=5% once a week. A full read of the data (restic check --read-data) on a 220 GB repository takes about forty minutes and downloads the entire repository from B2. The 5% subset check is a compromise: it samples a random subset of data packs (about 11 GB) to verify they are not corrupted, and it finishes in about ten minutes. I run the full check once a quarter.
Cost on Backblaze B2 for this setup: $1.45 per month as of April 2026. That is $0.005 per GB per month for storage (220 GB × $0.005 = $1.10) plus a small amount for API calls and egress during the weekly verify. Backblaze includes a free egress allowance, which absorbs part of the subset-check traffic. A full restore would cost about $1.10 in egress (220 GB × $0.005 per GB for the first TB). That is acceptable for disaster recovery that I hope to never need.
Tier Three: Physically Off-Site
The third tier is a physical drive that lives somewhere other than my house. For me that is a 2 TB external USB SSD encrypted with LUKS, stored in a safe-deposit box at a credit union about four miles away. I update it twice a year (April and October, on calendar reminders). The entire backup routine takes about two hours: drive the drive home, plug it in, unlock it with a YubiKey challenge-response (because typing a 40-character passphrase twice a year is how you forget the passphrase), run rclone sync to mirror the restic repository from B2 to the local drive (rsync cannot talk to B2 directly), lock it, drive it back.
This is overkill for most homelabs. The B2 bucket is already off-site and it is already in a different failure domain. But I want one copy that does not depend on my B2 account, my credit card being current, Backblaze staying in business, or the internet being up when I need to restore. The safe-deposit box costs $35 per year. The drive cost $90 in 2023. That is a rounding error in the total cost of the homelab and it buys me the ability to sleep through a Backblaze outage.
The LUKS setup with YubiKey challenge-response is worth documenting because it is not obvious from the cryptsetup man page: cryptsetup does not speak challenge-response itself, so the enrollment goes through a helper such as the yubikey-luks package. You format the drive as a LUKS2 volume with a strong passphrase. Then you enroll a YubiKey challenge-response slot so you can unlock the drive with the YubiKey present instead of typing the passphrase every time. The YubiKey does not store the passphrase; it stores a secret that, when challenged, produces a derived key. If you lose the YubiKey you can still unlock the drive with the original passphrase. This is a better UX than typing a 40-character random string twice a year and hoping you did not transpose a character.
The Missing Tier Most Homelabs Skip: Tested Restore
An untested backup is a wish. I have seen this kill small businesses. They had backups. They had never restored from them. The backups were either incomplete (missing a critical database), misconfigured (wrong file permissions on restore), or corrupted (bit rot that went undetected because nobody ever read the data back). When they needed the backup it did not work and they lost everything.
I test my backups every ninety days. The test is not “run restic check” (that only verifies the repository structure and checksums). The test is “restore a full VM to a throwaway LXC container, boot it, and verify the service inside works.” I have a systemd timer that pings my phone if I have not done a verified restore in ninety days. The timer does not run the restore itself because a restore is not something you want running unattended at 3 a.m. It just reminds me that it is time.
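The reminder logic is small enough to sketch. This assumes a stamp file holding the epoch time of the last verified restore; the function names and file location are hypothetical, and the Healthchecks URL in the comment is a placeholder. The /fail suffix is Healthchecks.io's way of signaling a failure explicitly.

```shell
#!/bin/bash
set -eu

# mark_restore_verified STAMP_FILE
# Call this by hand after a restore test has actually passed.
mark_restore_verified() {
  date +%s > "$1"
}

# restore_overdue STAMP_FILE [MAX_DAYS] [NOW_EPOCH]
# Succeeds (exit 0) when the last verified restore is MAX_DAYS or more
# in the past, or when no record exists at all.
restore_overdue() {
  local stamp="$1" max_days="${2:-90}" now="${3:-$(date +%s)}"
  local last
  [ -r "$stamp" ] || return 0            # no record counts as overdue
  last="$(cat "$stamp")"
  [ $(( (now - last) / 86400 )) -ge "$max_days" ]
}

# Intended use from a daily timer (placeholder path and UUID):
#   if restore_overdue /var/lib/restore-test/last-verified; then
#       curl -fsS -m 10 https://hc-ping.com/your-check-uuid/fail
#   fi
```

Keeping the "mark verified" step manual is deliberate: the stamp should only move forward when a human has watched the restored service come up.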
The throwaway LXC is a Debian 12 container on one of the Proxmox nodes, created specifically for restore tests. I restore a recent snapshot of one of the production VMs to a directory inside the container, then either chroot into it or mount the restored disk image with losetup and verify that the service I care about (Nextcloud, Vaultwarden, Immich) can start and serve a page. The entire process takes thirty to forty minutes. I do it on a Sunday morning with coffee. It is boring. It is essential.
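Booting the service is the real test, but for static paths such as scripts and config directories a checksum comparison is a cheap extra check. A minimal sketch, assuming GNU find, sort, xargs and sha256sum as shipped with Debian. The verify_restore name and the paths in the comment are hypothetical, and the comparison only makes sense for files that have not changed since the snapshot was taken.

```shell
#!/bin/bash
set -eu

# verify_restore ORIGINAL_DIR RESTORED_DIR
# Succeeds only if both trees contain the same relative file paths with
# the same SHA-256 content hashes. Ownership and modes are not compared.
verify_restore() {
  local src="$1" dst="$2"
  local a b
  a="$(cd "$src" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)"
  b="$(cd "$dst" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)"
  [ "$a" = "$b" ]
}

# Example flow (hypothetical target directory):
#   restic restore latest --tag proxmox-node1 --include /root/scripts \
#       --target /mnt/restore-test
#   verify_restore /root/scripts /mnt/restore-test/root/scripts
```

Because restic recreates the full original path under the target directory, the restored copy of /root/scripts ends up at /mnt/restore-test/root/scripts in this example.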
The things I have learned from restore tests: file permissions on /etc/shadow do not survive some backup tools unless you are careful (restic preserves them, tar without the right flags does not). MySQL dumps are not the same as a filesystem backup of /var/lib/mysql; you need to test that your import script works. Docker volumes backed up while the container is running can be inconsistent; you need to stop the container, snapshot, then restart, or use a tool that understands the application (like pg_dump for PostgreSQL).
My Current Setup with Restic, Backblaze B2, and an On-Shelf USB Drive
The full stack as of April 2026 looks like this. Production data lives on three Proxmox nodes (two Beelink GTR7 mini PCs and one N100-based node, all running Proxmox VE 8.2). Each node has local ZFS pools. zfs-auto-snapshot takes hourly, daily, and weekly snapshots. A systemd timer on each node runs the restic backup script nightly at 2 a.m. The restic repository lives on Backblaze B2 in the us-west-004 region. A USB SSD with a mirrored copy of the B2 repository lives in a safe-deposit box and gets updated twice a year. The backup script pings Healthchecks.io after every successful run. Healthchecks pings my phone if a backup fails or if I miss a backup window.
I also keep a rotating USB drive plugged into the TrueNAS box that gets a weekly ZFS send/receive snapshot from the Proxmox nodes. This is a local tier-two backup (different hardware, spinning rust instead of NVMe) that lets me restore faster than pulling from B2. The drive is a 4 TB WD Red Plus that I bought refurbished in 2023 for $65. It has been running 24/7 since then with no errors. I will replace it when SMART starts showing reallocated sectors or when it hits 30,000 power-on hours, whichever comes first.
Here is the systemd timer unit that runs the restic backup on my primary node. Drop this in /etc/systemd/system/restic-backup.timer:
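[Unit]
Description=Restic backup timer

[Timer]
# One OnCalendar line only: multiple entries are additive, so pairing
# "daily" (midnight) with "02:00" would fire the backup twice a night.
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target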
And the corresponding service unit in /etc/systemd/system/restic-backup.service:
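[Unit]
Description=Restic backup to B2
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/root/scripts/restic-backup.sh
User=root
Group=root

# No [Install] section: the timer starts this unit, so the service itself
# never needs to be enabled (enabling it would also run a backup at boot).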
Enable and start the timer with systemctl enable --now restic-backup.timer. Check the next run time with systemctl list-timers. Check the logs after the first run with journalctl -u restic-backup.service.
How Long My Backups Actually Take, with Real Numbers
Nightly incremental backup of 220 GB of data (Proxmox VM dumps, LXC container tarballs, /etc/pve, home directories, scripts) takes between six and twelve minutes depending on how much changed that day. A typical night with one or two updated VM backups and a few changed config files processes about 4 GB of new data and uploads 1.2 GB after deduplication. Restic walks the entire file tree every run to detect changes (this is the “files” counter you see in the output), but it only re-reads files whose metadata has changed and only uploads chunks the repository does not already hold.
Weekly verification (restic check --read-data-subset=5%) downloads about 11 GB from B2 and takes nine to eleven minutes depending on B2 performance. Pruning (which happens after every backup thanks to the --prune flag in the forget command) adds about two minutes to the backup run. A full repository check (restic check --read-data) downloads the entire 220 GB and takes thirty-eight to forty-two minutes. I run that quarterly, manually, on a Sunday when I am not doing anything else.
Restore speed from B2 to local NVMe is about 39 MB/s sustained (220,000 MB in 5,580 seconds), which works out to ninety-three minutes for the full 220 GB repository. That number is from the actual disaster recovery I mentioned at the top of this article. Restoring a single file or a single VM backup is much faster (usually under five minutes) because restic only needs to download the specific chunks for that file. Restoring from the local USB drive on the TrueNAS box is faster (about 120 MB/s, which is the network link and the spinning disk setting the pace rather than the USB 3.0 interface), but that drive is only updated weekly so it might be missing the last few days of changes.
Cost breakdown for April 2026: $1.10 per month for 220 GB of storage on B2. About $0.15 per month for API calls (roughly 10,000 Class B transactions per month for the nightly backups and weekly checks). About $0.20 per month for egress during the weekly subset checks (roughly 11 GB per week × 4 weeks = 44 GB, part of which falls inside the free egress allowance). Total: $1.45 per month. A full disaster recovery restore would add $1.10 in egress. That is the entire cost of off-site backup for a homelab that hosts Nextcloud for four users, Vaultwarden, Immich with 40,000 photos, Jellyfin, and a handful of other services.
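The arithmetic is easy to re-run when the repository grows or the rates change. A small sketch using awk for the floating-point math; the function names are hypothetical, and the rates passed in are simply the figures quoted in this article, not a statement of Backblaze's current price sheet.

```shell
#!/bin/bash
set -eu

# monthly_cost STORED_GB STORAGE_RATE_PER_GB API_USD EGRESS_USD
# Prints the monthly total in dollars, rounded to cents.
monthly_cost() {
  awk -v gb="$1" -v rate="$2" -v api="$3" -v egress="$4" \
    'BEGIN { printf "%.2f\n", gb * rate + api + egress }'
}

# restore_egress_cost RESTORED_GB EGRESS_RATE_PER_GB
# Prints the one-off egress charge for pulling RESTORED_GB back down.
restore_egress_cost() {
  awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", gb * rate }'
}

monthly_cost 220 0.005 0.15 0.20     # prints 1.45
restore_egress_cost 220 0.005        # prints 1.10
```

Swapping in your own repository size and the rates from your provider's price sheet gives a budget figure before you commit to a retention policy.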
The Day I Had to Actually Use Them
The Beelink EQ12 power supply died on a Tuesday in January 2026 at approximately 3:07 a.m. I know the time because that is when the UPS started beeping and woke me up. The node was running two LXC containers (AdGuard Home and a small internal wiki) and one VM (a test instance of Gitea that I was evaluating as a replacement for a hosted service). The NVMe in that node held the root filesystem and the Proxmox local-zfs storage pool. When I tried to boot the node the next morning with a replacement power supply, the NVMe was not usable. It showed up in the BIOS but Proxmox could not import the ZFS pool.
I pulled the NVMe, put it in a USB enclosure, and tried to mount it on another machine. zpool import found the pool but reported uncorrectable errors in multiple vdevs. At that point I stopped trying to recover the drive and moved to restoring from backup. The restic repository on B2 had a snapshot from 2:47 a.m. that morning, twenty minutes before the power supply died. I restored the Proxmox dump files for the two LXC containers and the Gitea VM to the TrueNAS box over the local network (faster than pulling from B2), then imported them into one of the other Proxmox nodes. Total time from “drive is dead” to “services are back online” was ninety-three minutes.
The mistake I made: I forgot to restore /etc/pve from the dead node, which meant I lost the cluster configuration for that node and had to manually re-add it after replacing the NVMe. This cost me an extra hour of fumbling with pvecm and manually editing /etc/hosts on the other nodes. The thing I changed in my runbook the next morning: added a step to restore /etc/pve to a temporary directory immediately after a node failure, before doing anything else, so I have the cluster config and the VM/LXC definitions even if I never restore the node itself.
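That runbook step fits in one command. A minimal sketch, assuming the repository environment variables from the backup script are already exported and that the node's snapshots carry the proxmox-node1 tag used there. The rescue_pve_config name, the RESTIC override, and the target directory are hypothetical.

```shell
#!/bin/bash
set -eu

# rescue_pve_config TAG TARGET_DIR
# Restores only /etc/pve from the newest snapshot carrying TAG into
# TARGET_DIR, leaving the live cluster configuration untouched.
# RESTIC can be overridden (for example with "echo") to dry-run.
rescue_pve_config() {
  local tag="$1" target="$2"
  "${RESTIC:-restic}" restore latest \
    --tag "$tag" \
    --include /etc/pve \
    --target "$target"
}

# Example (hypothetical target directory):
#   rescue_pve_config proxmox-node1 /root/pve-rescue
# The files then appear under /root/pve-rescue/etc/pve.
```

Restoring into a scratch directory rather than over /etc/pve matters here: the goal is to have the VM and LXC definitions to read from, not to overwrite the configuration of a surviving node.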
The NVMe eventually came back to life after sitting unpowered for a week (a known quirk of some consumer NVMe drives where the controller locks up under certain failure conditions and needs a full power cycle to reset). I was able to run zpool scrub and recover most of the data, but by that point I had already restored everything from backup and I no longer trusted that drive. It is now sitting in a drawer labeled “failed 2026-01-13, do not use” because I am the kind of person who labels failed drives. Make of that what you will.
The lesson: the backup worked. The verified restore tests I had been running quarterly meant I knew exactly what commands to run and how long it would take. The Healthchecks.io ping meant I got a notification on my phone at 3:30 a.m. that the backup had succeeded twenty minutes before the failure, so I did not spend any time worrying about whether I had recent data. The cost of the entire recovery (B2 egress for 8 GB of VM dumps, plus my time) was under two dollars and two hours. That is acceptable. A $400 homelab mini PC died and I was annoyed but not panicked. That is what a working backup strategy looks like.