This is about the third story I have heard of a Fedi instance losing all its data because of a CI/CD mistake.
Hugops, but also the usual grizzled old sysadmin advice:
A backup is a cost center. It has no value, it has only cost. Only a restore has a proven value, and comes with knowledge:
- You know you actually can restore: the backup was complete, and the restored instance comes up and accepts connections.
- You know how long the restore took, so you know the time to restore when asked. Not an estimate. The actual time.
- You know the restore procedure.
Restore every backup all the time, then throw the recovered instance away. Keep the metrics, keep the backup.
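The restore drill above can be sketched in a few lines of Python. This is a minimal illustration, not a real backup tool: a small SQLite file stands in for the backup, and the names `restore_drill`, `make_demo_backup` and `verify_posts` are made up for this example. The point is the shape: restore into a scratch location, measure the actual time, verify the result, throw the restored copy away and keep the metrics.

```python
import shutil
import sqlite3
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class DrillResult:
    ok: bool        # did the restored copy pass verification?
    seconds: float  # measured wall-clock duration of the restore step


def restore_drill(restore: Callable[[Path], None],
                  verify: Callable[[Path], bool]) -> DrillResult:
    """Restore into a scratch directory, verify it, then throw the copy away."""
    with tempfile.TemporaryDirectory() as scratch:
        target = Path(scratch) / "restored.db"
        start = time.monotonic()
        restore(target)
        seconds = time.monotonic() - start
        ok = verify(target)
    # The scratch directory is deleted here; only the metrics survive.
    return DrillResult(ok=ok, seconds=seconds)


def make_demo_backup(path: Path, rows: int) -> None:
    """Stand-in for a real backup job: dump a small SQLite database to a file."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    src.executemany("INSERT INTO posts (body) VALUES (?)",
                    [(f"post {i}",) for i in range(rows)])
    src.commit()
    dst = sqlite3.connect(str(path))
    src.backup(dst)
    dst.close()
    src.close()


def verify_posts(path: Path, expected_rows: int) -> bool:
    """The restored database must open, pass an integrity check, and hold the data."""
    if not path.exists():
        return False
    conn = sqlite3.connect(str(path))
    try:
        integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
        count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    except sqlite3.Error:
        return False
    finally:
        conn.close()
    return integrity == "ok" and count == expected_rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as backup_dir:
        backup = Path(backup_dir) / "backup.db"
        make_demo_backup(backup, rows=1000)
        result = restore_drill(
            restore=lambda target: shutil.copyfile(backup, target),
            verify=lambda target: verify_posts(target, expected_rows=1000),
        )
        print(f"restore ok={result.ok} took {result.seconds:.3f}s")
```

In a real setup, `restore` would pull from your actual backup storage and `verify` would check whatever "the service works" means for you. Record `seconds` somewhere durable, so that the time to restore is a measurement and not an estimate.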
Parts of your setup may be stateless deployments with immutable images. That is because you collected all system state and put it into one or two selected locations. You can redeploy everything except these selected locations.
If you drop them, if you make a config mistake, these things are gone gone. They cannot be redeployed unless you have taken measures to do so. See above, item 1.
That is why the storage people and the database people all look down on you hipster devops people and make condescending remarks. 🙂 Yah, ok, they are nicer than you probably think they are, but they do have a completely different outlook on operations.
Listen and learn. Also, restore test.
There are people who have taken steps to prevent their CI/CD from messing with EBS volumes, S3 buckets or K8s Persistent Volumes, and there are people who will lose data in the future.
Don’t be in the second group.
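One possible step of that kind, sketched under assumptions: if your pipeline happens to use Terraform, the JSON form of a plan (as produced by `terraform show -json` on a saved plan file) lists every planned change under `resource_changes`, and a small gate can refuse to continue when a stateful resource is about to be deleted or replaced. The list of protected types and the function name `destructive_changes` are illustrative choices, not a standard. Other tooling will need a different check.

```python
import json
import sys
from typing import List

# Resource types that hold state. Illustrative list, extend it for your own setup.
PROTECTED_TYPES = {
    "aws_ebs_volume",
    "aws_s3_bucket",
    "kubernetes_persistent_volume",
    "kubernetes_persistent_volume_claim",
}


def destructive_changes(plan: dict) -> List[str]:
    """Return addresses of protected resources that the plan would delete.

    A replacement shows up as a delete plus a create, so it is caught as well.
    """
    hits = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if change.get("type") in PROTECTED_TYPES and "delete" in actions:
            hits.append(change.get("address", "<unknown>"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage in a pipeline step: python guard.py plan.json
    with open(sys.argv[1]) as fh:
        hits = destructive_changes(json.load(fh))
    if hits:
        print("refusing to apply, plan deletes stateful resources:", ", ".join(hits))
        sys.exit(1)
```

A gate like this only covers what goes through the plan. It is a seat belt and does not replace the restore test.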
“Nobody wants backup. Everybody wants restore.” – Martin Seeger
See also GitLab Data Loss.