Backups scare me. They always have. They should scare you too.
Backups are important day-to-day, but when you need them they are critical. Bad things will happen to the files that are important to us and potentially critical to our business.
Throughout my career in IT, I avoided being responsible for running backups – they're important, but they're not particularly interesting to run.
Now with my change of career, I’m responsible for my own backups. That’s no fun since it’s my business data.
Why we need backups
Most people will worry about disk failure (that's why some people make a second copy), but while catastrophic hardware failure is a possibility (and not uncommon), operator error (or even software faults) is probably a much more significant risk. The nice thing about catastrophic hardware failure is that it is easy to spot and handle (replace disk, recover data) … and restoring from a simple copy is adequate.
The other cases where you need to rely on your backup are much harder. These are the cases where you can’t immediately spot the failure.
While both Mac OS X and Windows have a safety net of a trashcan between our deleted files and oblivion, that only helps when you spot your mistake in time (and there are times when we're certain that we are safe to empty the trash … oops!).
We also have to worry about file corruption. Historically, error checking on file operations has never been programmers’ strength – there are a lot of corner case error conditions and handling them is hard. Catching it immediately is the best you can hope for – finding that your files have been corrupted at some unknown point is just a nightmare.
Copies aren’t backups
When I hear photographers discuss backups, almost all talk about making copies to an external drive. Maybe two external drives. Unfortunately, while a backup scheme is, at its conceptual simplest, based on making copies, in practice it is much more than that.
Backups need to be reliable. They need to be simple in operation too – if it’s complicated then the human factor enters the equation. Basically it just needs to happen automatically, eliminating the chance of operator failure.
So, if having a second copy doesn’t make a backup, what does? A standard backup scheme is normally based around making a full copy and then subsequent backups record the changes. After a number of these incremental backups, another full copy is made and the subsequent incremental backups are then made relative to this latest full backup. This sequence is repeated.
Now of course neither media supplies nor budget are unlimited, so this backup scheme works with a pool of media which is only recycled after a set period. Because, as we previously mentioned, there is a risk that the data loss goes unnoticed, selected full copies are removed from the pool at regular intervals and retained for a set period (months or years).
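The cycle described above can be sketched as a small simulation. The specific numbers here – a weekly full copy, a four-media pool, and pulling every fourth full copy out for long-term retention – are illustrative assumptions, not a recommendation:

```python
FULL_EVERY = 7      # one full backup per week, incrementals in between (assumption)
POOL_SIZE = 4       # media in the recycled pool (assumption)
ARCHIVE_EVERY = 4   # every 4th full copy is pulled out and retained (assumption)

def backup_kind(day: int) -> str:
    """Return 'full' or 'incremental' for a given day number (day 0 = first backup)."""
    return "full" if day % FULL_EVERY == 0 else "incremental"

def plan(days: int):
    """Simulate the rotation: which full copies end up archived vs. in the pool.

    Returns (archived, pool) as lists of day numbers. Media in the pool
    beyond POOL_SIZE are recycled (oldest first) for reuse.
    """
    archived, pool = [], []
    for day in range(days):
        if backup_kind(day) == "full":
            cycle = day // FULL_EVERY
            if cycle % ARCHIVE_EVERY == 0:
                archived.append(day)       # removed from the pool, kept long-term
            else:
                pool.append(day)
                if len(pool) > POOL_SIZE:  # oldest media goes back for reuse
                    pool.pop(0)
    return archived, pool
```

The retained archive copies are what protect you against the "unnoticed loss" case: even after the pool has been recycled several times, an older full copy still exists to restore from.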
Making two (or more) disk copies still doesn't make a real backup system either. If you don't spot the damage then you're just as likely to replicate it to both copies. You also have to be a lot more disciplined than I am to make the copies as frequently as you should.
Big disasters happen
Burglaries happen, pipes burst, buildings catch fire, etc. – keeping your backup copy permanently attached (to make it easy to update) isn't good either. Losing everything, including your backups, and then having to explain to clients that the images from their big event are lost isn't acceptable.
Replicating cloud storage isn’t really a backup either
All the popular cloud storage services (and there are a lot of them: Dropbox, OneDrive, iCloud, Box, Copy, Google Drive, Amazon Drive, etc.) replicate your local files to the online storage (and potentially your other devices).
That’s useful in the event of catastrophic disk failure because you’ve got an online copy (a very handy safety net). However there is no safety net if you simply trash a file (either by overwriting or deleting it), because the change is pretty quickly replicated to the cloud too.
Dropbox does however provide some historical file versioning which has saved me when I’ve inadvertently overwritten a file. I’ve not researched the capabilities of the other services.
As I’ll repeatedly stress: copies don’t make a backup.
The old way – local backups
Historically, disk storage was backed up to removable storage – typically tape, but occasionally optical media. Unfortunately removable media capacities haven’t matched the growth in disk storage and a removable media backup generally isn’t feasible now.
A more modern approach is to use more disk storage for backups (but not just as a straight copy – a proper “near line” solution will offer versioning and history). This backup storage needs to be reliable and robust, so while it can use slower, less expensive disks it still needs to implement RAID or similar resiliency.
Internet and cloud backup services
Now, with high-speed internet access, cloud storage is widely used – but as discussed above, that’s not a backup solution.
A number of players have entered the online backup solution market – CrashPlan, Mozy, Carbonite, Backblaze, etc. These generally install a local backup client/agent which operates automatically – changes in local storage are backed up to the online service (distinct from cloud services, which copy/replicate the local storage).
The primary considerations when looking at online backup solutions are your initial storage volume, your connection bandwidth and your typical storage growth – basically it’s a great solution, but it isn’t feasible for many of us yet. I’ll follow up on that topic in another post.
If copies are bad, what should I do?
Use a proper backup solution – ideally an online service. If you can’t, then there isn’t really an easy alternative.
Get some extra disks. Use something like CrashPlan as a local backup solution – it can use local storage or other computers and doesn’t just make copies.
Make sure that a copy of the data is held off-site – refresh & swap that copy as often as is practical. Better still, keep two off-site copies holding two different generations (i.e. current and previous).
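The two-generation off-site rotation above amounts to a simple rule: when a freshly refreshed copy goes off-site, it becomes "current", the old current becomes "previous", and anything older is freed up for reuse. A minimal sketch (the labels and two-generation depth are just the scheme described here):

```python
def rotate_offsite(generations, new_copy, keep=2):
    """Rotate off-site backup generations, newest first.

    `generations` lists existing off-site copies, most recent first.
    The new copy becomes 'current'; only the `keep` most recent
    generations are retained, the rest are recycled.
    """
    return ([new_copy] + generations)[:keep]
```

So after swapping in a week-3 copy, you hold weeks 3 and 2 off-site, and the week-1 media comes home to be refreshed next time – the point being that even mid-swap you always have at least one intact copy away from the building.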