
Wed Feb 24 20:31:15 CET 2010

On Backups

Since today is Backup Awareness Day, I thought I'd write about how I handle backups. In contrast to most articles on the topic, I won't talk about the gritty details of making backups but rather about what I back up and where. Note that this does not include my workstation at work, since that is a matter of some complication (and not at all interesting).

I have four machines that are of importance to me: my workstation at home, my file server and router, my laptop, and finally my co-located server. On each of them, there are different things I back up in different ways.

Workstation

My workstation has two main clumps of data that need backup: my home directory (actually, only parts of it) and the system configuration. It's also a nice example of what kinds of data need backup.

Home directory

My home directory I back up using rsnapshot, keeping the snapshots on a different disk that is only ever mounted while backups run - that way, even an accidental rm -rf / won't kill them. I run these backups at least daily (sometimes more often) and keep about a week of dailies and two months of weeklies. Since rsnapshot makes highly efficient incremental backups, this doesn't actually use up much disk space. Another very important thing to do with home directories is to prune all the stuff that isn't important (a sketch of how such a setup can look follows below). Here are a few examples of what I exclude:

  • .mozilla/firefox/*/Cache/
  • tmp/
  • downloads/
  • .googleearth/Cache/

A few more directories that are specific to my setup: src/ (where I keep the source code of downloaded programs, but not my own stuff or anything I've patched), data/bg/ (my wallpaper collection; it's synced between my laptop and my workstation, so there already is a backup of sorts), and debug/ (a directory of unpacked sources with debug info that I work on).

I think this illustrates what kind of data you can dismiss. One important thing to keep in mind is that the size of files has very little relation to their importance: I don't mind losing the most recent git checkout of Linus' tree, but losing all the small dotfiles in my home directory would be very, very nasty.
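
To make the mechanics a bit more concrete, here is a minimal sketch of how such a setup can look. It is not my actual script: the disk label, mount point and config path are made up, and the exclude patterns from the list above would live in the rsnapshot configuration itself (rsnapshot has an exclude directive that it hands on to rsync).

    #!/bin/sh
    # Sketch: run rsnapshot against a disk that is mounted only while
    # the backup runs, so a stray "rm -rf /" can't reach the snapshots.
    set -e
    mount LABEL=backup /mnt/backup
    rsnapshot -c /etc/rsnapshot-home.conf daily
    umount /mnt/backup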

System configuration

The first thing that pops into mind here is, naturally, /etc, where (theoretically) all of the system configuration files live. Due to the way I work with these files (and their importance), I keep them in a version control system whose main repository lives on my co-located server. Note that some VCSs don't handle file ownership and access rights very well, but there are solutions for that. Since there are files in /etc I never touch (though they may still change through distribution upgrades), I don't keep all of /etc in the repository, only the files I've actually modified.
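
I won't go into which VCS or which workaround I use, but to illustrate the idea: assuming git, one could version a small metadata file alongside the configs (tools such as etckeeper or metastore exist for exactly this; the following is only a sketch).

    #!/bin/sh
    # Sketch: commit /etc and record owner, group and mode of every
    # tracked file, since git itself only stores an executable bit.
    set -e
    cd /etc
    git ls-files -z | xargs -0 stat -c '%a %U:%G %n' > .filemeta
    git add -A
    git commit -m "config change on $(hostname)"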

As I said earlier, most system configuration lives in /etc - but not all. For one, some software lives below /opt, including its configuration files. Another place where configuration files might end up is below /usr/local. In my case, /opt contains nothing of value, so I don't make backups of it. I do have a few scripts I've written that live in /usr/local/{bin,sbin}. I keep a copy of all of those in my home directory (in mysrc/), so they're taken care of. My script for making backups of /home includes commands that copy over everything from /usr/local, so I don't have to do it by hand.
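
That copy step can be as simple as the following; the target directory under mysrc/ is just an example, not the layout of my actual script.

    # Sketch of the /usr/local copy step in the /home backup script.
    # Everything ends up under ~/mysrc/, which is part of the regular
    # home directory snapshots.
    rsync -a --delete /usr/local/bin /usr/local/sbin "$HOME/mysrc/usr-local/"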

Finally, some system state lives in /var. In my case, this is the place where my distribution keeps track of installed packages. If I have to set up my machine from scratch, that list is handy, so I can just use it as a reference of things to install. Depending on the distribution used, other directories might warrant a backup.
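
What that looks like depends on the distribution; for illustration, a few ways of turning the package database into a plain-text list (the output paths are arbitrary):

    # Debian/Ubuntu: dump the package selections
    dpkg --get-selections > /root/packages.txt

    # Gentoo: the world file already is such a list
    cp /var/lib/portage/world /root/packages.txt

    # RPM-based distributions
    rpm -qa | sort > /root/packages.txt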

File server

My file server has (again) two things to make backups of: the file storage (under /store) and the system setup. Since the latter works exactly the same way as on my workstation, I'll skip it here.

The main problem with backups of file servers is simply their size. In my case, /store is 1 terabyte, and I don't have another storage location of the same size. I could just add another 1 TB disk to the system and be done with it - but back when I built the machine, that was out of the question financially. So what to do?

The data on my file server can easily be classified as "needed often" or "needed seldom". Once a month, a cron job reminds me to move the latter off the machine (usually to external disks and/or DVDs). A handy way to answer "did I use this?" might be the atime record of the file system. For me, that hasn't worked well in the past, so I just jog my memory - and since it's a file server for two users, that's not a problem. Pruning old stuff this way not only gives me off-machine copies, it also makes the space last longer, since I get rid of data that's just eating disk space more often.

A note on backing up to optical media: I back up every file to two different media, and I always include checksums of all files, so I can be sure upon restore that the files are ok. Things that live only on optical media for a longer time (typically more than a year) are copied to new media every 2-3 years if they're still important.
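
Generating and later verifying those checksums is straightforward; a sketch (the staging and mount directories are invented, and md5sum or any other checksum tool works just as well):

    # Before burning: record a checksum for every file in the staging area.
    cd /store/staging
    find . -type f ! -name SHA256SUMS -print0 | xargs -0 sha256sum > SHA256SUMS

    # When restoring (or spot-checking an aging disc):
    cd /mnt/dvd
    sha256sum -c SHA256SUMS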

In the future I might move to using separate non-RAID disks for backups, but the whole file server setup is in flux, so I don't have anything solid, yet.

Laptop

I use my laptop seldom and I don't have a large working set of data on it. Usually, it's a glorified terminal for accessing my workstation at home, my machine at work or my co-located server. As a result, backing up its home directory as often as I do on my workstation makes no sense. Also, the laptop may sit unused for weeks on end, so anything daily (or even weekly) is pointless. Hence, I use anacron to remind me every once in a while to back up the important things.
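
A sketch of what such a reminder can look like; the interval, the script name and the mail-based nag are made up, and the real anacrontab depends on the distribution's defaults.

    #!/bin/sh
    # Sketch: /usr/local/bin/laptop-backup-reminder, hooked up via a line
    # like the following in /etc/anacrontab
    # (fields: period in days, delay in minutes, job id, command):
    #   14  15  laptop-backup  /usr/local/bin/laptop-backup-reminder
    echo "Time to back up the laptop (home directory and /etc)." \
        | mail -s "Backup reminder" root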

As for system configuration, the same things mentioned above apply. This works well since I won't change /etc if I don't use the laptop.

Co-located server

My co-located server has the same set of things to back up as most of my other machines: home directories, system configuration and a little bit of extra storage. Actually, strike the last one. I deliberately make no effort to back that up (and my users know it). As for the system configuration, again, the important stuff is kept in a VCS.

The interesting bits about this machine are home directories, the VCS repositories from the other three machines mentioned above, the web content and the MySQL databases I host (mostly for web sites).

For the home directories, I don't guarantee anything to my users; I specifically tell them that I don't back up their home directories, so they're on their own. My own directory I back up to my file server at home. I can't do this as often as I'd like, since it's hard on bandwidth and the file server sits behind a meager DSL line, but having at least some restorable backup beats having none. Since my mail system delivers mail into the users' home directories, backups of /home include all mail.

The web content I handle exactly the same way as home directories: regular rsnapshots locally, copied over to the file server. I have excluded a few directories that contain big binaries that are merely copies of stuff found elsewhere.

The VCS repositories are backed up locally with the usual backup methods for such systems; I keep a week of backups, made every six hours. For the MySQL databases, I make a backup every three hours, check whether it differs from the previous one and discard it if it doesn't. I keep a week of those, too.
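
A sketch of the dump-and-compare idea for the databases; the paths, the use of --all-databases and the --skip-dump-date option (which omits the trailing timestamp comment, so unchanged data produces byte-identical dumps) are assumptions, not necessarily what my actual script does.

    #!/bin/sh
    # Sketch: dump all databases, keep the new dump only if it differs
    # from the most recent one.
    set -e
    dir=/var/backups/mysql
    new="$dir/dump-$(date +%Y%m%d-%H%M).sql"
    last=$(ls -1t "$dir"/dump-*.sql 2>/dev/null | head -n 1)

    mysqldump --all-databases --skip-dump-date > "$new"

    if [ -n "$last" ] && cmp -s "$last" "$new"; then
        rm "$new"    # identical to the previous dump, nothing new to keep
    fi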

These local backups are rsynced to the file server once a day (early in the morning, so as not to interfere with normal surfing at home).
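
For illustration, the corresponding crontab entry and transfer might look like this; the host name, paths and bandwidth cap are invented, and --bwlimit (roughly, KBytes per second) is what keeps the DSL line at home usable.

    # Sketch of a crontab entry on the co-located server: push the local
    # backup directory home at 04:30, throttled to be gentle on the DSL line.
    # min hour dom mon dow  command
    30    4    *   *   *    rsync -az --delete --bwlimit=200 /var/backups/ fileserver.example.org:/store/colo-backups/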

My co-location provider offers backup solutions, too, and I recommend using them if you can afford them. Since I've always had my own backup-to-file-server setup, I never bothered to set them up.

Conclusion

As you can see from the sheer length of this article, I've put quite some thought into my backups. I'm also an IT professional so I tend to do everything myself (we geeks are weird that way). I'm sure for the less technically inclined, there are easier "turn-key" solutions than mine.

Still, the main point of this article isn't the exact mechanisms used, but rather what kinds of things are possible. I also hope to have illustrated some of the pitfalls of backups, like making them too seldom or keeping them in the wrong place. Naturally, knowing what data you will need in the event of a catastrophe is important, too: yes, your irreplaceable stuff matters most, but not having to sink weeks into rebuilding your personal setup is nice, too. Significant data loss is almost always going to make a serious dent in your productivity - try to minimize it.

In essence: be aware what your important data is (and what isn't), be aware what a good place is to put backups (a distance away, but not too far) and be aware of the way your restore procedure works. The last bit I'll talk about in more depth in a future BAD post (nice acronym, isn't it?).


Posted by klausman | Permanent Link | Categories: Software, Community
Comment by mail to blog@ this domain.