A View from Simson Garfinkel

How to Cope with an Expanding Data Closet

It’s easy to get trapped by your data. Create a personal storage management plan.

This holiday season you’ll probably be tempted to pick up another 2TB external for your home computer and a bunch of 32GB USB3 flash drives for stocking stuffers. Or perhaps you’ve decided to trust your data to the cloud—$100 will store 200GB for 10 months on Google Drive. It’s never been easier to save every photograph you snap, every e-mail message you send, and every bank statement you receive. It’s much harder to sensibly manage the ever-expanding archive of personal information that most of us are creating.

We live in a world in which our digital closet space doubles every year, so we never need to throw anything out. But if we don’t learn how to manage our storage space, we’ll never be able to find anything, either.

What’s more, even most non-technical people I know have a growing inventory of storage sitting around their homes. Typically there’s a drawer of 1GB and 2GB USB flash drives (which seemed like a lot of space at the time), camera cards, old CDs and DVDs, and usually a few external drives. What’s on those drives? Can they still be read? Are they safe to throw out? Who has the time to find out?

So before you click that button and buy more storage, it’s useful to take a few minutes and create your own personal storage management plan. This involves writing down what you are storing, the threats that your data face, and the technical measures you are employing to protect the data. Without such forethought, there’s a good chance that you’ll end up losing irreplaceable data in this age of plenty.

The Content

The first step in any data management plan is identifying what you want to save. For most consumers this includes digital photos, video, and e-mail. For me it also includes scans of paper records that once filled my basement, as well as PDFs of bank statements: roughly 10,000 PDF files that require 5GB of storage. That was a lot of data back in 2006 when I finished my scanning project, but not so much now.

People should also think about backing up information that’s stored in the cloud. The problem with cloud-based storage, though, is that these services are optimized for getting information into them—not taking it out. Worse, companies are under no obligation to preserve your data—and even if they were, accidents happen. (Amazon’s cloud has lost customer data on several occasions.) Consumers can also close accounts or lose access to them—for example, if an account gets hacked.

Banks and credit card companies are pressuring their customers to sign up for what’s called “electronic statement delivery.” Unfortunately, the statements aren’t actually sent to you by e-mail. Instead, these services provide for electronic retrieval: customers need to go to the financial institution’s website and download the statements, usually one at a time. Banks typically terminate your access to historical statements when you close your account, so it’s a good idea to download them every six or 12 months and store a copy locally.

Most of what’s on Facebook probably isn’t worth preserving, but some is, especially photos and letters from loved ones. Downloading this information in a systematic way requires special-purpose software. For example, “Backup Plus,” which comes with Seagate’s external drives, can automatically archive photos you post to social network sites from your phone. You can also accomplish this with the Dropbox and Google smartphone apps, which can be configured to automatically copy each photo that’s snapped to a special folder. Apple’s PhotoStream provides similar functionality, but with less user control.

To archive Gmail and other Web-based e-mail you’ll need to configure a desktop computer to access your mailboxes with IMAP. Once the mail is synchronized, copy it to a local mailbox or create a mail archive. This is also a great way to decrease your storage footprint on the Web-based services, which might allow you to keep using the free service rather than paying for more storage space.
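For readers comfortable with a little scripting, the same IMAP approach can be sketched in Python using only the standard library. This is a minimal illustration, not a finished tool: the function name archive_folder, the host imap.example.com, the account name, and the file name are all placeholders chosen for the example, and your mail provider may require extra setup (such as enabling IMAP access) before a login like this works.

```python
import imaplib   # used in the commented usage example below
import mailbox


def archive_folder(conn, folder, mbox_path):
    """Copy every message in one IMAP folder into a local mbox file.

    `conn` is an already logged-in imaplib connection (or any object
    with the same select/search/fetch methods). Returns the number of
    messages saved.
    """
    archive = mailbox.mbox(mbox_path)
    saved = 0
    try:
        # Open the folder read-only so nothing on the server is changed.
        conn.select(folder, readonly=True)
        status, data = conn.search(None, "ALL")
        if status != "OK":
            raise RuntimeError(f"search failed for folder {folder!r}")
        for num in data[0].split():
            status, parts = conn.fetch(num, "(RFC822)")
            if status != "OK":
                continue  # skip messages the server could not return
            # parts[0] is an (envelope, raw_message_bytes) pair.
            archive.add(parts[0][1])
            saved += 1
        archive.flush()
    finally:
        archive.close()
    return saved


# Example use (hypothetical host and account; prompts for credentials
# should be added before running this for real):
#
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login("you@example.com", "your-password")
#   print(archive_folder(conn, "INBOX", "inbox-archive.mbox"))
#   conn.logout()
```

The mbox format was chosen here only because many desktop mail programs can open it; the sketch copies messages and never deletes anything from the server, so trimming the online mailbox remains a separate, manual step.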

Containers and Backups

Once you have identified your data, you need to decide where you are going to keep it. These days it’s common for data to be scattered across multiple devices and services. Unless you are careful, it’s easy to get version skew or to inadvertently delete the only copy of something because you think that you’ve got a copy elsewhere. And remember, every piece of computer equipment will eventually fail.

To avoid this problem, I recommend having a primary system where you keep all of your data and that is properly backed up. Other phones, tablets, laptops and desktops should be configured so that the important content they generate gets automatically migrated to the primary. This is the approach that Google recommends for Chromebook users, although the primary is Google’s cloud.

It’s common for organizations to divide their data into active and archive. Active data needs to be immediately accessible and changes frequently, while archive data changes slowly and is rarely deleted. For home users, I recommend using services like Google Drive or Dropbox for active data and storing archive data on at least two external drives, one of which is unplugged and in a different physical location. Being unplugged protects against malware like CryptoLocker (which encrypts your data and demands payment by Bitcoin), and being in a different location protects against fire and theft. If physical security is lacking where you store that remote drive, you may want to also use drive encryption like Apple’s FileVault, Microsoft’s BitLocker, or the open source program TrueCrypt to protect your data when you are not around.

Academic research has shown that user error and software bugs are two of the primary threats to stored data. Multiple offline backups are the best protection from these threats, but you need to test the backups every now and then to make sure that they are still usable.
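One way to test a backup, sketched here in Python, is to compare a checksum of every file in the original folder against the same file on the backup drive. The function names file_digest and verify_backup are made up for this example, and it assumes the backup is a plain folder copy with the same layout as the original rather than a proprietary archive format.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(original_dir, backup_dir):
    """Compare every file under original_dir with its copy in backup_dir.

    Returns two lists of relative paths: files missing from the backup,
    and files whose contents differ from the original.
    """
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    missing, mismatched = [], []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        copy = backup_dir / rel
        if not copy.is_file():
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(copy):
            mismatched.append(rel.as_posix())
    return missing, mismatched
```

A mismatch does not always mean the backup has gone bad; it may simply mean the original was edited after the last backup. Either way, the list tells you which files deserve a closer look.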

It’s easy to get trapped by your data. You make backups from one machine (for example, your laptop) to an external drive. You delete some files on your laptop but you don’t worry, because they are on the external. Then you realize that you only have one copy of the files on the external drive, so you buy a second external drive to back up the first. You want to protect yourself against a house fire or burglary, so you buy another drive, copy your data to it, and take it in to work. You now have backups of your backup’s backup. This way lies madness.

Avoid information insanity by knowing the location of your data and the specific purpose of each backup. It makes no sense to back up a backup. Instead, make two backups of your original and put them in separate locations. Know which of your machines can be wiped and restored from backups (or from the cloud) and which machines have data that really matters. Ideally no machine has data that are irreplaceable, and everything that’s on a USB stick, camera card, cell phone, or tablet should be copied to a laptop, a desktop, or the cloud shortly after it is created.
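If you prefer to keep your plan in a form a computer can check, the rules above (two backups of each original, in separate locations, at least one of them unplugged) can be written down as a small Python sketch. The class names Backup and Dataset, the function plan_problems, and the location labels are all invented for this illustration; a sheet of paper listing the same facts serves the purpose just as well.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    location: str          # where the copy lives, e.g. "home" or "office"
    offline: bool = False  # True if the drive is normally unplugged


@dataclass
class Dataset:
    name: str              # what is being stored, e.g. "photos"
    primary: str           # the one system that holds the original
    backups: list = field(default_factory=list)


def plan_problems(datasets):
    """Return a list of plain-language warnings about a storage plan."""
    problems = []
    for d in datasets:
        locations = {b.location for b in d.backups}
        if len(d.backups) < 2:
            problems.append(f"{d.name}: fewer than two backups")
        elif len(locations) < 2:
            problems.append(f"{d.name}: all backups are in one location")
        if not any(b.offline for b in d.backups):
            problems.append(f"{d.name}: no offline backup")
    return problems
```

For example, a Dataset named "scans" with two Backup entries that both list "home" as their location would be flagged for keeping every copy in one place.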