Memo to Washington: Save the Data.

The National Archives’ slow progress in preserving digital federal records, which exist in some 16,000 different formats, could lead to serious data losses.

Jul 1, 2005

If you wander along the National Mall in Washington, DC, you can pop into the marble rotunda of the National Archives for a glimpse of the original Declaration of Independence, Constitution, and Bill of Rights. These calfskin parchments are preserved under glass, bathed in argon gas. But no such care is extended to digital federal records. The government is presumed to have used (or received) data in every format ever crafted by the computer industry – some 16,000 formats at last count – and has stored this data on every kind of hardware. But the fast-changing computer industry never stopped to think about long-term preservation, which means records of contemporary history are fast becoming obsolete – and there’s no existing system to permanently and reliably archive them.

That’s beginning to change, as our story “The Fading Memory of the State” reports. The U.S. National Archives and Records Administration (NARA) is in the early stages of developing an Electronic Records Archives that will harmonize and preserve all these digital records and make them available online, thereby saving the nation’s contemporary history from destruction. Solving the problem will in some ways test the limits of computer-science research: NARA must not only preserve every data format ever dreamt up but also contend with a volume of material that far exceeds that of even the largest private enterprise. What’s more, it has a responsibility to save all this data for the uniquely long (if ill-defined) time period NARA calls “the life of the Republic.”

Like any other federal agency, NARA is saddled with a cumbersome procurement process. It has hired two major contractors – Lockheed Martin and Harris Corporation – to generate competing preliminary designs, which are scheduled to be unveiled next month. Common sense suggests that the project will need close and continuing scrutiny from the U.S. Congress – and from the National Academies panel of industry and academic experts that has been advising NARA. The goal: to ensure that the resulting digital archive is not a rigid, custom-built system doomed to obsolescence but rather a flexible system grounded as much as possible in commercial offerings and able to evolve with the IT industry. As a good start, NARA could model its digital archive after early versions of digital archives already built by some nations and academic institutions, including MIT.

Clearly, the general problem of digital-record decay needs more attention than it’s currently getting in Washington. Yes, $136 million has been budgeted to date for NARA’s digital-archive project, but not enough has been done to actually force federal departments to harmonize how they store digital records. And some shortsighted cuts can be found in the administration’s proposed 2006 budget – specifically, eliminating $10 million in funding to the National Historical Publications and Records Commission (NHPRC), a grant agency within NARA that supports research in digital archiving and curation. That cut would effectively kill the NHPRC. Yet this is just the kind of small, nimble program that can help find ways to stanch the bleeding of the contemporary historical record. Congress should take the problem more seriously.