November 20, 2018 • Zenobia Kozak
In 1986, to mark the 900th anniversary of Domesday Book, the BBC created a computer-based, multimedia version of the publication. The data was stored on two interactive video discs at a cost of approximately £2.5 million. The problem? By the late 1990s, these discs could no longer be read by computers because the format was outdated. Ironically, the original Domesday Book has survived for nearly a millennium, while the modern version remained accessible for only a decade and a half.
A team at Leeds University was able to resurrect the lost data in 2002, but the Domesday Project has become synonymous with the problems of digital preservation. How could Britain’s earliest public record so easily outlive its digital facsimile? More specifically, how do you safeguard your digital information against such an expensive, catastrophic event?
As of 2007, 94 percent of our corporate memory was in digital form, and experts warn us that we’re on the brink of an information “dark age.” Despite nearly 1,000 years of technological advances since the creation of Domesday Book, are paper records the only way to ensure long-term preservation and access?
In 2015, internet pioneer Vint Cerf warned us to start preserving our vast amount of digital data. The “Father of the Internet” advised a San Jose conference that the 21st century could thrust us into darkness because so much data is in a digital format.
“If you think about the quantity of documentation from our daily lives which is captured in digital form, like our interactions by email, people’s tweets, all of the world wide web, then if you wanted to see what was on the web in 1994 you’d have trouble doing that. A lot of the stuff disappears.”
The main challenges organizations face are the very reasons why the digital black hole exists: data volume, format, and associated metadata.
The cost and complexity associated with long-term preservation are increasing as more and more sources of data emerge in need of preservation. Preservation software or storage systems may be unable to handle large files, multiple versions of the same file, or simply large volumes of data.
In 2010, the Library of Congress launched a project to archive Twitter, beginning with tweets from 2006 through 2010 and continuing with all public tweets thereafter. Between 2007 and 2017, Twitter went from 5,000 tweets per day to 500 million. The Library of Congress recognized that the social media landscape had changed significantly: “It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data.” The Library has now started to acquire tweets on a selective basis. Despite more than 200 years of wisdom and resources, the sheer volume of digital information associated with the Twitter project proved too much for the Library of Congress.
Digital vs. Analog Preservation
As our digital output increases, our preservation needs change. The need for timely and effective preservation is more urgent because digital materials have a shorter lifespan. Newer generations of software may phase out support for older formats. One of the major differences between digital and analog preservation is that digital requires more active intervention throughout a material’s life cycle, and at a much earlier stage.
For example, one best practice is to apply thorough and accurate metadata to digital content from the outset, and continually assess the integrity of the file and functionality of associated software. It seems counterintuitive that a process involving digital files would require more frequent attention and action, but the real danger is in the assumption that there is an automated process in place. Additionally, it’s not just a matter of preserving the digital files but also providing ongoing access to the material, ensuring the software and hardware necessary for reading the files remain available and operational. As the BBC discovered with the Domesday Project, even if your preservation format seems indestructible, you still need to take into consideration the software and hardware required to access that media.
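One common way to "continually assess the integrity of the file" is a fixity check: record a checksum for every file when it enters the archive, then recompute and compare on a schedule. The following is a minimal sketch in Python, not a description of any particular preservation product. The function names (sha256_of, build_manifest, audit) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files on disk today against a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Files recorded earlier that can no longer be found.
        "missing": sorted(set(manifest) - set(current)),
        # Files whose content no longer matches the recorded checksum.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Files that appeared since the manifest was written.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In practice the manifest would be saved alongside the collection (for example as a JSON file) and the audit rerun periodically. Any entry under "missing" or "changed" is a prompt for human follow-up, which is exactly the kind of active intervention that digital material demands and paper does not.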
Metadata may be the most important and complex aspect of digital preservation. The supporting data associated with a digital file serves as a timeline and a road map. Missing or poor metadata makes material undiscoverable, authenticity unverifiable, and context unclear. Long-term preservation may not be beneficial or even possible for digital material lacking sufficient metadata. Inadequate metadata is probably the most common and pervasive concern leading to the loss of digital data into the proverbial “black hole.”
Closing the Gap
How do we know what’s at risk and what we’ve already lost? Data-gathering initiatives and digital forensics tools like BitCurator and Digital Record Object Identification (DROID) are helping archivists discover digital data and recover deleted, encrypted, or damaged file information. BitCurator provides tools and techniques to extract technical and preservation metadata as well as package digital materials for archival storage. DROID is a software tool developed by The National Archives of the UK that will profile a wide range of file formats. DROID indicates file versions, their age and size, and when they were last changed.
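To give a feel for what "profiling" a collection involves, here is a rough Python sketch that walks a folder and reports each file's likely format, size, and last-modified date. It is only an illustration of the general idea: it does not use DROID or BitCurator, and its short SIGNATURES table is a hand-picked assumption, whereas real tools rely on far larger signature registries. The names identify and profile are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path

# A tiny, illustrative table of leading "magic bytes" for a few formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container (also DOCX, XLSX, EPUB)"),
]


def identify(path: Path) -> str:
    """Guess a file's format from its first bytes, ignoring the extension."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def profile(root: Path) -> list[dict]:
    """List format, size, and last-modified time for every file under root."""
    rows = []
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        info = p.stat()
        rows.append({
            "path": p.relative_to(root).as_posix(),
            "format": identify(p),
            "size_bytes": info.st_size,
            "modified": datetime.fromtimestamp(
                info.st_mtime, tz=timezone.utc
            ).isoformat(),
        })
    return rows
```

Files that come back as "unknown", or in formats that current software no longer opens, are the ones most at risk of slipping into the black hole, so a report like this is a reasonable first step in deciding what to migrate or rescue.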
Once the material is discovered (or recovered), the next step is to ensure long-term preservation. Like digital forensics, long-term preservation requires a specialized set of tools. The Digital POWRR Project, sponsored by the Institute of Museum and Library Services, works to make digital preservation more accessible to a wider range of professionals. In 2013, the group compiled a useful tool grid, listing and comparing commercial and open-source digital preservation tools.
Fill the Gaps
Digital forensics and digital preservation typically rely on the intervention and involvement of archival specialists who are trained in the organization and processing of born-digital content.
Apart from purely digital initiatives, discovery campaigns can also help to fill the gaps where digital material is lost or lacking. Discovery platforms enable any user within an organization to nominate digital material to a collection, which is secure but open for collaborators to add to or enhance. It is often surprising to “discover” just how much history can accumulate when authentic content is globally distributed and squirreled away in personal repositories. These exercises can serve as excellent engagement tools for an organization.
Unlocking heritage through oral history projects is another method for filling in gaps in the digital record. Gathering, preserving, and interpreting the voices and memories of significant personalities helps us to capture the culture and character of an organization by remembering important people, communities, milestones, crises, and turning points.
Capturing “near history” through these methods helps to carry forward important history to inform contemporary business issues and to resonate with diverse stakeholder groups. So, what are you doing to avoid the digital black hole? Contact us today to get filled in!