
Information Grab: The New Archive, part II

Part II of "Archiving in the 21st Century." Read parts I, III, and IV.

by Anais Borja

Archivists have a new digital charter, and as they begin to define the boundaries of our digital heritage, they are concerned not only with what you donate but also with what material can be gleaned from social media sites such as Facebook and Flickr. New methods of acquisition rely on software “harvesters,” which crawl the Internet and select random moments on the Web to conserve.

The International Internet Preservation Consortium, established in July 2003, was among the first centralized, international consortia to develop an agreed-upon, membership-based protocol for “managing and sustaining at-risk digital content” at the domain level. Its members include the Internet Archive and the Library of Congress, with their broad harvesting methods, as well as more traditional national libraries such as the Webarchief van Nederland (Web Archive of the Netherlands), whose selection criteria relate to Dutch history, language, culture, and science.

One of the most broadly defined collections is the monolithic Íslenska vefsafnið (the Icelandic Web Archive), which “contains all web sites hosted on the Icelandic domain .is and many web sites hosted elsewhere that are in Icelandic or refer directly to matters of interest to Iceland.”

But for some, the question is not how to ensure the preservation of digital objects and material but how to decide what is worth preserving in the first place. Because so much material on the Web is unstable and dynamic, with a short life expectancy, the practice du jour of leaving “appraisal” to web crawlers and automatic indexers is a workable if imperfect solution.

The so-called “deep web,” or DarkNet, which crawlers cannot save, makes the Internet Archive’s lofty hope to provide “universal access to all knowledge” an ambitious if unrealistic promise. Still, with technological advances, these technical problems may come to seem quaint and the ability to save everything inevitable.

However, this ever-growing suspicion of what has come to be called an “illusion of completeness,” articulated in Viktor Mayer-Schönberger’s Delete: The Virtues of Forgetting in the Digital Age, is essentially a response to this Age of Digital Consolidation. Delete describes the Internet in much the same way Vannevar Bush described the proto-Internet he called the “memex” in “As We May Think,” his essay on data management, yet it also signals the dangers of defaulting to what archivist Inge Bundsgaard called “social memory” without the “right to social oblivion.” Where Bush describes the memex as “an enlarged intimate supplement to his memory” that easily and cheaply facilitates exchange with others, Delete describes the dangers of everlasting digital memory and of outdated information taken out of context. As the book argues, information privacy rights alone won’t solve the problem, but a mechanized “appraisal” that sets expiration dates on information might.

The aggregation of these discrete appraisal choices represents what John Ridener, in his historical primer From Polders to Postmodernism: A Concise History of Archival Theory, described as the “social dimension of preservation,” which will help determine not only the shape of our archives but also how researchers access them. Where archives are concerned, writers, by virtue of their craft and temperament, tend to think about their literary immortality in terms of what archivists call, rather unglamorously, “preservation by proliferation.”

While many spurn self-conscious retention and assiduous appraisal of one’s records as evidence of the most peculiar kind of ego, the appraisal instinct is perhaps most evident when writers/users take equal precautions to protect their private lives through expert edits of their personal artifacts. As a consequence, writers are often more self-conscious than most about their archival habits. They tend to preserve drafts and correspondence as evidence of the creative act and as a practical editorial reference. But then, like everyone else, writers throw a lot away. Noted archivist and collector Walter Benjamin referred to the verzetteln, or “fragmentary, unachieved” scraps, of his literary archive as both the record and the method of managing it. Despite a degree of self-consciousness, writers’ verzetteln habits have tended to shape their archives.

For the most part, these habits haven’t changed much (Zadie Smith recently admitted as much about her own shortsightedness). But the mechanisms of literary production, and the ways they are preserved, have. According to interview findings from the landmark 2010 Digital Lives research project led by the British Library, 63 percent of respondents in the academic and digital-public samples said the main reason for archiving computer files is “as witness to creativity,” while only 15 percent archived them for “sentimental reasons” and 26 percent for “personal reference.”

Archivists at the same conference were also concerned about the apparent confusion users/writers have over their rights to download personal content easily from online service providers like Yahoo, Gmail, and Facebook for the sake of its long-term preservation and security. Writers who may be naturally shortsighted about the enduring value of their archives are now just a frustrated keystroke away from discarding drafts, a sloppy tap of the mouse away from deleting entire chunks of manuscripts, and a malfunction, a technical obsolescence, or simply a changed account away from losing e-mail correspondence.

If, for argument’s sake, we ignore the wonders of computer forensics and recovery, the computer is a vast repository of arcane subdirectories that, by design, hide what many of us may have already forgotten. Sometimes our writer/user habits can invisibly influence our habits of literary production. For instance, Nabokov made it famously clear that all unfinished work should be destroyed, and perhaps, had it not been for his wife Véra and her important role as typist/executor/archivist/interventionist, he might have swiftly done the deed to both Lolita and The Original of Laura himself if he had written them on a computer (forensics and recovery notwithstanding). Glenn Horowitz describes this possibility as “a radical change that will shape future literary archives.”

If the records creator is responsible for what is ultimately transferred to an archive, then maintaining forward-looking recordkeeping habits is clearly crucial, especially in a born-digital environment. According to InterPARES, the international research initiative on long-term preservation strategies for electronic records, digital records are authentic when “the materials are what they purport to be and have not been tampered with or otherwise corrupted.” At the conclusion of the first InterPARES project, it was determined that, unsurprisingly, artists and writers are less concerned with preserving the by-products of their creative act than with the act itself.

In fact, the Beinecke, Yale’s archival institution, has adapted the InterPARES 2 Creator Guidelines as a resource to distribute to its authors. Archivists talk quite openly about approaching writers at younger and younger ages, driving cross-country to visit and interview them in order to capture contextual information that aids future research. The ten technical and somewhat stringent forensic guidelines range from sensible recommendations on choosing hardware, software, and open file formats (six factors are offered to help you choose wisely) to meticulously rich metadata (data about content, so that a computer can understand it).

If a writer takes precautions to ensure that records are stable and fixed, the idea is that their fixed form cannot be overwritten, altered, or deleted. However, as Horowitz told me on the phone, “stable content” has always challenged archivists. In 2007 the Harry Ransom Center acquired the Norman Mailer Papers, which, according to its website, included “359 computer disks, 47 electronic files, 40 CDs, six mini data cartridges, three laptop computers, and one Ampex magnetic tape spool.” The bulk of the electronic content was created by Mailer’s assistant, Judith McNally, and consists of correspondence and literary drafts. Gabriela Redwine, Curator of Born-Digital Assets at the Ransom Center, relates that when they interviewed Mailer in 2007, they determined that the laptops and disks had been entirely in McNally’s possession and care. Her death raised basic issues of provenance, because the disks were not seized and accessioned until more than a year after she died.

The problem is not simply the stability and integrity of the records while the computers were in McNally’s possession, but that there is no way to know who may have viewed or changed them after her death. If, according to InterPARES 2, authenticity is “maintained by the artist’s constant presence,” then complications of authenticity arise in most collaborative working relationships today.

As Matthew Kirschenbaum, the lead researcher on “Digital Materiality: Preserving Access to Computers as Complete Environments,” a paper sponsored by the National Endowment for the Humanities, told me over e-mail, the way information is stored on a computer’s hard drive, its system of organization, is important to the archivist. If our literary lives are increasingly digital, he says, he “can easily imagine a future scholar wanting to know what other files were in the same directory, in the same way we would use location and proximity as one metric by which to evaluate an item's place in a physical collection. While directory paths can obviously be captured as metadata, that's still not the same as the visual experience of actually looking at the original document and folder.”

For Kirschenbaum, born-digital information experienced as the writer experienced it is “not simply a record, but an artifact.”


photo by Dan Machold, via Flickr
