Archiving in the 21st Century

Part I of "Archiving in the 21st Century." Read parts II, III, and IV.

by Anais Borja

We have promiscuous habits of self-documentation. Facebook hosts more than 40 billion of our photos and threads, and Twitter publishes more than 50 million of our aperçus each day. From the French mer-mer, to “vividly wonder,” the Internet meme propagates these cheap modes of expression through its bit-sized transmissions, which the Library of Congress then catalogues, exhaustively. In a sense, when the Internet evicted TV from the low-rent tenement of mass culture, we were put on the lease. Today, the meme is the message.

Traditionally, archivists and historians have had the professional mandate to appraise or assign value to our literary endeavors. Their professional influence determines not only what we keep and what we destroy, but, more importantly, how we save and access it. Archival repositories today acquire an ever-increasing mass of material that arrives in an electronic format, what’s known as “born-digital.”

These blogs, ejournals, Tweets, virtual worlds and Facebook threads represent “gray literature,” a shifting boundary blurring the distinction between the professional writer, the amateur and even the dilettante. While digitization is nothing new to archivists, the fundamental shape and texture of the archive are evolving. As the digital information economy drives more research-based user demand from remote points of access, and archivists and historians revisit appraisal criteria that can accommodate broader definitions of what belongs in the canon and what doesn’t, the definitions of our cultural and literary heritage are changing.

As Glenn Horowitz, a rare book dealer and literary appraiser, told an audience of archivists in 2008 at Creating a Usable Past: Writers, Archives, Institutions, a conference at the University of Texas’s esteemed Harry Ransom Center, institutional acquisition strategies are shifting away from a focus “exclusively on writers that were either canonical figures or seen to be flirting with the possibility of rising into the canon.” The archival mandate and the new terms of its appraisal are increasingly shifting to us, to our virtual lives: the fugitive margins of our literary biographies.

According to Eric Schmidt, CEO of Google, “Every two days now we create as much information as we did from the dawn of civilization up until 2003. That’s something like five exabytes of data.” An exabyte is roughly equivalent to 50,000 years of DVD-quality video, so consuming just one of those exabytes would feel like watching The Matrix 193,235,294 times. With easy storage made even easier by cheap disk space, our ability to create and save information has outpaced our ability to think critically about the theory and practice of archiving it.
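
The Matrix figure can be checked with a little arithmetic. What follows is only a rough sketch, and it rests on two assumptions the text does not state: a 365-day year and a running time of 136 minutes for the film. Under those assumptions, 193,235,294 viewings corresponds to a single exabyte; Schmidt's five exabytes would come to five times as many.

```python
# Rough check of the viewing count quoted in the text.
# Assumptions (not stated in the article): 365-day years and a
# 136-minute running time for the film.
YEARS_OF_VIDEO_PER_EXABYTE = 50_000  # figure quoted in the text
MINUTES_PER_YEAR = 365 * 24 * 60     # assumed non-leap year
MATRIX_RUNTIME_MINUTES = 136         # assumed running time


def matrix_viewings(exabytes: int = 1) -> int:
    """Whole viewings of the film that fit into the given number of exabytes."""
    total_minutes = exabytes * YEARS_OF_VIDEO_PER_EXABYTE * MINUTES_PER_YEAR
    return total_minutes // MATRIX_RUNTIME_MINUTES


print(matrix_viewings(1))  # 193235294
print(matrix_viewings(5))  # 966176470
```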

It’s true that the data revolution is relative: from the printing press to the telegraph, the typewriter to the copy machine and the PC, processing new data in different forms has always challenged information professionals and historians. But the history of archival theory is the history of appraisal. A modern archive is not simply a repository but, according to the Society of American Archivists, a “contextually based organic body of evidence” that adheres to well-defined acquisition policies and appraisal strategies based on modern archival principles. Some archivists have called it their “hierarchical expertise,” which imposes order on data and finds meaning in context.

The most fundamental archival concept, respect des fonds, requires that the provenance and context of the archival record remain intact from origin to successive owners to the historical and functional context in which it was created. The Ransom Center is perhaps the most aggressive among the traditional repositories or heritage organizations that acquire, preserve and provide access to collections of unpublished correspondence, photographs, diaries and ephemera like notes and marginalia. While the terms and stakes of this revolution have not changed, the players, and how they play, are beginning to. And the new rules of the game apply, broadly, to all of us, especially if we’re writers. And these days, who isn’t?

If user habits change as storage capacity increases, the future of the Global Canon will be outlined in the shape of our digital 21st-century archives. Preserving and providing access to this “usable past” will require forensic strategies to ensure the long-term integrity of our literary heritage, and the balance of power over who interprets the past will shift, too.

Furthermore, if the total digital context for individual records is crucial to the preservation of digital archives, there are a growing number of professionals who think that the future of the Canon, its very preservation, depends in part on maintaining the integrity of born-digital literary archives as artifacts in a complete environment.

To understand the future of archives one must first understand that underlying every archival act, whether at the institutional or personal level, there exists the tension between social memory (a rich “complete” historical record) and the right to be forgotten.

Historically, the appraisal methods applied to objects of a personal collection have been less well-defined than those applied to government or corporate records, in part because there is comparatively less to appraise and the transactional or evidentiary value of a personal record itself is less obvious. However, born-digital personal and literary collections are changing that ratio, as the volume of what could potentially have enduring value is measured not in stacks but in exabytes.

When the archival profession first codified its best practices, during the Consolidation period in the Netherlands in the 19th century, the archival record was narrowly defined as documents that “serve as evidence of what is mentioned in them.” The Dutch Manual, the basis for North American archival theory, was the specific and practical guide for objectively trained Dutch archivists who were, in schooling, historians.

Since the archive itself was a well-defined repository documenting “history as it happened,” their focus was to standardize the description and arrangement of records, without the need for appraisal. Archives were arranged chronologically, and were usually lists of administrative documents. Not until after WWII did the archival paradigm admit a more subjective viewpoint.

The contemporary “post-custodial” archival paradigm operates under the assumption that, confronted with vast amounts of digital data easily tamed by Google-like algorithms, we don’t need to think about appraisal. According to the influential archivist Terry Cook, only about 1 to 5 percent of major institutional and governmental documentation is archived. And an even smaller percentage comes from the “totality of records of all possible private citizen, groups, and organizations.”

All that is changing as the nature of information sharing evolves, and who owns the information and how we access it is increasingly muddied. In a sense, the current archival paradigm looks a lot like it might have to a 19th-century Dutch archivist, except its archives are measured in exabytes, not pages.

Though most of it may exist (for now) outside an official long-term repository, digital ephemera and its controversial appraisal (what we save, why, and how) dramatize the question of appraisal in the digital age of information exchange and oversharing. Because we can potentially save it all, how we access it and why we do will affect the future of literary production in the digital age.

It may no longer even be true that archives “hold singular information not duplicated elsewhere.” With ever more sophisticated tools, the ability to track our lives and the lives of others offers an opportunity to examine our responsibilities, or lack thereof, to the historical record. It’s essentially a question of appraisal. And suddenly the historical record becomes a metaphor of access: how we access the record is as vital to the future literary heritage as who provides access to it.

Writing on the web is a relatively new historical record, and its democratizing effects on authorship are a relatively new phenomenon. Soon Gawker will no longer be a “traditional” blog, and today magazines are embracing the tradition of the blog popularized by sites like Gawker and Huffpo. But beyond site design and the blurring editorial mechanisms of how something gets published, even the notion of what it means to be published has shifted when you consider that every e-message you send could leave a trail of copies on as many as half a dozen servers and routers before reaching its destination.

The author Sloane Crosley sent a single email “to a lot of different people” about getting locked out of her apartment twice in one day, and it launched her career. Section 108(b) of the Copyright Act states that libraries and archives can make preservation copies of what it deems “unpublished work.” How can the Library of Congress or the Internet Archive, which uses automated systems to crawl the web every few months to harvest websites for archiving, not need the permission of content creators or copyright holders to do so?

Just this sort of third-party, user-based interactive behavior in virtual worlds complicates the very notion of authorship. And if the integrity of an archive depends on context, provenance and, especially, access, what does the preservation or access to electronic and born-digital literature look or feel like, and how does our contribution to the archive affect the ownership and intellectual property of our eManuscripts? In other words, if in our digital lives the archive is more experiential than strictly documentary, where does the experience begin and end?

Up next, the illusion of completeness.

photo by missmass, via Flickr