Through a Digital Glass Darkly: Early English Books Online

Introduction

The digital artifact known as Early English Books Online (EEBO) is a resource for research on British history and literature between 1473 and 1700. EEBO is a collection of 146,000 mostly English works accessible via an online database, available by subscription from ProQuest. The works are presented as images scanned from microfilm, with bibliographic descriptions drawn from a variety of previously existing catalogs, and a number of associated transcriptions in SGML/XML format to facilitate text search within the books themselves. EEBO is regarded by many as “an essential part of the early modern scholar’s toolkit,” and “transformative for scholarship.”1 It is also regarded by some as flawed in important ways, due to a series of transformational processes used to convert the corpus from its origins as actual physical products of print to eventual expression as a large set of structured digital objects. In this article I first review the history of EEBO, which began with cataloging efforts more than a century ago, through the processes that developed the online version used by so many scholars today. I then critically review its limitations, and discuss some of the challenges and drawbacks inherent in the transformation of analog source materials into digital form, including information distortion and loss, format obsolescence, and the challenges of digital preservation.

A History of Early English Books Online

Today’s EEBO, managed by ProQuest, draws from a variety of collections and sources. The genesis of the project is credited to Eugene B. Power (1905-1993), an early pioneer in the microfilm industry. According to his biography, Power believed microfilm could become an ideal medium for scholarly research. In 1938 he launched a microfilm company, University Microfilms International (UMI), and arranged to begin photographing British books printed prior to 1701 (later much expanded) using a microfilm camera he invented. Copies of the microfilm products were then sold to research libraries. Later in his career as an entrepreneur, Power became interested in printing on demand, or what he termed “an edition of one.” He contributed to the development of a Xerox copier for microfilm, and in 1962 the Xerox Corporation acquired UMI.2 3

Power’s UMI microfilm initiative was launched as a preservation project, stemming from concerns over a German invasion of the United Kingdom. Initially it included books from the Short-Title Catalog of Books Printed in England, Scotland, & Ireland and of English Books Printed Abroad, 1475-1640, compiled by Alfred W. Pollard and G.R. Redgrave. But it quickly became clear that scholarly access might transform historical research on English literature and history. The microfilm collection became known as Early English Books (EEB), and was purchased by academic libraries around the world as a resource for research in history, English literature, and other subjects. “The prospect of having copies of nearly every book printed in England made EEB a ‘must have’ collection for campuses around the United States,” writes Shawn Martin in his historical account of EEB.4

As Bonnie Mak details in her study of EEBO origins, the timing of World War Two was fortunate for UMI’s ambitions. With support from the U.S. Library of Congress and the American Council of Learned Societies, and a Rockefeller grant of $30,000, Power was able to travel abroad and continue microfilming throughout the war. He also assisted in the Allied war effort by microfilming materials for British intelligence and the U.S. Office of the Coordinator of Information (COI) headed by General William Donovan – the organizational predecessor to the U.S. Central Intelligence Agency. The U.S. government provided Power with state-of-the-art microfilm and production equipment for work on behalf of the military, which he also used for his EEB project. These resources and connections positioned UMI to prevail over larger companies like Kodak in bidding for additional military contracts. Thus, the success of the EEB was connected with the war effort in important ways.5

EEB Antecedents and Metamorphosis

As mentioned, Power’s EEB project drew from books in the Short-Title Catalog (STC) compiled by Alfred W. Pollard and G.R. Redgrave. That project began at a 1918 meeting of the Bibliographical Society, where Pollard and Redgrave floated the idea of a “short-title handlist,” and in 1926 culminated in the publication of A Short-title Catalogue of Books Printed in England, Scotland, & Ireland, and of English Books Printed Abroad, 1475–1640, referred to as STC or sometimes “Pollard and Redgrave.” The first edition STC relied on sources from the British Museum (British Library), Cambridge University, the Bodleian Library at Oxford University, the Huntington Library in California, and more than 150 other collections. A second edition was published after 1976, drawing from additional collections. The STC included only books printed in the British Isles or colonies, or if originating elsewhere, printed in the English language. “Very large numbers of foreign-printed Latin books imported into England from the fifteenth centuries onwards are not to be found in STC,” writes Ian Gadd, “meaning that one cannot read STC’s contents as a full representation of Britain’s print culture prior to 1641.”6

In 1957 UMI began adding to EEB books from the collection compiled by Donald G. Wing, a Short-Title Catalogue of Books Printed in England, Scotland, Ireland, Wales, and British America, and of English Books Printed in Other Countries, 1641-1700 (sometimes referred to simply as Wing).7 Later supplements to EEB included the Thomason Tracts, a collection of pamphlets, books, and newspapers, printed mainly in London between 1640 and 1661, compiled contemporaneously by George Thomason, a London bookseller and associate of John Milton.8 Another EEB release included the Early English Books Tract Supplement compiled by the British Library, consisting of broadsides, pamphlets, letters, ballads, almanacs, auction catalogs, scientific treatises, proclamations, and other public documents from 16th- and 17th-century England.9

EEB was the first project to create a collection of microfilm images of English printed materials, but at the British Library a different project proceeded nearly in parallel. In 1983, the Library published on microfilm The Eighteenth Century Short Title Catalog, and in 1996 released a CD-ROM edition. Known as the ESTC, it was intended as a “machine-readable union catalogue of books, pamphlets and other ephemeral material printed in English-speaking countries from 1701 to 1800” listing every known copy of books from various libraries.10 The ESTC later incorporated earlier collections, including the catalogs of the Pollard and Redgrave STC and the Wing STC, resulting in a catalog of some 480,000 surviving books printed before 1801.11 The ESTC remains freely accessible through the British Library website as “a comprehensive, international union catalogue listing early books, serials, newspapers and selected ephemera printed before 1801.”12 As we will see, the ESTC now has a relationship with EEBO that adds value to the scholarly record while also potentially adding scholarly confusion.

In 1998 EEB began a radical transformation to new life as EEBO, as UMI embarked on a project to create digital images of its microfilmed book titles from the Pollard & Redgrave and Wing Short-Title Catalogs, the Thomason Tracts, and the Early English Books Tract Supplements. In 2003 the electronic publishing house Chadwyck-Healey (now part of ProQuest) launched the first EEBO online portal, providing access to the newly digitized book images for EEBO subscribers. By 2005 more than 100,000 books extending back to 1473 were available online.13

While EEBO was rapidly building its digital holdings, another collection of digitized English books, known as Eighteenth-Century Collections Online (ECCO), was created in 2003 by Thomson Gale. Available by subscription from Gale Cengage Learning, ECCO contains some 32 million pages from 180,000 titles, with relatively high-quality page images made from microfilm masters, MARC metadata records, and full-text search. Through an agreement with ProQuest, subscribers to EEBO have access to ECCO and can cross-search both collections.14

Yet another collection of English book records deserves mention here, although my research thus far has failed to clarify its connection, if any, to EEBO. In 1884 Gregory W. Eccles and George Bullen published a three-volume Catalogue of Books in the Library of the British Museum Printed in England, Scotland, and Ireland, and of Books in English Printed Abroad, to the Year 1640. It includes only books contained in the British Museum (British Library). The collection is freely available on the HathiTrust website as a set of MARC records, and in digital formats including downloadable PDFs, plain text files, and JPEG images.15

With so many catalogs and collections of English books in various print and electronic formats, and so many versions in the case of EEBO, a comprehensive comparison of cross-references and duplication of records is beyond the scope of this article. (See Appendix Table 1 for a timeline and summary of Early English Book Catalogs and Collections.) However, it seems important to emphasize the diversity and depth of this forest, and the corresponding likelihood for the casual wanderer, and possibly even the seasoned scholarly explorer, of getting lost in the woods.

As it turns out, additional pitfalls await researchers who seek a full appreciation of EEBO, for better or worse. In the next section I provide a critical review of important steps along the way from printed pages to digital formats and online finding aids. I find that the evolution of EEBO has not consistently been kind to the scholarly record, nor faithful to the original texts.

EEBO, for Better or Worse

As noted above, EEBO came into being by virtue of a digitization project begun in 1998 by UMI. Coincidentally, 1998 was the year I began digitizing public television programs and video projects for streaming on the World Wide Web. The process required an analog-to-digital conversion device to capture the source content from videotapes as digital files on a standard desktop computer. The capture device compressed the audio and video data so as to decrease the resulting file size, which was still quite large in terms of typical hard drive storage sizes of the time. It also slightly reduced the source’s luminance, and introduced other visible artifacts likely attributable to its budget-conscious design. Of course the source videotapes also reduced the original “live” video and audio quality, given that standard-definition television in the NTSC system uses 525 interlaced lines to “paint” moving images on the then-standard television screen. Standard-definition television (NTSC in North America and parts of South America, PAL in much of Europe and Asia) is a compromised format all around, and capturing it to a digital format can easily compromise it further.

But that compromise is nothing compared with the abomination-level “quality” of streaming video in the late 1990s. Most people had access to very limited Internet bandwidth; telephone modems were still common, especially in rural areas. We did the best we could given the constraints, which typically meant a streaming video format with pixel dimensions of 320×240 and a total bitrate of 128 kilobits per second. For comparison, video that is uncompressed (i.e., digitized at its original quality, making it unplayable on the web) would stream at well over 100 megabits per second for standard definition, depending on the source format, and up to almost 24 gigabits per second for today’s 4K ultra-high definition video formats. Due to consumer bandwidth limits, video today plays on our televisions, desktop computers, and mobile devices after being compressed to fit the available bandwidth, which means using clever algorithms implemented in software to literally throw away information. This is called “lossy” compression, because it intentionally loses data. Once the data is lost to compression, it can never be recovered except by going back to the original source.16
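
The size of that gap can be made concrete with a little arithmetic. The following Python sketch computes uncompressed video bitrates from frame dimensions, frame rate, and bits per pixel. The specific parameters are illustrative assumptions of mine (720×486 at 29.97 frames per second and 16 bits per pixel for standard definition; 3840×2160 at 60 frames per second and 48 bits per pixel for 4K), not measurements from the projects described here, and the calculation ignores audio.

```python
def uncompressed_bitrate(width, height, fps, bits_per_pixel):
    """Return the raw video bitrate in bits per second (picture data only)."""
    return width * height * fps * bits_per_pixel

# Assumed parameters, chosen only for illustration.
sd_bps = uncompressed_bitrate(720, 486, 29.97, 16)   # SD, 8-bit 4:2:2 sampling
uhd_bps = uncompressed_bitrate(3840, 2160, 60, 48)   # 4K, 16 bits per RGB channel
stream_bps = 128_000                                 # late-1990s web stream

print(f"SD uncompressed: {sd_bps / 1e6:.0f} Mbit/s")
print(f"4K uncompressed: {uhd_bps / 1e9:.1f} Gbit/s")
print(f"SD source to web stream: roughly {sd_bps / stream_bps:.0f} to 1")
```

Under these assumptions, a 128-kilobit stream retains on the order of one part in a thousand of the standard-definition source data, before accounting for the reduced frame size.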

And so it was with the EEBO digitization process. Most EEB microfilms were scanned to digital (TIFF) image files at 400 PPI (pixels per inch) using a 1-bit color depth, resulting in images composed of black and white with no grayscale shades in between. This captures far less tonal information than would typically be used for digital preservation files. The same format was used by EEBO for both master and access files.17 The scans were taken from second-generation EEB negatives so as to protect the master microfilms from wear during digitization.
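
To illustrate what a 1-bit color depth discards, the following Python sketch reduces a row of hypothetical 8-bit grayscale pixel values to pure black or white. The sample values and the midpoint threshold of 128 are assumptions chosen for illustration; they do not describe the settings actually used when the EEB microfilms were scanned.

```python
def to_bitonal(gray_values, threshold=128):
    """Map 8-bit grayscale values (0-255) to 1-bit values: 0 = black, 1 = white."""
    return [1 if value >= threshold else 0 for value in gray_values]

# Hypothetical pixel row: dark ink, faint ink, a stain, aged paper, clean paper.
row = [12, 110, 140, 200, 245]
bitonal = to_bitonal(row)

print(bitonal)
print(len(set(row)), "distinct tones before;", len(set(bitonal)), "after")
```

In this sketch the faint ink becomes indistinguishable from dark ink, and the stain becomes indistinguishable from clean paper. Nothing in the 1-bit output allows the original five tones to be reconstructed.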

The master microfilms themselves were produced under differing conditions, resulting in variable quality. UMI initially captured page images on 35 mm silver halide/acetate base film, but in the 1980s switched to more preservation-friendly polyester base film. Bound books were filmed two pages at a time in an open book format, with the gutter showing as a shadow between them. Early image quality in many instances was variable due to equipment deficiencies and/or operator skill or judgment, resulting in improper exposure and poor framing. With 1-bit color, any grayscale gradations in the original pages were lost, resulting in absence of detail, mottled letter forms, and distorted contrast. Operators used their own discretion to skip more detailed images, and systematically excluded the text of front matter, end papers, and notes handwritten by a reader. They often cropped the open-book image to fit the film area, resulting in loss of marginalia information.18

Consistent ordering of the microfilms also suffered from the somewhat ad hoc nature of the production process. The earliest books were captured as one book per film, but Power soon began capturing 20 to 30 books per film, resulting in adjustments to identification of the materials. Power intended to order films by date and author name, but the availability of books for filming resulted in still more adjustments.19

Bibliographic data in today’s EEBO presents other challenges for scholars seeking consistent information. As mentioned earlier, EEBO has a relationship with ESTC, such that EEBO draws directly from the bibliographic data in ESTC’s database. But EEBO has removed certain types of ESTC data, amended others, and added microfilm details, and there is no item-level data synchronization between the two databases, likely resulting in ongoing discrepancies between them.20

Yet another layer of the EEBO story adds to the picture of both its promise and limitations. The Text Creation Partnership (TCP), a partnership between the University of Michigan Library, Oxford University, the Council on Library and Information Resources, and ProQuest initiated in 2000, produced full-text transcriptions for EEBO materials. The work proceeded in two phases: Phase I converted 25,368 EEBO texts selected from authors mentioned in the New Cambridge Bibliography of English Literature. TCP Phase II was intended to transcribe all remaining English, Welsh, and Gaelic language works in EEBO, and produced some 35,000 transcripts before funding shortfalls slowed the project in 2020.21 All transcripts from both phases are encoded in searchable SGML/XML, and are now freely accessible online.22

Full-text search of historically valuable visual materials is a kind of holy grail, but despite advances in optical character recognition, it comes at the cost of human labor. This is emphatically the case with the EEBO project, since many (perhaps most) of the text images are especially difficult for machines to accurately transcribe. As a result, the TCP has relied on outsourced human labor for transcribing the texts from digital scans. Nardi located an internal TCP training website, no longer accessible online, warning that the transcripts “have been created by non-expert staff, so they should not be used as authoritative editions in themselves.”23

Finally, given the importance of EEBO’s analog and digital assets, it is important to raise the question of their preservation. I have discussed the process of converting the original print materials in the EEB collection to microfilm, and from microfilm to digital images and transcripts. Paper materials from centuries past are fragile, but of all the subsequent materials used by EEBO, paper is the most stable if cared for properly. The original microfilm used by Power was not preservation quality, and UMI’s filming methods were far from optimal. The digitization process suffered from other flaws, and the use of the TIFF format at less than high resolution for the master digital image files limits their representational value. The bibliographic data also reflects inconsistencies in planning and cross-referencing. 

Preservation of information resources in the digital age is challenging in the best of circumstances, especially where it is of vital importance to maintain the chain of provenance from disparate collections and aging analog sources. At this point we must simply trust that ProQuest will preserve all the elements that make EEBO the valued resource it has become.

Conclusion

In this article I have reviewed the origins, creation, and evolution of Early English Books Online with a single objective: to tell the story of how EEBO came to be what it is today. Each step along the way from actual early English books, to microfilmed images of pages of the books, to digitized microfilmed images of the pages, to text transcripts of the digitized microfilmed images of the pages, to multiple databases of all the above, represents both great promise for research on English literature and history, and great loss of potential for more complete and cohesive research resources.

This is not to minimize the value of EEBO; it has become a unique and tremendously valuable scholarly resource despite the flaws of its creation. We could easily overlook the details of its imperfections, and simply celebrate it for preserving what it could, as best it could, given the technical and human challenges. I would just offer that if we had to create all the resources and features of EEBO from the original print materials, assuming they were still available, we could learn something important from this history of EEBO. Most importantly for scholars, awareness of the landscape through which EEBO has travelled, from its analog origins to the digital present, can inform better research strategies, and help them avoid getting lost in the woods, and never seeing the trees for the forest.

References

1 Froehlich, Heather. “Early English Books Online.” The Papers of the Bibliographical Society of America 115, no. 1 (February 12, 2021): 114–17. https://doi.org/10.1086/712791.

2 Power, Eugene B. “Eugene B. Power Papers.” Accessed April 4, 2021. https://quod.lib.umich.edu/cgi/f/findaid/findaid-idx?cc=bhlead;c=bhlead;idno=umich-bhl-852048;didno=umich-bhl-852048;view=text.

3 Gadd, Ian. “The Use and Misuse of Early English Books Online.” Literature Compass 6, no. 3 (2009): 680–92. https://doi.org/10.1111/j.1741-4113.2009.00632.x.

4 Martin, Shawn. “EEBO, Microfilm and Umberto Eco: Historical Lessons and Future Directions for Building Electronic Collections.” Microform & Imaging Review 36, no. 4 (Fall 2007): 159–64. https://doi.org/10.1515/MFIR.2007.159.

5 Mak, Bonnie. “Archaeology of a Digitization.” Journal of the Association for Information Science and Technology 65, no. 8 (2014): 1515–26. https://doi.org/10.1002/asi.23061.

6 Gadd, Ian. “The Use and Misuse of Early English Books Online.” Literature Compass 6, no. 3 (2009): 680–92. https://doi.org/10.1111/j.1741-4113.2009.00632.x.

7 Gadd, ibid.

8 The British Library. “Thomason Tracts.” The British Library. Accessed April 3, 2021. https://www.bl.uk/collection-guides/thomason-tracts.

9 Martin, ibid.

10 The British Library. “English Short Title Catalogue.” The British Library. Accessed April 4, 2021. https://www.bl.uk/projects/english-short-title-catalogue.

11 Gadd, ibid.

12 British Library, ibid.

13 Kichuk, Diana. “Metamorphosis: Remediation in Early English Books Online (EEBO).” Literary and Linguistic Computing 22, no. 3 (September 1, 2007): 291–303. https://doi.org/10.1093/llc/fqm018.

14 Gale/Cengage. “Eighteenth Century Collections Online.” Eighteenth Century Collections Online. Accessed April 4, 2021. https://www.gale.com/primary-sources/eighteenth-century-collections-online.

15 Eccles, Gregory W., and George Bullen. Catalogue of Books in the Library of the British Museum Printed in England, Scotland, and Ireland, and of Books in English Printed Abroad, to the Year 1640. 3 vols. London, 1884. https://catalog.hathitrust.org/Record/100909721.

16 Kichuk, ibid.

17 This practice was changed in 2012, when additional scans were done at a resolution supporting grayscale images. See: Mak, ibid.

18 Kichuk, ibid.

19 Mak, ibid.

20 Gadd, ibid.

21 Text Creation Partnership. “Early English Books Online (EEBO) TCP – Text Creation Partnership.” Accessed April 4, 2021. https://textcreationpartnership.org/tcp-texts/eebo-tcp-early-english-books-online/.

22 TCP data can be searched from two different online portals. The University of Michigan Library has maintained an early version at https://quod.lib.umich.edu/e/eebogroup/. TCP data is also included in ProQuest’s EEBO portal at https://www.proquest.com/eebo. Access to the ProQuest EEBO site requires authentication through an institutional ProQuest/EEBO subscriber, whereas the Michigan TCP site is freely accessible to the public.

23 Nardi, ibid.