What’s a “book” in Early English Books Online?

Recently I have been employed by the Visualising English Print project, where one of the things we are doing is looking at improving the machine-readability of the TCP texts. My colleague Deidre has already released a plain-text VARDed version of the TCP corpus, but it is our hope to further improve the machine-readability of these texts.

One of the issues that came up in modernising and using the TCP texts has to do with non-English print. It has been previously documented that there are several non-English languages in EEBO, including Latin, Greek, Dutch, French, Welsh, German, Hebrew, Turkish and Algonquin. Our primary issue is that if a transcription in the corpus is not in English, it will be very difficult for an English-language text parser or word vector model to account for that material.

So our solution has been to isolate the texts which are printed in a non-English language, whether monolingual (e.g. a book in Latin) or bi- or tri-lingual (e.g. a Dutch/English book with a Latin foreword). Looking at EEBO-the-books is a helpful way to identify languages in print, as there are all sorts of printed cues to suggest linguistic variation, such as different fonts or italics to set a different language off from the primary language. It also means I get a chance to look at many of these non-English texts as they were printed and transcribed initially.
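
As a rough illustration of what a first automated pass at this could look like, here is a minimal Python sketch that counts the scripts of the alphabetic characters in a transcription, using only the standard library. It is not the project's actual method: the function names and sample strings are hypothetical, and it can only catch languages printed in a non-Latin script (Greek, Hebrew). Latin, French, Dutch or Welsh text would need a different approach, such as word lists or a language-identification model.

```python
import unicodedata
from collections import Counter


def script_of(char):
    """Return a coarse script label taken from the character's Unicode name.

    Assumption: the first word of the Unicode name ("LATIN", "GREEK",
    "HEBREW", ...) is a good enough proxy for the script.
    """
    name = unicodedata.name(char, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def script_counts(text):
    """Count alphabetic characters per script, ignoring digits, spaces and punctuation."""
    return Counter(script_of(c) for c in text if c.isalpha())


def has_non_latin_script(text):
    """Flag a text containing any alphabetic character outside the Latin script."""
    return any(script != "LATIN" for script in script_counts(text))


# Hypothetical sample strings, not taken from the TCP corpus.
# The Hebrew letters are written as escapes so the source reads left to right.
english = "In the beginning God created the heaven and the earth"
mixed = "Lingua sancta \u05DC\u05E9\u05D5\u05DF \u05D4\u05E7\u05D3\u05E9"

print(script_counts(mixed))
print(has_non_latin_script(english))
print(has_non_latin_script(mixed))
```

A flag like this would only be a starting point for the manual checking against the page images described above.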

Three years ago, I wrote a blog post about some Welsh language material that I found in EEBOTCP Phase I. In the intervening time I still have not learned Welsh (though I am endlessly fascinated by it), still get lots of questions and clicks to this site related to Early Modern Welsh (hello Early Modern Welsh fans), and I have since learned quite a lot more about how texts were chosen to be included in EEBO (it involves the STC; Kichuk 2007 is an excellent read on this topic to the previously uninitiated). So while that previous post asked “What makes a text in EEBO an English text”, this post will ask “what makes a text in EEBO a book?”

In general, I think we can agree that in order to be considered a book or booklet or pamphlet, a printed object has to have several pages. These pages can either be created by folding one broadside sheet, or the object will have a collection of these folded sheets (called gatherings). It may or may not have a cover, but it would be one of several sizes (quarto, folio, duodecimo, etc). To this end, Sarah Werner has an excellent exercise on how to fold a broadside paper into a gathering, which forms the basis for many, but probably not all, early books. Here is an example of a broadside that has clearly been folded up; it’s been unfolded for digitization.

Folded broadsheet, TCPID A17754

So it has been folded in a style that suggests it could be read like a book, but it is not necessarily a book: there is no distinct sense of each individual page, and some of the verso/recto pages would be unreadable unless the sheet had been cut.

In order to be available for digitization from the original EEBO microfilms, a text needed to be included in a short title catalogue. The British Library English Short Title Catalogue describes itself as

a comprehensive, international union catalogue listing early books, serials, newspapers and selected ephemera printed before 1801. It contains catalogue entries for items issued in Britain, Ireland, overseas territories under British colonial rule, and the United States. Also included is material printed elsewhere which contains significant text in English, Welsh, Irish or Gaelic, as well as any book falsely claiming to have been printed in Britain or its territories.

I select the British Library ESTC here because it covers several short title catalogues (Wing and Pollard & Redgrave are both included) and it’s my go-to short title catalogue database. Including “ephemera” is important, because it allows any number of objects to be considered as items of early print, even if they’re not really ‘books’ per se.

Such as this newspaper (TCPID A85603)…

Or this effigy, in Latin, printed on 1 broadside (TCPID A01919)

Or this proclamation, also printed on 1 broadside (TCPID A09573)

Or this sheet of paper, listing locations in Wales (Wales! Again!) (TCPID A14651)

Or this acrostic (TCPID A96959)


Interestingly, these are all listed as “one page” in the Jisc Historical books metadata, though they are perhaps more accurately “one sheet”. While there’s no definitive definition of “English” in Early English Books Online, it’s becoming increasingly clear to me that there’s no definitive definition of “book” either. And thank god for that, because EEBO is the gift that keeps giving when it comes to Early Modern printed materials.


7 reasons why I think this Hebrew-Latin book from 1683 is really cool

A few years ago I wrote about non-English language printing in EEBO, a post which still gets a fair amount of traffic and a lot of people asking me about Welsh. So when I found a bilingual Latin/Hebrew book in EEBO on Friday night while searching for something else, just as I was getting ready to go meet some friends for dinner, I was overjoyed. This is a book printed in Cambridge, England, in 1683, and it contains two languages which are very much not English.

JISC’s EEBO portal lists the title as “Komets leshon ha-koresh ve-ha-limudim = Manipulus linguae sanctae & eruditorum : in quo, quasi, manipulatim, congregantur sequentia, I. index generalis difficilorum vocum Hebraeo-Biblicarum, irregularium, & defectivarum, ad suas proprias radices, & radicum conjugationes, tempora, & personas, &c. reductarum” (R1614 Wing), describing it briefly as a Hebrew grammar (with the first four words in the title transliterated from Hebrew). My years of Hebrew school did not leave me a fluent Hebrew speaker or reader; I have no formal Latin or bibliographic training, but this book is really cool. Here are some reasons why…

This isn’t the title page, but it is the introductory material and you can see that it contains Hebrew, Latinate and Greek characters on the same page:

For starters, this is a bilingual grammar and index to the Old Testament, serving in some ways as a precursor to my digital concordances. But it also is fascinating because it involves several different typefaces representing several different languages, so someone in 1683 had either created a typeface for Hebrew or had access to a Hebrew typeface to print this book. Furthermore, Hebrew has a script form and a block letter form; the block letters are often used in printing whereas script is much more common elsewhere. Torahs are hand-copied onto vellum (even today!), so it is plausible someone may have had to transform each scripted character into block letters for this.

Hebrew is read from right to left, whereas Latin is read from left to right, so this book had to be very carefully typeset to put these two languages back to back. Hebrew also has a vowel system which is optional in print; when vowels are printed, they are usually found under the consonants. Torahs often do not use the vowel system, so the inclusion of vowels here (look for the lines, dots and small T’s) is interesting and an extra complication for typesetting.
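
For anyone working with transcriptions rather than page images, the same two properties (reading direction and optional vowel marks) are visible in Unicode. The following Python sketch uses only the standard `unicodedata` module. The sample word and the helper name `strip_points` are hypothetical illustrations, not anything taken from this book or its TCP transcription.

```python
import unicodedata

# Hypothetical sample: the word "shalom" written with vowel points,
# spelled out as escapes so the source itself reads left to right.
pointed = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"


def strip_points(text):
    """Drop combining marks (Unicode category Mn), leaving the base letters."""
    return "".join(c for c in text if unicodedata.category(c) != "Mn")


# For each character, show its code point, Unicode name,
# bidirectional class and general category.
for c in pointed:
    print(
        f"U+{ord(c):04X}",
        unicodedata.name(c),
        unicodedata.bidirectional(c),
        unicodedata.category(c),
    )

unpointed = strip_points(pointed)
print(len(pointed), len(unpointed))
```

The consonants report the right-to-left class `R`, while the vowel points are separate non-spacing marks attached to them, which is roughly the digital counterpart of the extra typesetting work. Note that `strip_points` removes every non-spacing mark, so on mixed text it would also strip accents that are encoded as combining characters.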

The catchwords at the bottom of the page are printed in Hebrew here, but the book uses Latinate numbering. And – as my mother pointed out – entries are listed alphabetically in Hebrew (not in Latin).

It also includes a list of ambiguities, still written in both languages, and still juxtaposed with a left-to-right and right-to-left language.

So this is already interesting from a printing perspective, but then there are also grammatical notes and commentaries included, with descriptions of how to use this grammar. And still the juxtaposition of both languages on the same line is really fascinating:

From the grammatical guide, here is a table of conjugations in Hebrew, marked with Latin descriptions (active, passive, future, participles, etc).

And finally it ends in a two-column translation of Hebrew text into Latin:

Download the EEBO scan as a PDF for more.

Ways of Accessing EEBO(TCP)

On October 28, 2015, the Renaissance Society of America sent an email to all members announcing the demise of their previous partnership with ProQuest (now in control of ExLibris too). Their email to all of us, in full:

The RSA Executive Committee regrets to announce that ProQuest has canceled our subscription to the Early English Books Online database (EEBO). The basis for the cancellation is that our members make such heavy use of the subscription, this is reducing ProQuest’s potential revenue from library-based subscriptions. We are the only scholarly society that has a subscription to EEBO, and ProQuest is not willing to add more society-based subscriptions or to continue the RSA subscription. We hoped that our special arrangement, which lasted two years, would open the door to making more such arrangements possible, to serve the needs of students and scholars. But ProQuest has decided for the moment not to include any learned societies as subscribers. Our subscription will end a few days from now, on October 31. We realize this is very late notice, but the RSA staff have been engaged in discussions with ProQuest for some weeks, in the hope of negotiating a renewal. If they change their mind, we will be the first to re-subscribe.

This is truly terrible news, especially for anyone whose institution did not/could not subscribe to the ProQuest interface.

**EDIT 29 Oct 8:05pm**: the RSA confirms that access to EEBO via ProQuest will continue:

We are delighted to convey the following statement from ProQuest:

“We’re sorry for the confusion RSA members have experienced about their ability to access Early English Books Online (EEBO) through RSA. Rest assured that access to EEBO via RSA remains in place. We value the important role scholarly societies play in furthering scholarship and will continue to work with RSA — and others — to ensure access to ProQuest content for members and institutions.”

The RSA subscription to EEBO will not be canceled on October 31, and we look forward to a continued partnership with ProQuest.

Perhaps the assumption is that, because the first set of TCP editions of the EEBO texts is now in the public domain, these should be sufficient for scholars’ use. Of course, this is not true: the TCP texts are a facsimile of the EEBO images (themselves facsimiles of facsimiles). However inadequate the TCP texts are for someone without an EEBO subscription, I have for several years been collecting links about how to access and use EEBO(TCP). Although the decision has since been overturned, the benefit of having all these resources listed together seems to justify their continued existence here. They are also available on my links page, but in the interest of accessibility, here they are replicated:

1 EEBO(TCP) documentation
Text Creation Partnership
EEBO-TCP documentation
EEBO-TCP Tagging Cheatsheet: Alphabetical list of tags with brief descriptions
Text Creation Partnership Character Entity List
The History of Early English Books Online
Using Early English Books Online

2 Access to EEBO(TCP) full texts (searchable)
Early English Books Online (EEBO): JISC historical books interface (UK, paywall, free access from the British Library Reading Room)
Early English Books Online (EEBO): Chadwyck-Healey interface (outside UK, paywall; your mileage may vary by country)

The Dutch National Library has off-site access, including full EEB (European books), ECCO (18C), TEMPO (pamphlets), for members, 15€/yr. Register online.

EEBO-TCP Texts on Github
UMichigan TCP repository
UMichigan EEBO-TCP full text search*
University of Oxford Text Archive TCP full text search*
* These sites are mirrors of each other
See also 10 things you can do with EEBOTCP

EEBO-TCP Ngram reader, concordancer, & text counts
CQPWeb EEBO-TCP, phase I (and many others)
(Video guide to CQPWeb)
BYU Corpora front end to EEBO-TCP (*not completely full text but will be soon*)

3 Other resources
English Short Title Catalogue (ESTC)
Universal Short Title Catalogue (USTC)
Items in the English Short Title Catalogue (ESTC), via Hathitrust
LUNA, Folger Library Digital Image Collection
Internet Archive Books
The Folger Digital Anthology of Early Modern English Drama

Sarah Werner’s compendium of resources, incl digitised early books
Laura Estill’s Digital Renaissance wiki page covers online book catalogues, digitised facsimiles, early modern playtexts online, print and book history, etc
(see also her very thorough guide to manuscripts online)
Claire M. L. Bourne’s Early Modern Plays on Stage & Page resource list

Large Digital Libraries of Pre-1800 Printed Books in Western Languages
The University of Toronto has a large number of Continental Renaissance text-searchable books online
30+ digitised STC titles at Penn (free to use, from their collection)

UCSB Broadside Ballads Archive
Broadside Ballads Online

A database of early modern printers & sellers culled from the eMOP source documents
(And their mirror of the ECCO-TCP texts)

Database of Early English Playbooks (DEEP)

How to save and download pdfs from the Chadwyck EEBO Interface

And a crucial read from Laura Mandell and Elizabeth Grumbach on the digital existence of ECCO (Eighteenth Century Collections Online):

This page will be updated with more resources as they become available. Email me with links: heathergfroehlich at gmail dot com // 15 Aug 2016