HathiTrust: A Digital Library Revolution Takes Flight

From the University of California UCnet

The phrase “closed until further notice due to COVID-19” has become all too familiar. And, while we have started to grow accustomed to losing access to many resources that typically define our community existence, there’s one that’s particularly crucial to student and faculty researchers: libraries. For some, it may be easy to write off libraries as “nice-to-have.” But for scholars, they are essential. And as library doors began to shutter throughout California and much of the world, the potential impact on the academic community was profound.

Thankfully, the University of California has been preparing for this moment for decades. In 2008, the UC Libraries co-founded HathiTrust and started contributing scanned copies of books and journals to the new organization. Based at the University of Michigan (U-M), HathiTrust is a large-scale repository of digital content collaboratively created by academic and research institutions. As researchers lost access to vital hard-copy materials, HathiTrust initiated an Emergency Temporary Access Service (ETAS) to give UC researchers critical access to more than 13 million digital volumes. The service has had an immediate impact and marks a profound advance in sharing digital content.

“For Berkeley faculty, students, and staff, [the ETAS] opens up a trove of materials,” says Salwa Ismail, associate university librarian for digital initiatives and information technology, UC Berkeley. “Our shelves are closed, but as long as your screens are open, you’ll have access to most of our resources.”

Preparing for a digital future

UC established the California Digital Library (CDL) in 1997, as it became clear that emerging technologies would continue to transform how digital information was published and accessed. In the years since its founding, one of the CDL’s initiatives has been partnering with campus libraries systemwide to digitize the university’s combined holdings, which together comprise the largest university library system in the world.

In 2006, the CDL initiated an impactful partnership with the Google Books Library Project, which at the time was seeking to create a comprehensive, publicly accessible digital archive of the world’s books. For years, library staff throughout UC have packed books and journals and shipped them to Google Books to be scanned, page by page. Through their participation, UC libraries receive digital copies of every volume from Google, and contribute them to HathiTrust.

As Laine Farley, interim executive director of the CDL in 2008, said at the partnership launch, “The UC libraries have an unparalleled reputation for innovation in digital library development and inter-institutional collaboration. Participation in HathiTrust continues this tradition and will enable UC to provide its students and scholars with access to one of the most significant digital collections ever assembled.”

Today, with 4.5 million digital volumes and counting included in HathiTrust, UC’s content contributions are second only to co-founder U-M.

The value of preservation

Partnerships like HathiTrust and Google Books are essential to universities’ digitization efforts because they significantly minimize the resources each must expend to preserve individual collections. For example, once U-M has scanned a book that is in UC’s physical collection, UC doesn’t, in theory, need to digitize the same book for its HathiTrust digital collection. This overlap between university libraries is in the millions. In addition to saving time and money, these coordinated efforts minimize potential damage to fragile older volumes, particularly those that are too delicate to remain in circulation.

The benefits of digitizing rare materials have become abundantly clear during the ETAS. “A graduate student was interested in a rare 20th-century newspaper held in our Special Collections and Archives, that is not available freely online. Examining this particular title is key to his research, and I was able to point him to the ETAS access,” says Heather Smedberg, reference and instruction coordinator for UC San Diego Library’s Special Collections & Archives. “The scanned copy was from the University of Virginia, which spared us the need to rescan our fragile copy as part of the Google Books project in the first place. Having this kind of access outside of emergency situations would be a valuable preservation service.”

Science, technology, engineering and math (STEM) scholars are more likely to be able to access materials digitally, as STEM research content is typically recent. But scholars in the humanities, social sciences and the arts often need to access materials that are decades or centuries old. For these disciplines, digitization is particularly vital, as scholars must sometimes travel long distances to access a specific book or journal.

“I’ve had a lot of success finding lists of books for literature faculty working on publications and graduate students who are studying for qualifying exams,” says Nina Mamikunian, literature librarian at UC San Diego Library. “Most of these books are older and aren’t available electronically on our usual platforms. Without ETAS, researchers would not be able to access them. The graduate students, in particular, are incredibly grateful for this resource.”

A question of ownership

Google Books’ original intent was to digitize the entirety of the world’s publications. While this concept was thrilling in terms of research and access, it was problematic from the perspective of the publishing industry. Google faced several copyright lawsuits before adopting its current use policy, which allows unfettered access to materials in the public domain and only basic search results for copyrighted volumes. Texts under copyright must be accessed traditionally, either by purchasing them or borrowing them from a library or institution that has purchased them.

In normal times, access through HathiTrust is similar: It serves as a much-needed aggregation of digitized print holdings from many member libraries’ collections, but it does not ordinarily provide full-text access for materials under copyright. Here’s why: When UC Riverside, for example, contributes a botany textbook to HathiTrust, the physical book returns to its library after scanning. If one student could check out the hard-copy textbook while three others simultaneously viewed it on HathiTrust, UC Riverside would gain four access points while only having paid for one. For this reason, while HathiTrust archives full text of copyrighted volumes, copyright constraints have prevented researchers from reading them.

During the ETAS, HathiTrust has made selected copyrighted materials within its collections accessible to researchers at contributing institutions that are experiencing an unexpected or involuntary service disruption. ETAS access mirrors the access a university’s researchers would have to hard-copy volumes if libraries were open. For example, UC Riverside’s single physical copy of a botany textbook grants digital ETAS access to one student at a time. If UC libraries hold four copies of a book, four students can access it through HathiTrust simultaneously. Since UC’s libraries are part of a single statewide system, so is their HathiTrust access.

Legal experts believe that this controlled digital lending model, where library users can collectively only access as many copies of a volume as their library owns, is permissible under copyright law as a fair use.
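For readers who think in code, the owned-to-loaned rule described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not HathiTrust's actual system: the class name `LendableVolume`, its fields and the reader names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LendableVolume:
    """One title whose simultaneous digital readers are capped
    at the number of print copies the library system owns."""
    title: str
    copies_owned: int
    active_readers: set = field(default_factory=set)

    def check_out(self, reader: str) -> bool:
        """Grant access if a copy is free; return True on success."""
        if reader in self.active_readers:
            return True  # this reader already holds a copy
        if len(self.active_readers) >= self.copies_owned:
            return False  # every owned copy is already in use
        self.active_readers.add(reader)
        return True

    def check_in(self, reader: str) -> None:
        """Release the reader's copy so someone else can use it."""
        self.active_readers.discard(reader)


# One physical copy: only one student can read at a time.
botany = LendableVolume("Botany textbook", copies_owned=1)
first = botany.check_out("student_a")
second = botany.check_out("student_b")   # refused while student_a reads
botany.check_in("student_a")
third = botany.check_out("student_b")    # succeeds once the copy is free

# Four physical copies: four simultaneous readers, and no more.
shared = LendableVolume("Four-copy title", copies_owned=4)
granted = [shared.check_out(f"student_{i}") for i in range(5)]
```

In this sketch, the fifth request against the four-copy title is refused, which mirrors how a fifth patron would find the shelf empty in an open library.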

Crucial access at a critical time

The ETAS may sound fairly straightforward, but it is, in fact, revolutionary. Typically, if UC researchers at one campus need a volume at another campus, they can request the book through interlibrary loan. Although this process is surprisingly expedient (materials are typically delivered in 24 to 48 hours), under HathiTrust ETAS researchers can digitally reserve books from other UCs within seconds. The volume of accessible materials is tremendous: The ETAS allows UC researchers to access more than 13 million digital volumes, including HathiTrust’s 6.7 million volumes in the public domain.

Kathryn Stine, senior product manager, digitization and digital content, CDL, helps to coordinate the UC libraries in their interactions with the HathiTrust ETAS. “This is very much an effort to provide continuity of access to materials for research, learning and teaching at a time when faculty, students and staff need remote access to online content,” she says.

“Through our long-standing partnership with HathiTrust, we have created the conditions for the UC community to access an amazing amount of information during this crisis,” says Günter Waibel, executive director, California Digital Library and member, HathiTrust Board of Governors. “Through our leadership and content contributions, we are also supporting 150 other US research libraries who are just as ecstatic as we are to be able to provide this emergency access to their communities.”

“This is a great example of how the CDL has coordinated a critical service that impacts users throughout the system,” says Roger Smith, interim associate university librarian for collection services at the UC San Diego Library. “The ETAS is quite a proof of concept in regards to the power of aggregate digital holdings that could be circulated via controlled digital lending. It has highlighted that there is a continuing interest and proven value in refining the quality and completeness of our holdings in the HathiTrust corpus.”

The HathiTrust ETAS is, as its name clearly articulates, an emergency, temporary service. But knowledge of how its unprecedented access has advanced research will endure long after the crisis.

“While the COVID-19 crisis has brought much grief and heartbreak, it also brought some things we may now want to find a way to keep, like the much-improved air quality in major metropolitan areas of California,” says Waibel. “As far as access to digital content goes, we’re exploring creative — and legal — ways of how we may be able to extend some of the compelling features of this service into a post-pandemic world. Stay tuned!”

Learn more about how the California Digital Library supports UC.