We explore applications of image analysis and machine learning in digital libraries of historic materials. We’re especially interested in what we might learn from the millions of digital images that librarians, archivists, and others are creating as they digitize the cultural record. We’re intrigued by the questions that machine learning approaches might help to surface about these collections and our professional practices, and also by the questions our collections and professional practices might help to surface about machine learning.


The big "why" for our work is that information—its access, its content, its gaps—structures power and affects individuals' and communities' lives in profound ways. Here's how we put it in a report we wrote for the Library of Congress:

Domains considering implementing machine learning must engage deeply and critically with the technology, what it does, and what it means. For cultural heritage digital libraries, now is a critical moment to grapple with epistemologies of machine learning and the knowledge it structures, shapes, and appears to codify. Some elements of these epistemological conversations may transcend domains and applications, but these conversations also must be rooted in the specificities of the cultural heritage sector. In particular, libraries must grapple with their historical foundations and practices and with the potential consequences of these practices for machine learning. Previous and ongoing collecting and description practices, for example, were and are colonialist, racist, hetero- and gender- normative, and supremacist in other structural and systemic ways. These understandings are the foundation on which training and validation data will be created and assembled; they will become reinscribed as statements of truth, even as we elsewhere champion the potential of computational approaches to uncover hidden histories, identities, and perspectives in collections. To engage machine learning in cultural heritage must mean confronting these histories, committing to the hard work of acknowledgment and rectification, and not simply reproducing them and giving them a whole new scale of power.



Data generated by and for our project are accessible through data repositories designed for long-term access. See our project page on the Open Science Framework.


Elizabeth (Liz) Lorang

Associate Professor and Associate Dean for Research & Learning in the University Libraries at the University of Nebraska-Lincoln

#digital libraries #academic libraries #cultural heritage organizations #computational approaches #knowledge systems #information justice

Leen-Kiat Soh

Charles Bessey Professor of Computer Science & Engineering in the School of Computing at the University of Nebraska-Lincoln

#multi-agent systems to support human users #computer science education #image processing #machine learning

Yi Liu

Doctoral candidate in the School of Computing at the University of Nebraska-Lincoln

#computer vision #machine learning #deep learning

Chulwoo Pack

Doctoral candidate in the School of Computing at the University of Nebraska-Lincoln

#document image analysis #machine learning #image processing


Many individuals have contributed to our work in myriad ways over the years. They include (in alphabetical order): Lauren Algee (collaborating partner), Andrew Barrow (core team member), Sarah Berkowitz (core team member), Paul Conway (advisory board member), Ryan Cordell (advisory board member), Maanas Varma Datla (core team member), Jody DeRidder (advisory board member), Mary Ellen Ducey (advisory board member), Adam Farquhar (advisory board member), Meghan Ferriter (collaborating partner), Emily Gore (advisory board member), Natalie Houston (advisory board member), Patricia Hswe (advisory board member), Victoria Van Hyning (collaborating partner), Eileen Jakeway (collaborating partner), Kyle Janvrin (core team member), Spencer Kulwicki (core team member), Joseph Lunde (core team member), Worthy Martin (collaborating partner), Meredith McGill (advisory board member), Jaime Mears (collaborating partner), Andrew Michael (core team member), Bethany Nowviskie (advisory board member), John O'Brien (collaborating partner), Abbey Potter (collaborating partner), Delaram Rahimighazikalayeh (core team member), Ayla Stein (advisory board member), Grace Thomas (core team member), John Unsworth (advisory board member), and Tong Wang (collaborating partner).


Analysis & Interpretation

Augmentation-Based Pseudo-Groundtruth Generation for Deep Learning in Historical Document Segmentation for Greater Levels of Archival Description and Access

journal article

Visual Domain Knowledge-based Multimodal Zoning for Textual Region Localization in Noisy Historical Document Images

journal article

Investigating Coupling Pre-processing with Shallow and Deep Convolutional Neural Networks in Document Image Classification

journal article

Digital Libraries, Intelligent Data Analytics, and Augmented Description: A Demonstration Project

final report delivered to the Library of Congress

Final Presentation to the Library of Congress on Digital Libraries, Intelligent Data Analytics, and Augmented Description

presentation slides

Virtual Wrap-Up Presentation: Digital Libraries, Intelligent Data Analytics, and Augmented Description

presentation slides

Document Images and Machine Learning: A Collaboratory between the Library of Congress and the Image Analysis for Archival Discovery (Aida) Lab at the University of Nebraska, Lincoln, NE

presentation slides

Work-in-Progress Reports

submitted to the Library of Congress

Machine Learning in Research Libraries: A Snapshot of Projects, Opportunities, and Challenges

panel presentation

Application of the Image Analysis for Archival Discovery Team’s First-Generation Methods and Software to the Burney Collection of British Newspapers


Using Chronicling America’s Images to Explore Digitized Historic Newspapers & Imagine Alternative Futures

conference presentation

Patterns, Collaboration, Practice: Algorithms as Editing for Historic Periodicals

conference presentation

Increasing Our Vision for 21st-Century Digital Libraries

keynote talk

Developing an Image-Based Classifier for Detecting Poetic Content in Historic Newspaper Collections

journal article