Tuesday, August 12 • 9:00am - 5:00pm
Using Open-Source OCR Tools for Digitization Projects #1516 [This course is FULL]


NOTE: This course takes place at an off-site location. Please plan accordingly.

Registration fees (Advance / Regular)

SAA Members: $189 / $249
Employees of SAA Member Institutions: $219 / $279
Nonmembers: $249 / $299

Course Description

Archivists, museum professionals, and individual scholars should not be intimidated by technological or financial concerns when considering an OCR project! Learn how to find and use freely available tools to implement your own successful OCR projects.

Does this scenario sound familiar? You digitize source documents to preserve their text in another format and to make page images available electronically. But the text remains locked in those images: it isn't available for indexing and searching until the page images have gone through an additional OCR process or costly hand transcription. Your instructor, who draws on his experience as a member of the Initiative for Digital Humanities, Media, and Culture team working on a two-year, Mellon-funded grant to OCR 45 million pages of printed and digitized English documents from the 15th to the 18th centuries, will demystify the OCR process so that you're on your way to achieving your goal.

Upon completion of this workshop you’ll be able to:

  • Define the basic principles and vocabulary of OCR;

  • Select various open-source tools that are essential to the OCR process; and

  • Describe some of these tools and the Tesseract OCR engine based on hands-on use.


Who should attend? Archives practitioners, archives managers, digital curators, IT professionals, and librarians.

What should you know? Attendees should understand the basics of document digitization, metadata, and data organization. Attendees who bring their own page images will have a chance to begin OCR-ing them during the workshop.

Attendance is limited to 35.


Matt Christy

Associate Director for Technology and eResources, Baylor Health Sciences Library
We just migrated to Alma/Primo last May and we're still trying to figure out how a lot of this stuff works.

Tuesday August 12, 2014 9:00am - 5:00pm
The Historical Society of Washington, D.C. 801 K St NW, Washington, DC 20001
