The Library of Congress sponsors a historic newspaper database. It consists of scanned and OCRed newspapers from 1789 through 1924, plus bibliographic data from 1690 to the present.
Just reading newspapers from 100 years ago is interesting in itself. But what else can be done with this digital archive? Beyond its advanced search options (proximity terms, phrase searching, and Boolean logic), the dataset features an API.
An API (application programming interface) is a set of protocols that lets programs talk to other programs easily. In this instance, a programmer can quickly pull a JSON (JavaScript Object Notation) dataset from the archive for manipulation.
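As a rough sketch of what that might look like in Python, the snippet below builds a search URL and reads the JSON that comes back. It assumes the archive in question is the Library of Congress's Chronicling America site. The base URL, the query parameters (`andtext`, `date1`, `date2`, `dateFilterType`, `rows`, `format`), and the `items`, `date`, and `title` field names are assumptions based on how that search API has been documented, so check the current documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the archive's page search; verify against current docs.
BASE_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(term, year_start, year_end, rows=20):
    """Build a search URL that asks for JSON instead of an HTML page."""
    params = {
        "andtext": term,              # assumed name of the full-text search parameter
        "date1": year_start,
        "date2": year_end,
        "dateFilterType": "yearRange",
        "rows": rows,
        "format": "json",
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_results(url):
    """Download the URL and decode the JSON body into Python objects."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def summarize(payload):
    """Pull (date, title) pairs out of a result payload.

    Assumes results sit in an "items" list whose entries carry
    "date" and "title" keys; missing keys come back as None.
    """
    return [(item.get("date"), item.get("title")) for item in payload.get("items", [])]
```

Calling `summarize(fetch_results(build_search_url("suffrage", 1900, 1910)))` would then give a list of dates and newspaper titles to work with, provided the service still answers at that address.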
Why is this important? Open archives and structured datasets invite new and creative research opportunities and insights. For example, a researcher can track how newspaper word usage evolved over this time frame. Or a researcher could create tests for tracking the prevalence of “fake news” in the past.
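To make the word-usage idea concrete, here is a minimal sketch of how one might measure a term's share of all words per year. It assumes each page record is a dictionary with a `date` string starting with the four-digit year and an `ocr_eng` string holding the OCR text. Those field names are assumptions about the JSON, and the sample records are made up for illustration.

```python
import re
from collections import Counter


def term_share_by_year(pages, term):
    """Return {year: fraction of all words that equal `term`}."""
    term = term.lower()
    hits = Counter()    # occurrences of the term per year
    totals = Counter()  # total words seen per year
    for page in pages:
        year = int(page["date"][:4])
        # Crude tokenizer: runs of letters and apostrophes, lowercased.
        words = re.findall(r"[a-z']+", page["ocr_eng"].lower())
        totals[year] += len(words)
        hits[year] += words.count(term)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Made-up records standing in for real API results.
sample_pages = [
    {"date": "19000101", "ocr_eng": "The motor car passed the horse"},
    {"date": "19200101", "ocr_eng": "Motor motor horse car"},
]
shares = term_share_by_year(sample_pages, "motor")
```

Dividing by the yearly word total matters because the number of scanned pages varies widely from year to year, so raw counts alone would mostly reflect how much was digitized. OCR errors in the older scans will also skew the results.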
What questions do you want to ask 135 years of newspapers?
Edit: As exciting as this dataset is, it is also a bit disappointing. The archive stops in 1924, no doubt because later newspapers are still under copyright rather than in the public domain. The wealth of research that could be done, and the data that could be created and linked, are being stifled by copyright law. New laws could enable the information revolution to accomplish much more.