Harvard and Google to Release 1 Million Public-Domain Books as AI Training Dataset

Featuring works from legendary authors like Dickens, Dante, and Shakespeare

Waivly Crew
December 12, 2024

Harvard University is set to release a groundbreaking dataset containing approximately one million public-domain books, featuring works from legendary authors like Dickens, Dante, and Shakespeare. These texts, which are no longer under copyright due to their age, span a wide range of genres and languages, offering a rich resource for researchers and developers alike.

The exact release date and method remain unclear. The dataset is built on materials from Google’s vast book-scanning project, Google Books, so Google will play a significant role in distributing the collection and making it accessible to a global audience.
