Harvard and Google to Release 1 Million Public-Domain Books as AI Training Dataset
Featuring works from legendary authors like Dickens, Dante, and Shakespeare
Harvard University is set to release a dataset of approximately one million public-domain books, including works by Dickens, Dante, and Shakespeare. These texts, which are no longer under copyright because of their age, span a wide range of genres and languages, offering a rich resource for researchers and developers alike.
The exact release date and method remain unclear. The dataset is built on materials from Google Books, Google's vast book-scanning project, so Google will play a significant role in distributing the collection and making it accessible to a global audience.