Harvard and Google to Release 1 Million Public-Domain Books as AI Training Dataset

Featuring works from legendary authors like Dickens, Dante, and Shakespeare

Harvard University is set to release a dataset of approximately one million public-domain books, including works by authors such as Dickens, Dante, and Shakespeare. These texts, which are no longer under copyright because of their age, span a wide range of genres and languages, offering a rich resource for researchers and developers.

The exact release date and method remain unclear. The dataset is built on materials from Google Books, Google’s vast book-scanning project, so Google will play a significant role in distributing the collection and making it accessible to a global audience.
