Dataset of old scanned books with ground truth. The ground truth was built from Project Gutenberg ebooks. All the .tiff pages were converted from Internet Archive books (PDFs). The pages were selected from the following books:
-Betrayed Armenia, by Diana Agabeg Apcar
-The Boy Apprenticed to an Enchanter, by Padraic Colum
-The Child of the Moat, by Stoughton Holborn
-The Corset and the Crinoline, by W.B.L.
-Engraving of Lions, Tigers, Panthers, Leopards, Dogs, &c., by Thomas Landseer
-Half-Hours with Highwaymen, by Charles G. Harper
-Historical Sketches of Colonial Florida, by Richard L. Campbell
-Horton Genealogy, by Geo. F. Horton
-The Lusitania's Last Voyage, by Charles E. Lauriat
-Seat Weaving, by L. Day Perry
The dataset is provided in several resolutions: 300 dpi, 500 dpi, and 1000 dpi. There are also several sets of 300 dpi pages binarized with different methods.
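Since every page comes with ground-truth text, one possible way to compare the resolutions and binarized sets is to run an OCR engine of your choice on each page and measure the character error rate (CER) against the ground truth. The sketch below is only an illustration under that assumption: the dataset does not prescribe a metric or an OCR engine, the function names are hypothetical, and the sample strings are made up rather than taken from the data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


if __name__ == "__main__":
    # Made-up strings for illustration; in practice, read the OCR output
    # and the matching ground-truth text for one page.
    truth = "The Child of the Moat"
    ocr_output = "The Chi1d of tbe Moat"
    print(f"CER: {character_error_rate(ocr_output, truth):.3f}")
```

Computing this value per page for each resolution or binarization set gives a simple number to compare them by; lower is better. Whitespace and line-break differences between the OCR output and the ground truth count as errors here, so you may want to normalize both texts first.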
Feel free to use and study the sets contained here :)