Releases: tedunderwood/noveltmmeta
NovelTM Metadata, Publication State
Metadata about 210,266 volumes identified as English-language fiction in the HathiTrust Digital Library, 1700-2009. The metadata is divided into seven lists; several of the short lists have been manually corrected. This release of metadata for "NovelTM Datasets for English-Language Fiction" was timed to coincide with publication of the accompanying article. It represents missing data in the date fields more clearly than the first release did, and it also removes duplicate rows from volumemeta.tsv and weighted_subset.tsv, following a suggestion from Matt Wilkens.
NovelTM Metadata with Missing Dates
In previous releases, the values 0 and 2100 were used to mark missing values in the inferreddate and latestcomp columns. To avoid confusion, these sentinel values have been replaced with blank cells, which will be read as NaN.
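If you load the TSV files with pandas, `pandas.read_csv(path, sep="\t")` already turns blank cells into NaN by default. The standard-library sketch below shows the same idea and also handles files from earlier releases, where 0 and 2100 stood for "missing". The helper names `parse_date` and `read_dates` are hypothetical and not part of this dataset, and the toy rows in the usage note are invented for illustration.

```python
import csv
import io
import math

# Sentinels that marked missing dates in releases before this one.
LEGACY_MISSING = {0, 2100}
DATE_COLUMNS = ("inferreddate", "latestcomp")

def parse_date(cell, legacy=False):
    """Return the year as a float, or NaN for a blank cell (or a legacy sentinel)."""
    cell = (cell or "").strip()
    if cell == "":
        return math.nan
    year = float(cell)
    # Only treat 0 and 2100 as missing when reading an older release.
    if legacy and year in LEGACY_MISSING:
        return math.nan
    return year

def read_dates(tsv_text, legacy=False):
    """Parse TSV text into dicts, converting the date columns to floats or NaN."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        for col in DATE_COLUMNS:
            row[col] = parse_date(row.get(col), legacy=legacy)
        rows.append(row)
    return rows
```

For example, `read_dates("docid\tinferreddate\tlatestcomp\nb\t\t\n")` yields NaN in both date columns, and passing `legacy=True` does the same for an older row containing `0` and `2100`. Because NaN never compares equal to anything, filter with `math.isnan` rather than `==`.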
NovelTM metadata without confusing categories from MARC 008
For the final release, we filtered out some genre categories that we had derived from the fixed-length data elements in MARC field 008. These categories were often conflicting and seemed likely to create confusion.
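Field 008 stores its data elements at fixed character positions rather than in labeled subfields. As a hedged illustration, assume the dropped categories resemble the "Literary form" code, which the MARC 21 definition for books places at position 33 (0-based). Under that assumption, reading the code might look like the sketch below. The function `literary_form` is hypothetical, and the lookup table follows the MARC 21 books codes, not necessarily any mapping the NovelTM authors used.

```python
# MARC 21 (Books) 008/33 "Literary form" codes. Illustrative only; this is an
# assumption, not the category list actually removed from the NovelTM metadata.
LITERARY_FORM = {
    "0": "not fiction",
    "1": "fiction",
    "c": "comic strips",
    "d": "dramas",
    "e": "essays",
    "f": "novels",
    "h": "humor, satires, etc.",
    "i": "letters",
    "j": "short stories",
    "m": "mixed forms",
    "p": "poetry",
    "s": "speeches",
    "u": "unknown",
    "|": "no attempt to code",
}

def literary_form(field_008):
    """Return the label coded at position 33 of a MARC 008 field, or None if too short."""
    if len(field_008) < 34:
        return None
    return LITERARY_FORM.get(field_008[33], "unrecognized code")
```

A single byte like this is easy to read but often inconsistent with other evidence. For example, a volume coded `0` ("not fiction") may still appear in a fiction list, which is one plausible source of the conflicts described above.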
NovelTM metadata (first release)
v1.0