This is a modification of sgraaf's repository for replicating the Toronto BookCorpus dataset, adapted to collect datasets by genre. The entire scraped corpus is included in the repo under data/ and can be shared freely.
The mapping of genre to genre-id is as follows:
horror: 883
fantasy: 1206
romance: 1235
science fiction: 1213
adventure: 892
sports: 1126
western: 871
humor & comedy: 882
children's: 61
urban: 873
thriller & suspense: 874
religious: 877
YA/teen: 1018
mystery & detective: 879
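
For scripts that need to look genres up programmatically, the list above can be expressed as a Python dictionary. This is only a minimal sketch: the names GENRE_IDS, GENRE_NAMES, and genre_name are hypothetical and are not taken from the repository's code; only the genre names and ids come from the list above.

```python
# Genre name -> genre id, copied from the list above.
# GENRE_IDS is a hypothetical name, not part of the repository.
GENRE_IDS = {
    "horror": 883,
    "fantasy": 1206,
    "romance": 1235,
    "science fiction": 1213,
    "adventure": 892,
    "sports": 1126,
    "western": 871,
    "humor & comedy": 882,
    "children's": 61,
    "urban": 873,
    "thriller & suspense": 874,
    "religious": 877,
    "YA/teen": 1018,
    "mystery & detective": 879,
}

# Reverse lookup: genre id -> genre name.
GENRE_NAMES = {genre_id: name for name, genre_id in GENRE_IDS.items()}


def genre_name(genre_id: int) -> str:
    """Return the genre name for a genre id; raises KeyError if the id is unknown."""
    return GENRE_NAMES[genre_id]
```

With this in place, GENRE_IDS["western"] gives the id for a genre, and genre_name(61) gives the genre for an id.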