book-dataset's Introduction

Book Cover Dataset

This dataset contains 207,572 books from the Amazon.com, Inc. marketplace.

Challenges

Results and related papers

Task 1: Classification

A. Book Cover Image to Genre (BookCover30)

The purpose of this task is to classify books by their cover image. The BookCover30 dataset contains 57,000 book cover images divided into 30 classes. The data is split into a training set (90%) and a test set (10%).
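
As a rough illustration of what a 90%/10% split means for these numbers, the sketch below holds out 10% of each class. It is not the authors' split procedure: the function name `stratified_split`, the seed, and the assumption that the 30 classes are equally sized (1,900 images each) are made up for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Return (train_indices, test_indices), holding out test_fraction of each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Assumption: 30 equally sized classes x 1,900 images = 57,000 samples.
labels = [class_id for class_id in range(30) for _ in range(1900)]
train, test = stratified_split(labels)
print(len(train), len(test))  # 51300 5700
```

To reproduce published results, use the split distributed with the dataset rather than a fresh random split like this one.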

Technical details

Task 2: Data Mining

Data Mining (Book32)

This task is to explore the entire book database. There are 207,572 books in 32 classes. For each book, the dataset provides the cover image, title, author, and category.

Technical details

Use

Full Images

Due to size constraints, the full images aren't available in this repository. However, we provide label files with URLs to the images hosted on Amazon. Note that the fidelity of the images cannot be guaranteed. A script to download them can be found in scripts.
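
For readers who want to see how a label file can be turned into a list of downloads, here is a minimal sketch. It is not the script from the repository. The column order in `COLUMNS`, the function name `read_download_tasks`, and the sample row are assumptions, so check them against the actual label file before use. Saving each image into a per-category subfolder mirrors the `os.path.join(args.output_dirpath, category)` line visible in the traceback further down this page.

```python
import csv
import io
import os

# Hypothetical column order; verify it against the real label file.
COLUMNS = ["asin", "filename", "image_url", "title", "author", "category_id", "category"]


def read_download_tasks(csv_file, output_dir):
    """Yield (image_url, destination_path) pairs, one per row of the label file."""
    reader = csv.DictReader(csv_file, fieldnames=COLUMNS)
    for row in reader:
        destination = os.path.join(output_dir, row["category"], row["filename"])
        yield row["image_url"], destination


# Made-up sample row standing in for a real label file.
sample = io.StringIO(
    '"0000000000","0000000000.jpg","https://example.com/cover.jpg",'
    '"Example Title","Example Author","3","Example Category"\n'
)
tasks = list(read_download_tasks(sample, "covers"))
for url, destination in tasks:
    print(url, "->", destination)
```

Separating "which files go where" from the actual network call keeps the parsing step easy to test without touching Amazon's servers.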

(224 x 224 x 3) Images

Resized images for the BookCover30 dataset are available in this download.

Download (657 MB)

Citation

Paper on arXiv

B. K. Iwana, S. T. Raza Rizvi, S. Ahmed, A. Dengel, and S. Uchida, "Judging a Book by its Cover," arXiv preprint arXiv:1610.09204 (2016).

@article{iwana2016judging,
  title={Judging a Book by its Cover},
  author={Iwana, Brian Kenji and Raza Rizvi, Syed Tahseen and Ahmed, Sheraz and Dengel, Andreas and Uchida, Seiichi},
  journal={arXiv preprint arXiv:1610.09204},
  year={2016}
}

Contact

[email protected]

Disclaimer

All book cover images are hosted by and copyright Amazon.com, Inc. The use of the book cover images is fair use for academic purposes.

book-dataset's People

Contributors

biwana, poppingtonic, snhryt, yakigac

book-dataset's Issues

Need updated datasets

Hey, I need updated datasets every day. I want to run a cron job that collects all of the book data at the end of each day.
I see you have the data, so there must be a way to do that. Can you tell me how?

AttributeError: 'Namespace' object has no attribute 'output_dirpath'

When I try to run download_images.py, I keep getting an AttributeError. I'm new to the argparse library and haven't been able to figure out how to fix this.

[Download images into "Desktop/book_coverart"]
0%| | 8/207573 [00:03<11:06:21, 5.19it/s]
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "download_images.py", line 39, in download_image
    inner_output_dirpath = os.path.join(args.output_dirpath, category)
AttributeError: 'Namespace' object has no attribute 'output_dirpath'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "download_images.py", line 52, in <module>
    Parallel(n_jobs=-1)(delayed(download_image)(i) for i in trange(len(csv)))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
AttributeError: 'Namespace' object has no attribute 'output_dirpath'
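
One common cause of this kind of error is a mismatch between the name an argument is registered under and the attribute the code reads later. Whether that is what happens in download_images.py cannot be told from the traceback alone. The sketch below only shows how argparse derives attribute names; the option names `--output-dirpath` and `--csv` and their defaults are hypothetical, not taken from the script.

```python
import argparse

parser = argparse.ArgumentParser()
# argparse derives the attribute name from the long option:
# "--output-dirpath" is stored as args.output_dirpath.
parser.add_argument("--output-dirpath", default="covers")
# An explicit dest overrides the derived name.
parser.add_argument("--csv", dest="csv_path", default="listing.csv")

args = parser.parse_args(["--output-dirpath", "Desktop/book_coverart"])
print(args.output_dirpath)  # Desktop/book_coverart
print(args.csv_path)        # listing.csv

# An attribute that was never registered does not exist on the Namespace,
# so reading args.output_dir here would raise AttributeError.
print(hasattr(args, "output_dir"))  # False
```

Printing `vars(args)` right after parsing is a quick way to see which attribute names are actually available.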

Labels are NOT sorted

The label names are almost sorted, but Christian Books & Bibles is not in the correct place.
It should be placed between Children's Books and Comics & Graphic Novels.
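
The expected order can be checked with Python's default string sort, as in this small sketch. The helper name `is_sorted` is made up, and only the three labels mentioned above are used.

```python
def is_sorted(names):
    """Return True if names are in ascending order under Python's default string comparison."""
    return all(a <= b for a, b in zip(names, names[1:]))


labels = ["Children's Books", "Comics & Graphic Novels", "Christian Books & Bibles"]
print(is_sorted(labels))          # False
print(is_sorted(sorted(labels)))  # True
for name in sorted(labels):
    print(name)
```

Python compares strings by code point, so "Chi" sorts before "Chr", and "Ch" sorts before "Co".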

Connectivity issue causes script to crash

If a ConnectionResetError exception is thrown while an image is being downloaded, the script crashes. There should be some sort of auto-recovery mechanism that retries the download, instead of the existing logic, which loses progress (the script has to be re-run after a crash).
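
One possible shape for such a recovery mechanism is sketched below. It retries a failed download a few times and skips files that already exist, so a re-run resumes rather than starting over. This is not the repository's code: the function name `download_with_retry`, its parameters, and the injected `fetch` callable are all assumptions made for illustration.

```python
import os
import time


def download_with_retry(url, destination, fetch, max_attempts=3, delay_seconds=1.0):
    """Save the bytes returned by fetch(url) to destination, retrying on connection errors.

    fetch is any callable that takes a URL and returns the image bytes.
    Returns True if the file exists afterwards, False if every attempt failed.
    """
    if os.path.exists(destination):
        return True  # already downloaded in an earlier run, nothing to do

    for attempt in range(1, max_attempts + 1):
        try:
            data = fetch(url)
        except ConnectionError:  # base class of ConnectionResetError
            if attempt < max_attempts:
                time.sleep(delay_seconds * attempt)  # wait a little longer each time
            continue

        os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
        # Write to a temporary name first so a crash never leaves a
        # half-written file that a later run would mistake for a finished one.
        partial = destination + ".part"
        with open(partial, "wb") as f:
            f.write(data)
        os.replace(partial, destination)
        return True

    return False
```

In practice `fetch` could wrap `urllib.request.urlopen(url).read()`. Returning False instead of raising lets the caller log the failed URL and carry on with the remaining images. Only `ConnectionError` and its subclasses are retried here; other failures would still propagate.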
