picorana / amazon-scraper

amazon-scraper is a command-line application that collects reviews and questions/answers for Amazon products.

License: The Unlicense

amazon-scraper's Introduction

Amazon Scraper

Read the documentation here: amazon-scraper on readthedocs

Installation

via pip

~ TODO

cloning this repository

$ git clone https://github.com/picorana/amazon-scraper.git   
$ cd amazon-scraper
$ pip install -r requirements.txt
$ python setup.py install

Usage

Run amazon-scraper from the command line:

$ amazon-scraper [asin]

An ASIN is the unique identifier of a product on Amazon. You can find it in the product URL.
For example, to scrape https://www.amazon.com/gp/product/B01H2E0J5M, run:

$ amazon-scraper B01H2E0J5M
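
If you start from product URLs rather than ASINs, the ASIN can be pulled out of the URL first. The following is a minimal sketch, not part of amazon-scraper: the function name extract_asin is hypothetical, and the pattern assumes the ASIN is a 10-character uppercase alphanumeric string that follows a /dp/ or /gp/product/ path segment.

```python
import re

# Assumption: the ASIN is 10 uppercase letters/digits that follow
# "/dp/" or "/gp/product/" in the URL path.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?=[/?#]|$)")


def extract_asin(url):
    """Return the ASIN found in an Amazon product URL, or None if absent."""
    match = ASIN_PATTERN.search(url)
    return match.group(1) if match else None


print(extract_asin("https://www.amazon.com/gp/product/B01H2E0J5M"))  # B01H2E0J5M
```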

You can also pass multiple ASINs:

$ amazon-scraper B01H2E0J5M B01GYLZD8C B0736R3W1F

Or load them from a file:

$ amazon-scraper --file asins.txt

The file must contain one ASIN per line, like this:

B01H2E0J5M
B01GYLZD8C
B0736R3W1F
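
To check such a file before a long run, it can be read the same way in a few lines of Python. This is only a sketch of the one-ASIN-per-line format described above. The helper read_asins is hypothetical, and skipping blank lines is an assumption, not documented behavior of amazon-scraper.

```python
def read_asins(path):
    """Read one ASIN per line, ignoring blank lines and surrounding whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

For example, read_asins("asins.txt") would return a list such as ['B01H2E0J5M', 'B01GYLZD8C', 'B0736R3W1F'] for the file shown above.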

Output

amazon-scraper downloads product pages, reviews, and questions and answers.
It saves its output in the following folders:

pages contains the main page of each product, useful for extracting more information about the product.
Main pages are saved only when you pass the option --save-main-pages (-p), as listed under Options.

results contains the reviews, organized in JSON files.
You can disable scraping the reviews with the option --no-reviews

questions contains the questions and answers, organized in JSON files.
You can disable scraping the questions with the option --no-questions
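
Once a run has finished, the JSON output can be loaded for further processing. The sketch below makes several assumptions: the files carry a .json extension, each file holds one valid JSON document, and the helper name load_json_folder is made up for illustration. The structure inside each file is not described here, so the sketch does not inspect it.

```python
import json
from pathlib import Path


def load_json_folder(folder):
    """Load every *.json file in `folder` into a dict keyed by file name stem."""
    data = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data[path.stem] = json.load(handle)
    return data
```

Calling load_json_folder("results") or load_json_folder("questions") from the directory where amazon-scraper was run would then give one entry per output file.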

Options

positional arguments:
  asin                  Amazon asin(s) to be scraped

optional arguments:
  -h, --help            show this help message and exit
  --file FILE, -f FILE  Specify path to list of asins
  --save-main-pages, -p
                        Saves the main pages scraped
  --verbose, -v         Logging verbosity level
  --no-reviews          Do not scrape reviews
  --no-questions        Do not scrape questions
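
For readers who want to wrap or extend the tool, the interface above can be approximated with argparse. This is a sketch reconstructed from the help text, not the project's actual parser. In particular, treating --verbose as a repeatable counter and the asin argument as optional (so that --file can be used alone) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help text."""
    parser = argparse.ArgumentParser(prog="amazon-scraper")
    # Assumption: zero or more ASINs, so that --file can be used on its own.
    parser.add_argument("asin", nargs="*", help="Amazon asin(s) to be scraped")
    parser.add_argument("--file", "-f", help="Specify path to list of asins")
    parser.add_argument("--save-main-pages", "-p", action="store_true",
                        help="Saves the main pages scraped")
    # Assumption: -v may be repeated to raise the verbosity level.
    parser.add_argument("--verbose", "-v", action="count", default=0,
                        help="Logging verbosity level")
    parser.add_argument("--no-reviews", action="store_true",
                        help="Do not scrape reviews")
    parser.add_argument("--no-questions", action="store_true",
                        help="Do not scrape questions")
    return parser
```

For example, build_parser().parse_args(["B01H2E0J5M", "-p"]) yields a namespace with asin set to ['B01H2E0J5M'] and save_main_pages set to True.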

References

instagram-scraper was used as a reference for the structure of the program.
This blogpost was very useful for understanding the issues involved in building a scraper.

amazon-scraper's Issues

inconsistent use of tabs and spaces in indentation

I get an "inconsistent use of tabs and spaces in indentation" error on line 103. I tried converting tabs to spaces, but nothing helped. Any suggestions?

if not os.path.exists('./questions'): os.makedirs('./questions')
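
One way to narrow down this kind of error is to list every line whose indentation still contains a tab after an attempted conversion. The following is a small diagnostic sketch. The function name is hypothetical and it is not part of amazon-scraper. Python's standard-library tabnanny module, run as python -m tabnanny on the file, performs a related check.

```python
def find_tab_indented_lines(source):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged
```

Passing the contents of the installed app.py to this function would show whether line 103 or any of its neighbours is still indented with tabs.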

AttributeError: 'NoneType' object has no attribute 'find'

Traceback (most recent call last):
  File "/usr/local/bin/amazon-scraper", line 11, in <module>
    load_entry_point('amazon-scraper==0.0.1', 'console_scripts', 'amazon-scraper')()
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper-0.0.1-py2.7.egg/amazon_scraper/app.py", line 463, in main
    scraper.scrape()
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper-0.0.1-py2.7.egg/amazon_scraper/app.py", line 73, in scrape
    main_reviews_url, review_pages_number = self.retrieve_page(asin)
  File "/usr/local/lib/python2.7/site-packages/amazon_scraper-0.0.1-py2.7.egg/amazon_scraper/app.py", line 324, in retrieve_page
    "div", { "id" : "reviews-medley-footer" }
AttributeError: 'NoneType' object has no attribute 'find'
