
screddy1313 / amazon-product-images-downloader


Given a product name, this Python program downloads all the matching product images, including those on subsequent search result pages (pagination).

License: MIT License

Python 40.60% Jupyter Notebook 59.40%
amazon-automation selenium-python machine-learning-dataset image-downloader-python


Description

We often need to download product images in bulk, for example to gather data for machine learning or deep learning projects.

This program takes 3 inputs from the user:

  • Product name : entered into the Amazon search box to retrieve matching products.
  • Number of items : optional, default 100. The number of product images to download.
  • Number of pages : optional, default 10. The number of search result pages to traverse while downloading product images.
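
The three inputs and their defaults could be handled along these lines. This is a minimal sketch, not the script's actual code: the names `parse_options`, `collect_items`, and the `fetch_page` callable are hypothetical, and it assumes the run stops at whichever limit (items or pages) is reached first. Only the defaults of 100 items and 10 pages come from the description above.

```python
from typing import Callable, Iterable, List, Tuple

DEFAULT_ITEMS = 100  # default stated in the description
DEFAULT_PAGES = 10   # default stated in the description


def parse_options(name: str, items_raw: str = "", pages_raw: str = "") -> Tuple[str, int, int]:
    """Validate raw user input; blank optional fields fall back to the defaults."""
    name = name.strip()
    if not name:
        raise ValueError("product name is required")
    items = int(items_raw) if items_raw.strip() else DEFAULT_ITEMS
    pages = int(pages_raw) if pages_raw.strip() else DEFAULT_PAGES
    if items < 1 or pages < 1:
        raise ValueError("item and page counts must be positive")
    return name, items, pages


def collect_items(
    fetch_page: Callable[[int], Iterable[str]], max_items: int, max_pages: int
) -> List[str]:
    """Walk result pages 1..max_pages, stopping early once max_items are collected.

    fetch_page is a hypothetical callable that returns the items found on one page.
    """
    collected: List[str] = []
    for page in range(1, max_pages + 1):
        for item in fetch_page(page):
            collected.append(item)
            if len(collected) >= max_items:
                return collected
    return collected
```

In an interactive run, the three arguments to `parse_options` would come from three `input()` prompts, with an empty answer selecting the default.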

All downloaded images are stored in the images folder, and each image is named after its ASIN (the unique Amazon product ID). Make sure the images folder exists in the working directory before running the program.
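
The storage convention can be sketched as below, assuming each search result yields an ASIN and an image URL. The helper names (`image_path`, `save_image`, `download_image`) and the `.jpg` fallback extension are illustrative assumptions rather than the script's own code; `requests.get` is the only third-party call used.

```python
import os
from urllib.parse import urlparse

IMAGES_DIR = "images"  # folder name from the description


def image_path(asin: str, url: str, folder: str = IMAGES_DIR) -> str:
    """Build the target path <folder>/<asin><ext>, keeping the URL's extension."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"  # fallback is an assumption
    return os.path.join(folder, asin + ext)


def save_image(asin: str, url: str, content: bytes, folder: str = IMAGES_DIR) -> str:
    """Write already-downloaded image bytes to disk, named after the ASIN."""
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"create the '{folder}' folder first")
    path = image_path(asin, url, folder)
    with open(path, "wb") as fh:
        fh.write(content)
    return path


def download_image(asin: str, url: str, folder: str = IMAGES_DIR) -> str:
    """Fetch one image with requests and store it under its ASIN."""
    import requests  # imported lazily so the helpers above work without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return save_image(asin, url, response.content, folder)
```

Calling `os.makedirs("images", exist_ok=True)` once at startup would remove the need to create the folder by hand.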

Libraries Used :

  • Selenium : automates the Amazon search and pagination
  • Beautiful Soup 4 : parses the HTML content
  • Python 3.6 or later
  • requests : downloads each image from its URL

Install

# Linux
python3 -V   # Ensure 3.6+
pip3 -V      # Ensure... pip3

pip3 install selenium
pip3 install webdriver_manager
pip3 install requests
pip3 install beautifulsoup4
pip3 install lxml

# Linux only
sudo apt install chromium-chromedriver

Usage :

Make sure all the libraries listed above are installed, then run:
python product_images_downloader.py
(Look at the output images directory to get an idea of the results.)

Debugging :

If you are behind a proxy and the Amazon CAPTCHA shows up, run the following:

$ python3

from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://www.amazon.in')
# Solve the captcha in the browser window that opens, then:
browser.quit()  # Close the browser
exit()

That should keep the CAPTCHA away for a few extra runs.

Things to do :

  • Exclude the images of sponsored products.
  • Extract the details of each product (name, price, ratings) and store them in a CSV file.
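
For the second item, one possible shape is a small writer built on the standard `csv` module. This is only a sketch of a feature that is not implemented yet: the column set, the `write_products` helper, and the row format are all hypothetical.

```python
import csv
from typing import Dict, Iterable

FIELDS = ["asin", "name", "price", "rating"]  # hypothetical column set


def write_products(rows: Iterable[Dict[str, str]], csv_path: str) -> int:
    """Write one CSV row per product and return the number of rows written."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops keys that are not listed in FIELDS
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

Keying each row by ASIN would keep the CSV aligned with the image file names in the images folder.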

