Beagle Scraper

Building the largest open-source Ecommerce scraper with Python and BeautifulSoup4

Usage

No installation or setup required

Download the source code into a folder
Create a urls.txt file listing the product category pages to be scraped, one URL per line (for example, an Amazon category page)
Run the command

$ python start_scraper.py
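As an illustration of the urls.txt step, here is a minimal sketch that writes the file and keeps only links from the stores listed as supported in this README. The helper names (is_supported, write_urls_file) are hypothetical and not part of Beagle Scraper, and the scraper itself may accept or reject URLs differently.

```python
# Sketch only: helper names are hypothetical, not part of Beagle Scraper.
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7

# Stores listed as supported in this README.
SUPPORTED_STORES = ("amazon.com", "bestbuy.com", "homedepot.com")


def is_supported(url):
    """Return True if the URL's host is a supported store or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == store or host.endswith("." + store)
               for store in SUPPORTED_STORES)


def write_urls_file(urls, path="urls.txt"):
    """Write the supported URLs to path, one per line, and return them."""
    kept = [u.strip() for u in urls if is_supported(u.strip())]
    with open(path, "w") as handle:
        for url in kept:
            handle.write(url + "\n")
    return kept
```

Calling write_urls_file() with a list of category links produces a file that start_scraper.py can then pick up from the same folder.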

Output

Beagle Scraper exports all scraped data as JSON files into a sub-folder
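To inspect the exported data afterwards, something like the following sketch could be used. It assumes only that the sub-folder contains valid .json files; the folder name is whatever the scraper created, and the structure of each file is not specified in this README, so the loader returns the parsed content unchanged. load_results is a hypothetical helper.

```python
# Sketch only: load_results is a hypothetical helper, not part of Beagle Scraper.
import glob
import json
import os


def load_results(folder):
    """Return a dict mapping each .json file name in folder to its parsed content."""
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.json"))):
        with open(path) as handle:
            results[os.path.basename(path)] = json.load(handle)
    return results
```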

Currently supported e-commerce stores

Amazon.com
BestBuy.com
HomeDepot.com

Beagle Scraper tutorial - how to use and run the scraper

https://www.bestproxyproviders.com/blog/beagle-scraper-tutorial-how-to-scrape-e-commerce-websites-and-modify-the-scraper/

Getting Started

Beagle Scraper requires a machine with Python 2.7 and BeautifulSoup4

Install BeautifulSoup4

$ pip install beautifulsoup4

Prerequisites - extra Python packages required

The following packages are not included in the default Python 2.7 install and must be installed separately

tldextract

$ sudo pip install tldextract

selenium

$ pip install selenium

If another package is missing, run the command

$ pip install [missing package name]
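Before running the scraper, the packages named above can be checked with a short script. This is a sketch, not part of Beagle Scraper: missing_packages is a hypothetical helper, and the only mapping it relies on is that the beautifulsoup4 package is imported as bs4.

```python
# Sketch only: missing_packages is a hypothetical helper.
import importlib

# pip package name -> module name used in an import statement
REQUIRED_MODULES = {
    "beautifulsoup4": "bs4",
    "tldextract": "tldextract",
    "selenium": "selenium",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be imported."""
    missing = []
    for pip_name, module_name in sorted(required.items()):
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    for name in missing_packages():
        print("Missing package, install with: pip install " + name)
```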

Using proxies to scrape

Beagle Scraper does not support external proxies natively at the moment, but proxychains can be used to send its requests through different proxies

After installing proxychains, run this command to make the scraper use proxies

$ proxychains python start_scraper.py

Test Beagle Scraper

Here's a short test for Beagle Scraper

Download Beagle Scraper
Create a urls.txt file and insert the following product category pages (each link on a different line)

Run Beagle Scraper

$ python start_scraper.py

Example output for the above scraped urls:

amazon_dd_mm_yy.json
bestbuy_dd_mm_yy.json
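The file names above follow a store_dd_mm_yy.json pattern. A minimal sketch of how such a name could be built is shown here, assuming dd, mm and yy stand for zero-padded day, month and two-digit year; output_filename is a hypothetical helper and the scraper's own naming code may differ.

```python
# Sketch only: output_filename is a hypothetical helper.
from datetime import date


def output_filename(store, day=None):
    """Build a name like amazon_dd_mm_yy.json for the given store and date."""
    day = day or date.today()
    return "{0}_{1}.json".format(store, day.strftime("%d_%m_%y"))
```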

Built With

Beautiful Soup - Scraping library
Python 2.7 - Programming language

How to contribute

All you have to do is create a scraper function like amazon_scraper() from beagle_scraper.py and submit it here.

Here is more info on how the scraper function is created

Things to consider:

HTML wrapper and class/id for each product listed on the page
The product details HTML tags and classes
Pagination setup
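The three considerations above can be sketched with BeautifulSoup as follows. This is an illustration only: example_store_scraper, the sample markup and every class name in it are hypothetical placeholders, and the actual signature and return value of amazon_scraper() in beagle_scraper.py may differ.

```python
# Sketch only: function name, markup and class names are hypothetical.
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div class="product-item"><h2 class="title">Widget A</h2><span class="price">$9.99</span></div>
<div class="product-item"><h2 class="title">Widget B</h2><span class="price">$19.50</span></div>
<a class="next-page" href="/category?page=2">Next</a>
"""


def example_store_scraper(html):
    """Return (products, next_page_url) parsed from one category page."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # 1. HTML wrapper and class for each product listed on the page
    for item in soup.find_all("div", class_="product-item"):
        # 2. Product detail tags and classes
        title = item.find("h2", class_="title")
        price = item.find("span", class_="price")
        products.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    # 3. Pagination: look for a link to the next page, if any
    next_link = soup.find("a", class_="next-page")
    next_url = next_link.get("href") if next_link else None
    return products, next_url
```

For a new store, the wrapper tag, the detail classes and the pagination selector would be replaced with the ones found in that store's category pages.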

Authors

Chris Roark - Initial work - ChrisRoark

License

GPL-3.0 license
