Coder Social home page Coder Social logo

ndabdulsalaam / web_scrapers Goto Github PK

View Code? Open in Web Editor NEW
1.0 1.0 0.0 120 KB

I have demostrated in this notebook how to scrape Amazon and Wikipedia websites, stored output as csv format being the widely used format, and send notification to user when certain criteria are met.

License: Apache License 2.0

Jupyter Notebook 92.20% Python 7.80%
amazonwebscrape automation beatifulsoup4 python webscraping wikipedia

web_scrapers's Introduction

Data Scraping with Python

Python web scraping is a technique used for gathering data from web pages. This does the typical task of exploring and downloading the pages or grabbing content, and store in a specific data format.

I have demostrated in this notebook how to scrape Amazon and Wikipedia websites, stored output as csv format being the widely used format, and send notification to user when certain criteria are met.

Price Tracker

Amazon scraping script that scrapes Amazon at intervals and sends noification to the email provided when the price the item being tracked is equal to or below the specified price.

Functions in this script

  • check_price: takes in two parameters, a filename and a URL, and scrapes the price and title of a product from the specified URL. It then saves the title, price, current date, and time in a CSV file with the specified filename in a subdirectory called 'output'. If the file does not exist, it creates a new file with a header row and writes the data. If the file already exists, it appends the data to the existing file. Finally, the function returns the price as a floating-point number.

  • send_mail: sends an email notification when the price of a product hits below a certain level. The function takes in five parameters: the recipient email address (to_address), the current price of the product (price), the name of the product (product), the target price (target_price), and the URL of the product (url). Note that this code assumes that the sender has set up a Gmail account and provided the correct credentials and that the recipient's email server does not block the email as spam.

  • track_price: tracks the price of a product on Amazon, and sends a notification email to the user when the price of the product falls below a certain level. The code first imports the time module, and two functions from external Python files: check_price() and send_mail(). The check_price() function checks the price of the product on Amazon and writes the product information (title, price, date, and time) to a CSV file, while the send_mail() function sends an email notification to the user. The code then sets the url, target_price, product, and to_address variables, which are used by the track_price() function. The track_price() function asks the user if they want to use test data or input their own data, and then repeatedly calls the check_price() function every sleep seconds to track the product's price. If the price falls below the target_price, the send_mail() function is called to send a notification email to the user.

run.py

This code runs the track_price function, which tracks the price of a product on Amazon and sends an email notification if the price drops below a certain target price. The script prompts the user to select between using testing parameters or input the URL of the product, the target price, the name of the product, and the email address to send the notification to, manually. It then runs a loop that repeatedly checks the price of the product and sleeps for a certain amount of time before checking again. If the price drops below the target price, it calls the send_mail function to send an email notification. The if name == "main": statement ensures that the track_price function is only run if the script is run as the main module, and not if it is imported into another script.

How to Clone

  • Setup up your credentials: Add to your user environment variables email address and password and name them as MAIL_USER and MAIL_PASS respectively.

  • While inside the folder for this project, copy code below and run in your terminal.

git clone https://github.com/nurudeenabdulsalaam/Web_Scrapers
cd Web_Scrapers/price_tracker/
python run.py

Others

  • Check my WeRateDogs repository to see Twitter scraping in action

  • Notebooks are also provided to facilitate experimental learning

  • Periodic scraping was not used for Wikipedia because it is a static web page.

web_scrapers's People

Contributors

ndabdulsalaam avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.