googlemaps-scraper's Introduction

Google Maps Scraper

Scraper for Google Maps reviews. Starting from the URL of a specific Point Of Interest (POI) in Google Maps, the code extracts its most recent reviews. An additional extension monitors the reviews and stores them incrementally in a MongoDB instance.

Installation

Follow these steps to use the scraper:

  • Download ChromeDriver from here.

  • Install the Python packages from the requirements file, using pip, conda, or virtualenv:

      conda create --name scraping python=3.6 --file requirements.txt
    

Note: Python >= 3.6 is required.

Basic Usage

The scraper.py script needs two main parameters as input:

  • --i: input file name, containing a list of urls that point to Google Maps place reviews (default: urls.txt)
  • --N: number of reviews to retrieve, starting from the most recent (default: 100)

Example:

python scraper.py --N 50

generates a CSV file containing the last 50 reviews of each place listed in urls.txt.

In the current implementation, the CSV file is handled by a separate function, so to change the path and/or name of the output file you need to modify that function.
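As an illustration of the paragraph above, here is a minimal sketch of a CSV-writing helper that takes the output path as an argument. This is not the project's actual function: the name `write_reviews`, the default path, and the column names are all hypothetical and chosen only for the example.

```python
import csv
import tempfile
from pathlib import Path


def write_reviews(rows, path="data/reviews.csv",
                  header=("id_review", "rating", "caption")):
    """Write review rows to a CSV file at `path` (hypothetical helper).

    The header columns are illustrative assumptions, not the columns
    produced by the real scraper.
    """
    out = Path(path)
    # Create the target directory if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return out


# Usage: write two made-up rows to a custom location.
with tempfile.TemporaryDirectory() as tmp:
    target = write_reviews(
        [("r1", 5, "Great place"), ("r2", 3, "Average")],
        Path(tmp) / "my_reviews.csv",
    )
    print(target.read_text(encoding="utf-8"))
```

Passing the path as a parameter keeps the file name and location out of the function body, so changing the output file no longer requires editing the code.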

Additionally, other parameters can be provided:

  • --place: boolean value that makes the scraper collect POI metadata instead of reviews (default: false)
  • --debug: boolean value that runs the browser with its graphical interface (default: false)
  • --source: boolean value that stores the source URL as an additional field in the CSV (default: false)
  • --sort-by: string value that changes how reviews are sorted, one of most_relevant, newest, highest_rating or lowest_rating (default: newest); developed by @quaesito
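The parameters listed above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the code of scraper.py: in particular, treating the boolean options as `store_true` flags and the function name `build_parser` are choices made for this example.

```python
import argparse

# Sorting options named in the parameter list above.
SORT_CHOICES = ["most_relevant", "newest", "highest_rating", "lowest_rating"]


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Google Maps reviews scraper (sketch)")
    parser.add_argument("--N", type=int, default=100,
                        help="number of reviews to retrieve")
    parser.add_argument("--i", type=str, default="urls.txt",
                        help="input file with the target URLs")
    parser.add_argument("--sort-by", dest="sort_by", choices=SORT_CHOICES,
                        default="newest", help="review sorting order")
    # Assumption: boolean options are modeled as on/off flags.
    parser.add_argument("--place", action="store_true",
                        help="scrape POI metadata instead of reviews")
    parser.add_argument("--debug", action="store_true",
                        help="run the browser with its graphical interface")
    parser.add_argument("--source", action="store_true",
                        help="store the source URL in the CSV")
    return parser


# Usage: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--N", "50", "--sort-by", "highest_rating"])
print(args.N, args.i, args.sort_by, args.place)
```

Restricting `--sort-by` with `choices` makes the parser reject any value outside the four documented options before the browser is started.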

For a basic description of the logic and approach behind this software, have a look at the Medium post.

Monitoring functionality

The monitor.py script provides an incremental scraper and works around the limit on the number of reviews that can be retrieved. The only additional requirement is a local MongoDB installation: you can find a detailed guide on the official site.

The script takes two inputs:

  • --i: same as for the scraper.py script
  • --from-date: date string in the format YYYY-MM-DD, the earliest review date the scraper tries to reach

The main idea is to run the script periodically to obtain the latest reviews: on each run, the scraper stores reviews in MongoDB until it reaches either the latest review of the previous run or the date given in the input parameter.
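The stopping rule described above can be sketched in plain Python. This is an assumption-laden illustration rather than the logic of monitor.py: reviews are modeled as dictionaries with a `timestamp` field, sorted newest first, and an in-memory list stands in for the MongoDB collection. The names `parse_from_date` and `select_new_reviews` are hypothetical.

```python
from datetime import datetime


def parse_from_date(value):
    """Parse a YYYY-MM-DD string, the format documented for --from-date."""
    return datetime.strptime(value, "%Y-%m-%d")


def select_new_reviews(reviews, from_date, last_stored=None):
    """Return the reviews to store in this run.

    `reviews` must be sorted newest first. Iteration stops at the first
    review that is older than `from_date` or not newer than `last_stored`
    (the newest timestamp saved by the previous run, if any).
    """
    selected = []
    for review in reviews:
        ts = review["timestamp"]
        if ts < from_date:
            break  # reached the minimum date requested by the user
        if last_stored is not None and ts <= last_stored:
            break  # reached reviews already stored by a previous run
        selected.append(review)
    return selected


# Usage with made-up data, newest first.
reviews = [
    {"id": "a", "timestamp": datetime(2023, 3, 10)},
    {"id": "b", "timestamp": datetime(2023, 3, 5)},
    {"id": "c", "timestamp": datetime(2023, 3, 1)},
    {"id": "d", "timestamp": datetime(2023, 2, 20)},
]
first_run = select_new_reviews(reviews, parse_from_date("2023-03-01"))
second_run = select_new_reviews(reviews, parse_from_date("2023-03-01"),
                                last_stored=datetime(2023, 3, 5))
print([r["id"] for r in first_run], [r["id"] for r in second_run])
```

In a real setup, `last_stored` would be read from the database before scraping starts, and the selected reviews would be inserted afterwards.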

Take a look at this Medium post for more details about the idea behind this feature.

Notes

URLs must be provided in the expected format; check the example file urls.txt to see what a correct URL looks like. To generate a correct URL:

  1. Go to Google Maps and search for a specific place;
  2. Click on the number of reviews shown in parentheses;
  3. Save the URL generated by the previous interaction.
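Before launching the scraper, it can help to run a rough sanity check on the input file. The sketch below assumes one URL per line and accepts only URLs on a Google domain whose path starts with `/maps`; both rules are assumptions made for this example, and urls.txt remains the reference for what a correct URL looks like. The function names and the placeholder URL are hypothetical.

```python
from urllib.parse import urlparse


def looks_like_maps_url(url):
    """Rough check (assumption): http(s) URL on a Google domain under /maps."""
    parsed = urlparse(url.strip())
    return (parsed.scheme in ("http", "https")
            and "google." in parsed.netloc
            and parsed.path.startswith("/maps"))


def load_urls(lines):
    """Split non-empty lines into (accepted, rejected) URL lists."""
    accepted, rejected = [], []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        (accepted if looks_like_maps_url(url) else rejected).append(url)
    return accepted, rejected


# Usage with placeholder lines; an open file object works the same way.
sample = [
    "https://www.google.com/maps/place/EXAMPLE\n",
    "\n",
    "https://example.org/not-a-maps-url\n",
]
ok, bad = load_urls(sample)
print(len(ok), "accepted,", len(bad), "rejected")
```

This check only catches obviously wrong entries; it does not confirm that a URL points to the reviews view obtained through the steps above.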

Contributors

gaspa93, thetaxmatterz, samirarman, dependabot[bot], gtesk, ryuuzake