gosom / google-maps-scraper

Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email and more for each place.

License: MIT License

Go 91.84% Dockerfile 1.07% Makefile 2.10% PLpgSQL 5.00%
golang google-maps-scraping web-scraper web-scraping distributed-scraper distributed-scraping google-maps

google-maps-scraper's Introduction

Google maps scraper


A command line Google Maps scraper built using the scrapemate web crawling framework.

You can use this repository as is, or you can use its code as a base and customize it to your needs.

Update: Added support for extracting emails from the business website.

Try it

touch results.csv && docker run -v $PWD/example-queries.txt:/example-queries -v $PWD/results.csv:/results.csv gosom/google-maps-scraper -depth 1 -input /example-queries -results /results.csv -exit-on-inactivity 3m

The file results.csv will contain the parsed results.

If you want emails, additionally pass the -email flag.
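
For example, the same quickstart command with email extraction enabled:

touch results.csv && docker run -v $PWD/example-queries.txt:/example-queries -v $PWD/results.csv:/results.csv gosom/google-maps-scraper -email -depth 1 -input /example-queries -results /results.csv -exit-on-inactivity 3m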

🌟 Support the Project!

If you find this tool useful, consider giving it a star on GitHub. Feel free to check out the Sponsor button on this repository to see how you can further support the development of this project. Your support helps ensure continued improvement and maintenance.

Features

  • Extracts many data points from Google Maps
  • Exports the data to CSV, JSON or PostgreSQL
  • Performance of about 55 URLs per minute (-depth 1 -c 8)
  • Extendable: you can write your own exporter
  • Dockerized for easy use on multiple platforms
  • Scalable across multiple machines
  • Optionally extracts emails from the website of the business

Notes on email extraction

By default, email extraction is disabled.

If you enable email extraction (see quickstart), the scraper will visit the website of the business (if one exists) and try to extract emails from the page.

For the moment it checks only one page of the website (the one registered in Google Maps). Support for extracting from other pages such as about, contact or impressum may be added at some point.

Keep in mind that enabling email extraction results in longer processing times, since more pages are scraped.

Extracted Data Points

input_id
link
title
category
address
open_hours
popular_times
website
phone
plus_code
review_count
review_rating
reviews_per_rating
latitude
longitude
cid
status
descriptions
reviews_link
thumbnail
timezone
price_range
data_id
images
reservations
order_online
menu
owner
complete_address
about
user_reviews
emails

Note: email is empty by default (see Usage)

Note: input_id is an ID that you can define per query. By default it is a UUID. In order to define it, you can have an input file like:

Matsuhisa Athens #!#MyIDentifier
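
That is, each line contains the query, optionally followed by #!# and your identifier. For example (the identifiers below are purely illustrative):

Matsuhisa Athens #!#athens-001
restaurants in Limassol #!#limassol-002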

Quickstart

Using docker:

touch results.csv && docker run -v $PWD/example-queries.txt:/example-queries -v $PWD/results.csv:/results.csv gosom/google-maps-scraper -depth 1 -input /example-queries -results /results.csv -exit-on-inactivity 3m

The file results.csv will contain the parsed results.

If you want emails, additionally pass the -email flag.

On your host

(tested only on Ubuntu 22.04)

git clone https://github.com/gosom/google-maps-scraper.git
cd google-maps-scraper
go mod download
go build
./google-maps-scraper -input example-queries.txt -results restaurants-in-cyprus.csv -exit-on-inactivity 3m

Be a little patient: on the first run it downloads the required libraries.

Results are written to the results file you specified as they arrive.

If you want emails, additionally pass the -email flag.

Command line options

Try ./google-maps-scraper -h to see the available command line options:

  -c int
        sets the concurrency. By default it is set to half of the number of CPUs (default 8)
  -cache string
        sets the cache directory (no effect at the moment) (default "cache")
  -debug
        Use this to perform a headful crawl (it will open a browser window) [only when using without Docker]
  -depth int
        is how much you allow the scraper to scroll in the search results. Experiment with that value (default 10)
  -dsn string
        Use this if you want to use a database provider
  -email
        Use this to extract emails from the websites
  -exit-on-inactivity duration
        program exits after this duration of inactivity (example value '5m')
  -input string
        is the path to the file where the queries are stored (one query per line). By default it reads from stdin (default "stdin")
  -json
        Use this to produce a json file instead of csv (not available when using db)
  -lang string
        is the language code to use for Google (the hl url parameter). Default is en. For example, use de for German or el for Greek (default "en")
  -produce
        produce seed jobs only (only valid with dsn)
  -results string
        is the path to the file where the results will be written (default "stdout")
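
Since -input defaults to stdin and -results defaults to stdout, you can also pipe queries in and redirect the output. A minimal sketch (the query is illustrative):

echo "restaurants in Athens" | ./google-maps-scraper -depth 1 -exit-on-inactivity 3m > results.csv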

Using a Database Provider (PostgreSQL)

For running on your local machine:

docker-compose -f docker-compose.dev.yaml up -d

The above starts a PostgreSQL container and creates the required tables.

To access the database:

psql -h localhost -U postgres -d postgres

The password is postgres.

Then from your host run:

go run main.go -dsn "postgres://postgres:postgres@localhost:5432/postgres" -produce -input example-queries.txt --lang el

(configure your queries and the desired language)

This will populate the gmaps_jobs table.
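
To verify that the jobs were queued, you can, for example, count the rows in that table:

psql -h localhost -U postgres -d postgres -c "select count(*) from gmaps_jobs;"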

You may then run the scraper using:

go run main.go -c 2 -depth 1 -dsn "postgres://postgres:postgres@localhost:5432/postgres"

If you have a database server and several machines, you can start multiple instances of the scraper as above.

Kubernetes

You may run the scraper in a Kubernetes cluster. This makes it easier to scale.

Assuming you have a Kubernetes cluster and a database that is accessible from the cluster:

  1. First populate the database as shown above
  2. Create a deployment file scraper.deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: google-maps-scraper
spec:
  selector:
    matchLabels:
      app: google-maps-scraper
  replicas: {NUM_OF_REPLICAS}
  template:
    metadata:
      labels:
        app: google-maps-scraper
    spec:
      containers:
      - name: google-maps-scraper
        image: gosom/google-maps-scraper:v0.9.3
        imagePullPolicy: IfNotPresent
        args: ["-c", "1", "-depth", "10", "-dsn", "postgres://{DBUSER}:{DBPASSWD@DBHOST}:{DBPORT}/{DBNAME}", "-lang", "{LANGUAGE_CODE}"]

Please replace the values or the command args accordingly

Note: Keep in mind that because the application starts a headless browser, it requires CPU and memory. Use an appropriately sized Kubernetes cluster.
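
Then apply the deployment to your cluster (assuming you saved the manifest as scraper.deployment, as above):

kubectl apply -f scraper.deployment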

Performance

Expected speed with a concurrency of 8 and depth 1 is about 55 jobs per minute. Each search is 1 job plus the number of results it contains.

Based on the above: if we have 1000 keywords to search and each contains 16 results, that is 1000 * (1 + 16) = 17000 jobs.

We expect this to take about 17000 / 55 ≈ 309 minutes ≈ 5 hours.

If you want to scrape many keywords, it is better to use the database provider in combination with Kubernetes for convenience and start multiple scrapers on more than one machine.

References

For more instructions, you may also read the following links:

License

This code is licensed under the MIT License.

Contributing

Please open an issue or make a pull request.

Thank you for considering support for the project. Every bit of assistance helps maintain momentum and enhances the scraper’s capabilities!

Notes

Please use this scraper responsibly.

The banner is generated using OpenAI's DALL·E.

Sponsors

google-maps-scraper's People

Contributors

arceushui, doublelayer, gosom, iamsad5566, vivianmauer


google-maps-scraper's Issues

Playwright issue

root@v2202309155727239413:~/code/google-maps-scraper# go build
# github.com/gosom/scrapemate/adapters/fetchers/jshttp
vendor/github.com/gosom/scrapemate/adapters/fetchers/jshttp/jshttp.go:126:25: undefined: playwright.BrowserNewContextOptionsViewport
# github.com/gosom/google-maps-scraper/gmaps
gmaps/job.go:96:16: page.WaitForNavigation undefined (type playwright.Page has no field or method WaitForNavigation)
gmaps/job.go:96:45: undefined: playwright.PageWaitForNavigationOptions
gmaps/place.go:80:16: page.WaitForNavigation undefined (type playwright.Page has no field or method WaitForNavigation)
gmaps/place.go:80:45: undefined: playwright.PageWaitForNavigationOptions
root@v2202309155727239413:~/code/google-maps-scraper# 

I executed the commands from this section but get the above error. The error occurs on my Debian server with Go version 1.21.1. I have the same Go version on my local machine and it worked there. I tried cleaning the cache with these commands, but it did not help:
go clean -cache
go clean -modcache

Image export labels

Is there a way to remove the titles of images and leave only the links in the CSV?

Processes never end

I've added this code to launch from my frontend


func worker(category, city, userResult, sec string, w http.ResponseWriter) {
	if err := run(category, city, userResult+"/"+sec+".json"); err != nil {
		fmt.Println("Oops:")
		os.Stderr.WriteString(err.Error() + "\n")
	} else {
		fmt.Println("Else!")
		fmt.Fprintf(w, "event: close\n\n")
		w.(http.Flusher).Flush()
		fmt.Println("Closed")
		// panic("error")
		//saveSearchToFireStore(userResult+"/"+sec+".json", email, documentName)
	}
}

http.HandleFunc("/api/google-search", func(w http.ResponseWriter, r *http.Request) {
	// code to get parameters
	worker(category, city, userResult, sec, w)
})

Everything is working well, but for some reason I have 5 subprocesses that never end when the search is done.

Any idea how I can end those processes?

o

numOfJobsCompleted":23,"numOfJobsFailed":25

Query:
Shisha bars in London

Running basic command, is this because of google page layout?

It's exceeding 5000ms and timing out on about 50% of requests.

{"level":"info","component":"scrapemate","numOfJobsCompleted":13,"numOfJobsFailed":7,"lastActivityAt":"2024-04-02T10:43:21.913079263Z","speed":"6.50 jobs/min","time":"2024-04-02T10:43:46.856171969Z","message":"scrapemate stats"} {"level":"info","component":"scrapemate","job":"Job{ID: fcf935d7-83ad-4c51-bdf8-0e6857f98cd1, Method: GET, URL: https://www.google.com/maps/place/Cavalli+lounge/data=!4m7!3m6!1s0x48760517c9548bab:0xa81bb11018bf2868!8m2!3d51.4978355!4d-0.1650291!16s%2Fg%2F11t7sfbsfj!19sChIJq4tUyRcFdkgRaCi_GBCxG6g?authuser=0&hl=en&rclk=1, UrlParams: map[hl:en]}","error":"Timeout 5000.00ms exceeded.","status":"failed","duration":36361.429128,"time":"2024-04-02T10:43:51.615019055Z","message":"job finished"} {"level":"error","component":"scrapemate","error":"Timeout 5000.00ms exceeded.","time":"2024-04-02T10:43:51.615039471Z","message":"error while processing job"} {"level":"info","component":"scrapemate","numOfJobsCompleted":13,"numOfJobsFailed":8,"lastActivityAt":"2024-04-02T10:43:51.615045471Z","speed":"4.33 jobs/min","time":"2024-04-02T10:44:46.913444177Z","message":"scrapemate stats"}

Change in places layout

The tool returns no result.
Reason: CSS selector that selects the places is not valid anymore since Google changed the layout.

Proposal:
Create a JSON file to read the CSS selectors from, so they can be changed dynamically.

Could someone add Claimed Business?

It would be helpful if we could see if a business has been claimed or not. I want to see if they have set up a Google My Business.

Is there a way to add this?

Also, is it possible to search the business websites and map comments to find the business owner's name?

Thanks, in advance!

The program is taking a very long time to scrape a single query.

Hello there,

The program consumes a lot of time even when checking a single query, and even if I set -depth to 1. Any suggestion or help is much appreciated. I'm using VS Code to execute this locally on my machine.

Example - Shopping Malls in thane, India
terminal command : .\google-maps-scraper -depth 1 -input example-queries.txt results test_Scrapping.csv -exit-on-inactivity 3m

It took 2.30 hrs and kept going on... no results were being saved, so I shut it off T-T

Thank you

How to get the Featured Image

Hello Georgios,
Great work, especially the Kubernetes integration.
My question is: how did you get the featured image for the google-maps-scraping topic?

Thanks,
Chetan

Problems connecting script with AWS RDS

Running this repo with a psql server on AWS RDS hangs after about 5 jobs have been completed. Even before it hangs, the process is significantly slower than expected compared to running it against a psql server on localhost.

While I can work around this limitation, it would be nice if the code could directly export to the remotely hosted database server.
========

Update: The original errors were due to a simple mistake in usage.
When working with a database, the -email flag should only be used when executing the jobs already populated in the gmaps_jobs table. Having the flag on when creating the jobs results in errors.

The correct usage is:

#Add the jobs to the queue in the database table: gmaps_jobs
go run main.go \
    -dsn $DSN \
    -produce \
    -input example-queries.txt \
    -lang en

#execute the jobs in the queue
go run main.go \
    -c 3  \
    -depth 3 \
    -dsn $DSN \
    -email

Everything below this is now irrelevant to the issue.


When trying to use this repo in conjunction with AWS RDS to host a PostgreSQL server, I encounter some errors.

To set up the RDS database, the "gmaps_jobs" table was made using the create_tables.up.sql script, and I manually made the "results" table with 2 columns:

  • id : integer : primary_key & not_null
  • data : jsonb : not_null

Running the following commands queues the jobs. It fills the gmaps_jobs table as expected.
export DSN="postgres://postgres:postgres@[aws-endpoint]:5432/postgres"

#Add the jobs to the queue in the database table: gmaps_jobs
go run main.go \
    -dsn $DSN \
    -produce \
    -input example-queries.txt \
    -email

However when running the 2nd part,

#execute the jobs in the queue
go run main.go \
    -c 3  \
    -depth 3 \
    -dsn $DSN

there are a lot of lines in the logging which state:

{"level":"error","component":"scrapemate","error":"invalid job type: while pushing jobs","time":"2024-02-12T01:05:50.189031649Z","message":"error while finishing job"}

Then the script exits with one of two errors: Either
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x8625de]

OR

ERROR: null value in column "id" of relation "results" violates not-null constraint (SQLSTATE 23502)

(as a 3rd case, sometimes the script just hangs after one of the above "invalid job type" errors)

Do you have any suggestions as to what is causing this?


========================
Edit: When running the code locally and using a psql server on localhost, the code successfully completes, but the logs show a LOT of the gmaps jobs failing
{"level":"info","component":"scrapemate","numOfJobsCompleted":84,"numOfJobsFailed":75,"lastActivityAt":"2024-02-12T04:23:45.97806321Z","speed":"28.00 jobs/min","time":"2024-02-12T04:23:49.053938604Z","message":"scrapemate stats"}

However, if the code is run in a docker container and outputs the results to .csv then all of the jobs successfully run without any failing.

High memory usage when scraping lot of queries

Hi,
Big thanks for the tool. It is wonderful and does the job well, but I have a problem with memory usage.

I am using the scraper with
./google-maps-scraper -input keyword.txt -results results.csv -exit-on-inactivity 3m -c 1 -depth 14

keyword.txt has about 400-500 queries; after running for 4+ hours the program basically consumes all of the available RAM (16 GB) and then crashes.
I tested on both Windows 10 and Xubuntu 22.04 with the same results, except that on Linux it runs a bit longer, it being a more lightweight OS.

As a workaround, I made a simple script that divides the keywords into multiple files and then runs them synchronously, one at a time, through bash, sleeping about 30 seconds between commands (see the sketch below).
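
A minimal sketch of such a workaround (chunk size, file names and sleep time are illustrative):

split -l 50 keyword.txt chunk_
for f in chunk_*; do
  ./google-maps-scraper -input "$f" -results "results_$f.csv" -exit-on-inactivity 3m -c 1 -depth 14
  sleep 30
done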

So the question is: is the memory consumption due to a memory leak, or is it normal for that query size?

Could not install driver & Browser

While trying on Rocky Linux, I am facing this issue:
[root@vps-3d569d google-maps-scraper]# ./google-maps-scraper -input example-queries.txt -results restaurants-in-cyprus.csv -exit-on-inactivity 3m
2023/09/03 18:48:52 Downloading driver to /root/.cache/ms-playwright-go/1.20.0-beta-1647057403000
2023/09/03 18:49:10 Downloaded driver successfully
2023/09/03 18:49:10 Downloading browsers...
/root/.cache/ms-playwright-go/1.20.0-beta-1647057403000/package/lib/cli/cli.js:263
require(playwrightTestPackagePath).addTestCommand(_commander.program);
^

TypeError: require(...).addTestCommand is not a function
at Object. (/root/.cache/ms-playwright-go/1.20.0-beta-1647057403000/package/lib/cli/cli.js:263:40)
at Module._compile (node:internal/modules/cjs/loader:1101:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
at Module.load (node:internal/modules/cjs/loader:981:32)
at Function.Module._load (node:internal/modules/cjs/loader:822:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
at node:internal/main/run_main_module:17:47
could not install driver: could not install browsers: could not install browsers: exit status 1

How can I solve this issue? OS: Rocky Linux 8

Windows user

It's not exactly an issue, more of a question. I'm a bit new to Docker and SQL; can you tell me how I can run the program on Windows?

Job results not saving to postgres table

Following the readme instructions:

  1. Stand up docker container
  2. Successfully populate gmaps_jobs with go run main.go -dsn "postgres://postgres:postgres@localhost:5432/postgres" -produce -input example-queries.txt --lang el
  3. Run the process which successfully finishes: go run main.go -c 2 -depth 1 -dsn "postgres://postgres:postgres@localhost:5432/postgres"
  4. "results" table is empty upon completion

High memory usage

Hi, your code is very good and works well, but when I use it the memory usage goes up quite a lot and when the scraping is finished the memory doesn't come back to normal. Are you sure you closed the goroutines when they were finished?

How to retrieve user_reviews?

User reviews are always null with the default command:

touch results.csv && docker run -v $PWD/example-queries.txt:/example-queries -v $PWD/results.csv:/results.csv gosom/google-maps-scraper -depth 1 -input /example-queries -results /results.csv -exit-on-inactivity 3m

Am I doing something wrong? Is it possible to collect all user reviews of a place?

Thanks!

Modified EntryFromGoQuery function but nothing changed.

Hi, I'm trying to get a 2nd address from Google Maps (a localized address for each language). The problem is that no matter what I add to func EntryFromGoQuery, nothing changes, let alone the 2nd address.

for example even when I add these lines in func EntryFromGoQuery in gmaps/entry.go

entry.Address_first = "first"
entry.Address_second = "second"

(database already modified)
my results.csv never changes except for the column names and the empty new columns I added; addr will always be in the first address column (the original column).
Even when I try to print something from this function, it just doesn't come out.

Does this mean the script doesn't use this function at all? Or should I modify somewhere else too to make it work?

Edit: This is unrelated to the above question, but it seems like I need to use
el := doc.Find(`button[data-item-id="laddress"]`) instead of el := doc.Find(`button[data-item-id="address"]`), but it doesn't seem to fix the problem anyway, because I don't think it can find laddress at all. I can only get one address and no laddress here, and I would have to specify -lang to get a localized address of my choice. Is there any way I can get both address and laddress here? 🤔

/

My query:

shisha in South London, United Kingdom
touch results.csv && docker run -v $PWD/example-queries.txt:/example-queries -v $PWD/results.csv:/results.csv gosom/google-maps-scraper -email -depth 1 -input /example-queries -results /results.csv -exit-on-inactivity 3m

The error

{"level":"info","component":"scrapemate","job":"Job{ID: , Method: GET, URL: https://usplace.top/cafe-havana-2, UrlParams: map[]}","error":"could not send message to server: Timeout 30000ms exceeded.\n=========================== logs ===========================\nnavigating to \"https://usplace.top/cafe-havana-2\", waiting until \"networkidle\"\n============================================================","status":"failed","duration":30484.928501,"time":"2024-04-02T10:43:17.475261678Z","message":"job finished"} {"level":"info","component":"scrapemate","jobid":"","url":"https://links.thewaterfront.london/DpGK","time":"2024-04-02T10:43:21.913020346Z","message":"Processing email job"} {"level":"info","component":"scrapemate","job":"Job{ID: , Method: GET, URL: https://links.thewaterfront.london/DpGK, UrlParams: map[]}","error":"could not send message to server: Protocol error (Network.getResponseBody): No resource with given identifier found","status":"failed","duration":11496.736547,"time":"2024-04-02T10:43:21.913076305Z","message":"job finished"} {"level":"info","component":"scrapemate","numOfJobsCompleted":13,"numOfJobsFailed":7,"lastActivityAt":"2024-04-02T10:43:21.913079263Z","speed":"6.50 jobs/min","time":"2024-04-02T10:43:46.856171969Z","message":"scrapemate stats"} {"level":"info","component":"scrapemate","job":"Job{ID: fcf935d7-83ad-4c51-bdf8-0e6857f98cd1, Method: GET, URL: https://www.google.com/maps/place/Cavalli+lounge/data=!4m7!3m6!1s0x48760517c9548bab:0xa81bb11018bf2868!8m2!3d51.4978355!4d-0.1650291!16s%2Fg%2F11t7sfbsfj!19sChIJq4tUyRcFdkgRaCi_GBCxG6g?authuser=0&hl=en&rclk=1, UrlParams: map[hl:en]}","error":"Timeout 5000.00ms exceeded.","status":"failed","duration":36361.429128,"time":"2024-04-02T10:43:51.615019055Z","message":"job finished"} {"level":"error","component":"scrapemate","error":"Timeout 5000.00ms exceeded.","time":"2024-04-02T10:43:51.615039471Z","message":"error while processing job"} {"level":"info","component":"scrapemate","numOfJobsCompleted":13,"numOfJobsFailed":8,"lastActivityAt":"2024-04-02T10:43:51.615045471Z","speed":"4.33 jobs/min","time":"2024-04-02T10:44:46.913444177Z","message":"scrapemate stats"} {"level":"info","component":"scrapemate","numOfJobsCompleted":13,"numOfJobsFailed":8,"lastActivityAt":"2024-04-02T10:43:51.615045471Z","speed":"3.25 jobs/min","time":"2024-04-02T10:45:46.862487386Z","message":"scrapemate stats"} {"level":"info","component":"scrapemate","numOfJobsCompleted":13,"numOfJobsFailed":8,"lastActivityAt":"2024-04-02T10:43:51.615045471Z","speed":"2.60 jobs/min","time":"2024-04-02T10:46:46.819352344Z","message":"scrapemate stats"} {"level":"info","component":"scrapemate","numOfJobsCompleted":13,"numOfJobsFailed":8,"lastActivityAt":"2024-04-02T10:43:51.615045471Z","speed":"2.17 jobs/min","time":"2024-04-02T10:47:46.877837761Z","message":"scrapemate stats"} {"level":"info","component":"scrapemate","error":"inactivity timeout: 2024-04-02T10:43:51Z","time":"2024-04-02T10:47:46.878477969Z","message":"exiting because of inactivity"} {"level":"info","component":"scrapemate","time":"2024-04-02T10:47:46.880626302Z","message":"scrapemate exited"}

It appears that a maximum of 21 results (numOfJobsCompleted: 21) can be searched. Is it possible to specify a -xx max results option?

{"level":"info","component":"scrapemate","numOfJobsCompleted":21,"numOfJobsFailed":0,"lastActivityAt":"2024-03-31T11:56:00.67744242Z","speed":"5.25 jobs/min","time":"2024-03-31T11:57:27.569097049Z","message":"scrapemate stats"}
{"level":"info","component":"scrapemate","numOfJobsCompleted":21,"numOfJobsFailed":0,"lastActivityAt":"2024-03-31T11:56:00.67744242Z","speed":"4.20 jobs/min","time":"2024-03-31T11:58:27.563894076Z","message":"scrapemate stats"}
{"level":"info","component":"scrapemate","numOfJobsCompleted":21,"numOfJobsFailed":0,"lastActivityAt":"2024-03-31T11:56:00.67744242Z","speed":"3.50 jobs/min","time":"2024-03-31T11:59:27.563869733Z","message":"scrapemate stats"}

Descriptions not extracted or saved during scraping

I wonder if there is a flag to include GMP descriptions during scraping? During testing I see there is a description column in the exported results.csv, but no descriptions are being saved.

Docker does not seem to be working

Hi,
When running the docker quickstart, it gives the following error:
touch results.csv && docker run -v $PWD/example-queries.txt:/example-queries -v $PWD/results.csv:/results.csv gosom/google-maps-scraper -depth 1 -input /example-queries -results /results.csv -exit-on-inactivity 3m
At line:1 char:19
+ touch results.csv && docker run -v $PWD/example-queries.txt:/example- ...
+                   ~~
The token '&&' is not a valid statement separator in this version.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : InvalidEndOfLine

What can be done to solve this?
It seems like a great program which would make a great contribution.

"title is empty" error

I've tried running both the local and database methods and for both I'm getting this for every single google maps entry (running on Windows 10):
{"level":"error","component":"scrapemate","error":"title is empty","time":"2023-07-22T19:24:05.8364593Z","message":"error while processing job"}

It's successfully outputting the Google Maps url for each location but obviously no information is saved because of this error.

switch database to non-json

Sorry if I missed something or this is a dumb question.

I setup docker according to readme and it works, both with local csv and json output and with json-database output.
However, I don't understand how to correctly switch the database to non-JSON/separate columns.

When I try to switch to non-JSON, the scraping just stops working and freezes without an error message after the third page job.
I tried that multiple times and it always freezes after 3 page jobs; I'm assuming you are waiting for 3 results before writing them to the database?

I found and used the sql files, but those didn't help either. A few things I noticed:

json-up.sql drops the non-json fields, but json.down doesn't drop the json field, so I assumed that json not null blocks the script from populating the columns, but deleting the json column didn't help either.

I also noticed that the columns created by jsondown.sql are far fewer than the actual data points that are extracted, so I feel like this SQL file is either faulty or I'm missing something important?

I would have assumed that the database should contain a column for all datapoints that are otherwise extracted as columns in CSV?

Hi, I got an issue here

=========================================","status":"failed","duration":150831.139,"time":"2023-04-30T18:14:18.227687Z","message":"job finished"}
{"level":"error","component":"scrapemate","error":"could not send message to server: Timeout 30000ms exceeded.\n=========================== logs ===========================\nwaiting for selector "button[aria-label='Reject all']"\n============================================================","time":"2023-04-30T18:14:18.228161Z","message":"error while processing job"}

Results are stored in json format

I followed the steps mentioned in this section but the results are stored in json format in one column which makes it hard to work with.

I saw in your blog in this section that you created a table with the columns you want. I tried that but when I do that the scraper stops after about three listings. No results are inserted in the database. I guess the code breaks here because the table has an unexpected format.

Download positions

Hi! It would be great if this project returned the lat/long too, in new columns, with every request!

Thx!

Is Directory?

Hi, I have installed Docker on Windows and when I run the command it says:

touch results.csv && docker run -v $PWD/example-queries.txt:/example-queries -v $PWD/results.csv:/results.csv gosom/google-maps-scraper -depth 1 -input /example-queries -results /results.csv -exit-on-inactivity 3m

open /results.csv: is a directory

Query on google does not match scheduled jobs

I'm currently testing the accuracy of the scraper. A simple Google business result for the query "hair salon in aalborg, denmark" returns 200+ results, but only 77 jobs are scheduled to be scraped.

Are there any inner workings I should be wary of that could possibly be improved? I'd be happy to help in this matter.

Wonderful tool btw.

Stalled after some time

Hi, I tested this with some samples. No idea why, but the program does not finish; some processes seem to have been working for more than 7 hours without finishing...

I'm uploading the query file
requests.txt

Thx!

Input Query in Output CSV

Hi There,

thanks so much for this gem.

I am wondering, is there an easy way to get the input query in the output CSV as an additional column?

Have a great day ahead
