gaspa93 / googlemaps-scraper
Google Maps reviews scraping
License: GNU General Public License v3.0
from requirements.txt:
beautifulsoup4==4.6.0
certifi==2022.12.7
charset-normalizer==2.0.12
colorama==0.4.5
configparser==5.2.0
crayons==0.4.0
idna==3.3
numpy==1.23.0
pandas==1.4.3
pymongo==3.9.0
python-dateutil==2.8.2
pytz==2022.1
requests==2.31.0
selenium==3.14.0
six==1.16.0
termcolor==1.1.0
webdriver-manager==3.5.2
pandas==0.25.2
numpy==1.22.0
Note: there are two pandas and two numpy versions listed (pandas 1.4.3 and 0.25.2; numpy 1.23.0 and 1.22.0).
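A quick way to catch duplicate pins like this (assuming the standard `name==version` format) is to list the package names that appear more than once:

```shell
# Print package names that are pinned more than once in requirements.txt
cut -d'=' -f1 requirements.txt | sort | uniq -d
```

On the file above this would print `numpy` and `pandas`; pip itself keeps only one of the conflicting pins, so the duplicates should be removed from the file.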
Hi, I ran this script on my previous computer and it worked fine, but after switching to my new computer it keeps giving me "Failed to click recent button". I checked debug mode and it works just as it should. Could you give me some idea of how to solve this?
Hello,
I am getting an Uncaught RangeError for locations with more than 900-1000 reviews. This probably comes from the while
loop in scraper.py. Is there a way to resolve this? Thanks in advance.
Best
Hi @gaspa93,
I was trying to use the following URL: https://www.google.com/maps/place/Ellora+Caves/@20.025817,75.1779975,17z/data=!4m7!3m6!1s0x3bdb93bd138ae4bd:0x574c6482cf0b89cf!8m2!3d20.025817!4d75.1779975!9m1!1b1
for scraping N=1000 reviews and sort by = most relevant when I got this error: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
Full error:
[Review 0]
Traceback (most recent call last):
File "scraper.py", line 63, in <module>
reviews = scraper.get_reviews(n)
File "/home/maunil/Desktop/googlemaps-scraper/googlemaps.py", line 172, in get_reviews
self.__expand_reviews()
File "/home/maunil/Desktop/googlemaps-scraper/googlemaps.py", line 298, in __expand_reviews
l.click()
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 80, in click
self._execute(Command.CLICK_ELEMENT)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 628, in _execute
return self._parent.execute(command, params)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: headless chrome=103.0.5060.114)
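A StaleElementReferenceException usually means the page re-rendered between locating the element and clicking it. A common mitigation (not part of the original script) is to re-locate the element and retry. A minimal, generic retry helper; in the Selenium case you would pass `StaleElementReferenceException` as the retryable exception type and re-run the `find_elements` query inside `action`:

```python
import time

def retry_on(action, exceptions, attempts=3, delay=1.0):
    """Call action(); if it raises one of `exceptions`, wait and retry.

    `action` should re-locate the element and click it each time it runs,
    rather than holding on to a previously found element object.
    """
    for attempt in range(attempts):
        try:
            return action()
        except exceptions:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
```

In `__expand_reviews` this would mean wrapping the find-and-click of each "More" button in `retry_on`, so a re-render triggers a fresh lookup instead of a crash.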
I'm unable to scrape the reviews; it shows me this:
DevTools listening on ws://127.0.0.1:57417/devtools/browser/4c960584-6cdf-450f-831f-a1175e7d6d6a
[0723/113647.067:INFO:CONSOLE(0)] "Autofocus processing was blocked because a document already has a focused element.", source: https://www.google.com/maps/place/Al+Salaam+Mall/@21.5078941,39.2233532,15z/data=!4m8!3m7!1s0x15c3ce6cdb182a97:0x29f6012ad865f128!8m2!3d21.5078941!4d39.2233532!9m1!1b1!16s%2Fg%2F11bvt4d9_v?entry=ttu (0)
[Review 0]
Any advice on how to solve it? It was working last week.
Hi,
If I want to get all reviews from a big place (let's say a McDonald's), I only get about 900 reviews, then Google bans us: urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fef9782ea20>: Failed to establish a new connection: [Errno 61] Connection refused
But I have to Ctrl+C your script to see this error
(I think it keeps retrying again and again and again, but Google still bans you.)
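One hedged mitigation for the endless-retry behaviour described above (not in the original script) is to pause between batches and give up after a few consecutive failures, so a ban surfaces as an error instead of an infinite loop. A generic sketch, where `get_batch` stands in for whatever fetches the next chunk of reviews:

```python
import time

def scrape_in_batches(get_batch, max_failures=3, pause=30.0):
    """Yield review batches, pausing between requests and giving up after
    `max_failures` consecutive errors instead of retrying forever."""
    failures = 0
    while True:
        try:
            batch = get_batch()
        except Exception:
            failures += 1
            if failures >= max_failures:
                raise  # let the ban surface instead of hammering Google
            time.sleep(pause)
            continue
        failures = 0
        if not batch:
            return  # no more reviews
        yield batch
        time.sleep(pause)
```

The pause length and failure cap are guesses; the point is only that both scraper loops in the repo currently retry without bound.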
Hi dear Mattia @gaspa93. Would you consider creating a library that pulls data on any business from Google Maps (business name, average rating, opening hours, price range ($), etc.)?
__expand_reviews(self)
doesn't seem to expand all the reviews.
pandas and numpy are imported but are not in requirements.txt.
[Review 0]
Traceback (most recent call last):
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/scraper.py", line 63, in <module>
reviews = scraper.get_reviews(n)
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/googlemaps.py", line 168, in get_reviews
self.__scroll()
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/googlemaps.py", line 278, in __scroll
scrollable_div = self.driver.find_element(By.CSS_SELECTOR, 'div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc')
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 857, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
self.error_handler.check_response(response)
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc"}
(Session info: headless chrome=103.0.5060.134)
Stacktrace:
0 chromedriver 0x0000000102472ef9 chromedriver + 4480761
1 chromedriver 0x00000001023fe5d3 chromedriver + 4003283
2 chromedriver 0x0000000102091338 chromedriver + 410424
3 chromedriver 0x00000001020c75bd chromedriver + 632253
4 chromedriver 0x00000001020c7841 chromedriver + 632897
5 chromedriver 0x00000001020f97d4 chromedriver + 837588
6 chromedriver 0x00000001020e4a8d chromedriver + 752269
7 chromedriver 0x00000001020f74f1 chromedriver + 828657
8 chromedriver 0x00000001020e4953 chromedriver + 751955
9 chromedriver 0x00000001020bacd5 chromedriver + 580821
10 chromedriver 0x00000001020bbd25 chromedriver + 584997
11 chromedriver 0x000000010244402d chromedriver + 4288557
12 chromedriver 0x00000001024491b3 chromedriver + 4309427
13 chromedriver 0x000000010244e23f chromedriver + 4330047
14 chromedriver 0x0000000102449dfa chromedriver + 4312570
15 chromedriver 0x0000000102422fef chromedriver + 4153327
16 chromedriver 0x0000000102463d78 chromedriver + 4418936
17 chromedriver 0x0000000102463eff chromedriver + 4419327
18 chromedriver 0x000000010247aab5 chromedriver + 4512437
19 libsystem_pthread.dylib 0x00007ff811a524f4 _pthread_start + 1
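Class names like `div.siAUzd-neVct.section-scrollbox...` are machine-generated and change whenever Google ships a new Maps frontend, which is why a selector that worked last month raises NoSuchElementException now. A hedged, generic workaround is to try a list of candidate selectors and use the first that matches (the helper and selector names here are illustrative, not part of the repo):

```python
def find_first(find, selectors):
    """Return the first element matched by any selector in `selectors`.

    `find` is any callable that returns an element or raises when nothing
    matches, e.g. lambda sel: driver.find_element(By.CSS_SELECTOR, sel)
    when used with Selenium.
    """
    for sel in selectors:
        try:
            el = find(sel)
        except Exception:
            continue  # this selector no longer exists on the page
        if el is not None:
            return el
    raise LookupError(f"no selector matched: {selectors}")
```

With this, `__scroll` could keep the old class string as one candidate and add the current one as another, instead of hard-coding a single obfuscated class.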
Can we make __scroll and __expand_reviews run in parallel in the get_reviews function to improve performance?
Hello guys and especially Gaspa!
This is just an amazing tool. I'm an amateur in programming, but I can see how advanced this tool is and how much it manages to do. If only there were an update to make it work. I've struggled with it for a few days, but unfortunately I'm just not at a level where I can fix it.
It would be amazing if someone could update it.
Thank you again!
I followed the readme, including installing the dependencies into a new environment. When I try to run it in the terminal, I get the following error. I tried tracing it back, but I'm not familiar with chromedriver manager.
Per the instructions, I downloaded chromedriver and placed it in the root dir of the scraper, just in case.
(google_maps_scrape) jg@J-MacBook-Pro googlemaps-scraper % python scraper.py --N 50 --i urls_1.txt
Traceback (most recent call last):
File "/Users/folder/googlemaps-scraper/scraper.py", line 43, in <module>
with GoogleMapsScraper(debug=args.debug) as scraper:
File "/Users/folder/googlemaps-scraper/googlemaps.py", line 31, in __init__
self.driver = self.__get_driver()
File "/Users/folder/googlemaps-scraper/googlemaps.py", line 377, in __get_driver
input_driver = webdriver.Chrome(executable_path=ChromeDriverManager(log_level=0).install(), options=options)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/chrome.py", line 32, in install
driver_path = self._get_driver_path(self.driver)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/manager.py", line 23, in _get_driver_path
driver_version = driver.get_version()
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/driver.py", line 41, in get_version
return self.get_latest_release_version()
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/driver.py", line 74, in get_latest_release_version
validate_response(resp)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/utils.py", line 80, in validate_response
raise ValueError("There is no such driver by url {}".format(resp.url))
ValueError: There is no such driver by url https://chromedriver.storage.googleapis.com/LATEST_RELEASE_118.0.5993
(google_maps_scrape) jg@J-MacBook-Pro googlemaps-scraper %
pip list:
Package Version
appnope 0.1.3
asttokens 2.4.1
backcall 0.2.0
beautifulsoup4 4.6.0
certifi 2022.12.7
charset-normalizer 2.0.12
colorama 0.4.5
comm 0.1.4
configparser 5.2.0
crayons 0.4.0
debugpy 1.8.0
decorator 5.1.1
exceptiongroup 1.1.3
executing 2.0.0
idna 3.3
ipykernel 6.26.0
ipython 8.16.1
jedi 0.19.1
jupyter_client 8.5.0
jupyter_core 5.4.0
matplotlib-inline 0.1.6
nest-asyncio 1.5.8
numpy 1.23.0
packaging 23.2
pandas 1.4.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pip 23.3
platformdirs 3.11.0
prompt-toolkit 3.0.39
psutil 5.9.6
ptyprocess 0.7.0
pure-eval 0.2.2
Pygments 2.16.1
pymongo 3.9.0
python-dateutil 2.8.2
python-dotenv 1.0.0
pytz 2022.1
pyzmq 25.1.1
requests 2.31.0
selenium 3.14.0
setuptools 68.0.0
six 1.16.0
stack-data 0.6.3
termcolor 1.1.0
tornado 6.3.3
traitlets 5.12.0
urllib3 2.0.7
wcwidth 0.2.8
webdriver-manager 3.5.2
wheel 0.41.2
Hi @gaspa93, I was attempting to scrape the following URL using the command python scraper.py --N 50 --i urls.txt --debug:
https://www.google.com/maps/place/The+Kutaya/@-8.7131747,115.18668,13z/data=!4m11!3m10!1s0x2dd2441f302d3927:0x7fdd6aa714bc38e1!5m2!4m1!1i2!8m2!3d-8.7389695!4d115.1673844!9m1!1b1!16s%2Fg%2F11cn5x9s4l?entry=ttu
However, the log displays a warning message stating, "Failed to click sorting button" after the script finishes running. Is there any way to fix this issue? Thank you!
I am trying to use your script, but when I execute it in my cmd, nothing appears in the data folder.
I just added one import (from http import cookies) because a SameSite error appeared, and this apparently solves it.
WARNING:
"A cookie associated with a cross-site resource at http://google.com/ was set without the SameSite
attribute. A future release of Chrome will only deliver cookies with cross-site requests if they are set with SameSite=None
and Secure
. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.", source: https://www.google.it/maps/place/Pantheon/@41.8986108,12.4746842,17z/data=!3m1!4b1!4m7!3m6!1s0x132f604f678640a9:0xcad165fa2036ce2c!8m2!3d41.8986108!4d12.4768729!9m1!1b1 (0)
ERROR file gm-scraper.txt:
2020-03-20 14:22:17,098 - WARNING - Failed to click recent button
2020-03-20 14:22:27,451 - WARNING - Failed to click recent button
2020-03-20 14:22:37,877 - WARNING - Failed to click recent button
If you can help me, I would appreciate it, because I need the data for my end-of-degree project.
Thank you so much.
Test to reproduce the problem:
python scraper.py --N 50 --i urls.txt --debug
cat gm-scraper.log
2022-04-14 16:22:02,936 - WARNING - Failed to click sorting button
I am having the following issue when running the example command:
"selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.section-layout.section-scrollbox.scrollable-y.scrollable-show"}
(Session info: headless chrome=90.0.4430.93)"
I tried this URL = "https://www.google.com/maps/place/Nike+Alto+Palermo/@-34.5883936,-58.4098625,15z/data=!4m7!3m6!1s0x0:0x842bcbef147891ee!8m2!3d-34.588376!4d-58.4098447!9m1!1b1"
But the same happens when I try to use the URLs at urls.txt
I am using a Mac and installed chromedriver using brew.
I found this very useful. Can you please tell me how to grab the avatar link as well?
I encountered the following error while scraping reviews of this business.
Any help is appreciated.
[0227/214023.992:INFO:CONSOLE(1560)] "Uncaught RangeError: Maximum call stack size exceeded", source: /maps/_/js/k=maps.m.en.FigERXCYMc0.2019.O/ck=maps.m.fQVt13g1oTE.L.W.O/m=vwr,vd,a,duc,owc,ob2,sp,en,smi,sc,vlg,smr,as,bpw,wrc/am=BsgEBA/rt=j/d=1/rs=ACT90oHs0cWOVxL_9t5x_yY1Y1NZDyb6qg/ed=1/exm=sc2,per,mo,lp,ti,ds,stx,pwd,dw,ppl,log,std,b (1560)
Hello, is there a plan to support the initial place map URL rather than clicking on the reviews button?
The idea is that I want to automate the whole extraction operation without human intervention (clicking on reviews).
I am getting this error while running the script; how can I solve it?
When I set --N to 10000, it scraped 1140 reviews, then threw this message and stopped:
[0724/211919.550:INFO:CONSOLE(1550)] "Uncaught RangeError: Maximum call stack size exceeded", source: /maps/_/js/k=maps.m.en.b7ZwJWQZkHM.2019.O/ck=maps.m.HgR1ySVFXik.L.W.O/m=vwr,vd,a,nrw,owc,ob2,sp,en,smi,sc,vlg,smr,as,wrc/am=BoDCIhAB/rt=j/d=1/rs=ACT90oEnDViSjerMr5DSozguPqRfPvO2Xg/ed=1/exm=sc2,per,mo,lp,ti,ds,stx,dwi,enr,pwd,dw,ppl,log,std,b (1550)
Any advice would be appreciated.
I have everything except relative_date and rating. Can you please advise how to resolve it?
/googlemaps-scraper/googlemaps.py", line 314, in __get_driver
input_driver = webdriver.Chrome(executable_path=ChromeDriverManager(log_level=0).install(), options=options)
TypeError: __init__() got an unexpected keyword argument 'log_level'
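Newer webdriver-manager releases removed the log_level keyword that the script passes to ChromeDriverManager, which is what this TypeError is reporting. A version-tolerant sketch (the helper is made up; in practice you would call it as `make_manager(ChromeDriverManager, log_level=0)`, or simply drop the argument and call `ChromeDriverManager().install()`):

```python
def make_manager(cls, **kwargs):
    """Instantiate cls with kwargs, retrying without log_level when the
    installed version no longer accepts that keyword."""
    try:
        return cls(**kwargs)
    except TypeError:
        kwargs.pop("log_level", None)  # removed in newer webdriver-manager
        return cls(**kwargs)
```

This keeps the script working across old and new webdriver-manager versions without pinning one of them.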
Hello, if I set --sort_by most_relevant then I get this error:
selenium.common.exceptions.ElementNotVisibleException: Message: element not interactable
Hi Mattia,
Thanks for sharing your script, it works flawlessly!
I am now trying to adapt it to scrape 'most relevant' reviews rather than 'newest' ones.
However, if I change line 66 in googlemaps.py to pick the first element rather than the second, the __scroll function will not go through.
I was wondering whether you have faced this difficulty before.
Thanks in advance.
Cheers,
Michele
The position of self.__scroll() in these lines is incorrect, causing self.__expand_reviews() to be executed before the 'expand more' buttons are loaded. I believe that self.__scroll() was intended to be placed immediately after the comment # scroll to load reviews.
googlemaps-scraper/googlemaps.py
Lines 163 to 172 in b1bc007
Hi, I was trying to execute the code on the default location in urls.txt and got the following in the log file, with no output on the terminal.
2020-02-23 18:34:28,685 - WARNING - Failed to click recent button
2020-02-23 18:34:38,990 - WARNING - Failed to click recent button
2020-02-23 18:34:49,301 - WARNING - Failed to click recent button
2020-02-23 18:34:59,635 - WARNING - Failed to click recent button
2020-02-23 18:35:09,984 - WARNING - Failed to click recent button
2020-02-23 18:35:20,309 - WARNING - Failed to click recent button
2020-02-23 18:35:30,624 - WARNING - Failed to click recent button
2020-02-23 18:35:40,961 - WARNING - Failed to click recent button
2020-02-23 18:35:51,317 - WARNING - Failed to click recent button
2020-02-23 18:36:01,665 - WARNING - Failed to click recent button
Looking at googlemaps.py, I found that the issue is within:
try:
    menu_bt = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'div.cYrDcjyGO77__container')))  # //button[@data-value=\'Sort\'] XPath with graphical interface
    menu_bt.click()
    clicked = True
    time.sleep(3)
except Exception as e:
    tries += 1
    self.logger.warn('Failed to click recent button')
Could you please explain what is happening and why it isn't working?
Hi, is there any way you can add emails and ratings too, please?
Returns the following error:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc"}
Hi,
I get the reviews in the CSV file but can't get the original (non-English) text.
How can I fix it?
Eg:
(Translated by Google) When you come to Vietnam, visiting a beauty salon is a mandatory course (Original) 베트남에 오면 미장원 방문이 필수 코스 정답이네요
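The mangled characters in the original report are typically an encoding problem rather than a scraping problem: the review text is UTF-8, but the CSV is written or opened with a different codec. A minimal sketch of writing and reading review rows with an explicit encoding (the file name and columns are made up for illustration):

```python
import csv

rows = [
    {"caption": "베트남에 오면 미장원 방문이 필수 코스", "rating": 5},
]

# Write with an explicit UTF-8 encoding; utf-8-sig adds a BOM so that
# Excel on Windows also detects the encoding correctly.
with open("reviews.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=["caption", "rating"])
    writer.writeheader()
    writer.writerows(rows)

# Read it back with the same encoding to get the original text intact.
with open("reviews.csv", encoding="utf-8-sig") as f:
    data = list(csv.DictReader(f))
```

If the CSV looks fine in a plain-text editor but broken in Excel, the file is correct and only the viewer is guessing the wrong codec.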
When running the scraper.py file, print(scraper.get_account(url)) on lines 41 and 42 is never executed. I commented out the if statement and found an error in the __parse_place function on lines 164 and 165.
Error:
File "googlemaps.py", line 165, in __parse_place
place['overall_rating'] = float(response.find('div', class_='gm2-display-2').text.replace(',', '.')) AttributeError: 'NoneType' object has no attribute 'text'
I tried using the required beautifulsoup and selenium versions and tried different versions of chromedriver. That did not solve the issue. What could be the problem?
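The AttributeError above means response.find(...) returned None, i.e. the gm2-display-2 class no longer exists on the page (these class names are obfuscated and rotate). A hedged, generic guard so a missing element yields None instead of a crash (the helper name is made up):

```python
def safe_rating(el):
    """Return the element's text as a float (treating a comma as the
    decimal separator), or None when the element was not found."""
    if el is None:
        return None
    try:
        return float(el.text.replace(",", "."))
    except (AttributeError, ValueError):
        return None
```

In __parse_place this would read place['overall_rating'] = safe_rating(response.find('div', class_='gm2-display-2')); the selector itself likely still needs updating to whatever class the current Maps frontend uses.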