
googlemaps-scraper's People

Contributors

dependabot[bot], gaspa93, gtesk, ryuuzake, saito828koki, samirarman


googlemaps-scraper's Issues

possible conflicting requirements

from requirements.txt:

beautifulsoup4==4.6.0
certifi==2022.12.7
charset-normalizer==2.0.12
colorama==0.4.5
configparser==5.2.0
crayons==0.4.0
idna==3.3
numpy==1.23.0
pandas==1.4.3
pymongo==3.9.0
python-dateutil==2.8.2
pytz==2022.1
requests==2.31.0
selenium==3.14.0
six==1.16.0
termcolor==1.1.0
webdriver-manager==3.5.2
pandas==0.25.2
numpy==1.22.0

pandas and numpy are each pinned twice (pandas==1.4.3 vs pandas==0.25.2, numpy==1.23.0 vs numpy==1.22.0), so pip cannot satisfy both pins.
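Duplicate pins like these can be caught mechanically before pip trips over them. A minimal sketch (the helper name is mine, not part of the repo):

```python
# Detect packages pinned more than once in a requirements.txt.
from collections import Counter

def find_duplicate_pins(lines):
    """Return package names that appear with more than one '==' pin."""
    names = [line.split("==")[0].strip().lower()
             for line in lines
             if "==" in line and not line.lstrip().startswith("#")]
    return sorted(name for name, count in Counter(names).items() if count > 1)

reqs = """beautifulsoup4==4.6.0
pandas==1.4.3
numpy==1.23.0
pandas==0.25.2
numpy==1.22.0""".splitlines()

print(find_duplicate_pins(reqs))  # ['numpy', 'pandas']
```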

Failed to click recent button while debug works just fine

Hi, I ran this script on my previous computer and it worked fine, but after I switched to my new computer it keeps giving me "failed to click recent button". I checked debug mode and it works just as it should. Could you give me some idea how to solve this?

Stale Element Reference Error

Hi @gaspa93,
I was trying to use the following URL: https://www.google.com/maps/place/Ellora+Caves/@20.025817,75.1779975,17z/data=!4m7!3m6!1s0x3bdb93bd138ae4bd:0x574c6482cf0b89cf!8m2!3d20.025817!4d75.1779975!9m1!1b1
for scraping N=1000 reviews sorted by most relevant when I got this error: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
Full error:

[Review 0]
Traceback (most recent call last):
File "scraper.py", line 63, in <module>
reviews = scraper.get_reviews(n)
File "/home/maunil/Desktop/googlemaps-scraper/googlemaps.py", line 172, in get_reviews
self.__expand_reviews()
File "/home/maunil/Desktop/googlemaps-scraper/googlemaps.py", line 298, in __expand_reviews
l.click()
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 80, in click
self._execute(Command.CLICK_ELEMENT)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 628, in _execute
return self._parent.execute(command, params)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: headless chrome=103.0.5060.114)

Autofocus processing was blocked because a document already has a focused element.

I'm unable to scrape the reviews; it showed me this:
DevTools listening on ws://127.0.0.1:57417/devtools/browser/4c960584-6cdf-450f-831f-a1175e7d6d6a
[0723/113647.067:INFO:CONSOLE(0)] "Autofocus processing was blocked because a document already has a focused element.", source: https://www.google.com/maps/place/Al+Salaam+Mall/@21.5078941,39.2233532,15z/data=!4m8!3m7!1s0x15c3ce6cdb182a97:0x29f6012ad865f128!8m2!3d21.5078941!4d39.2233532!9m1!1b1!16s%2Fg%2F11bvt4d9_v?entry=ttu (0)
[Review 0]

Any advice on how to solve it? It was working last week.

Limit to 900 reviews

Hi,
If I want to get all reviews from a big place (let's say a McDonald's), I only get about 900 reviews before Google bans us: urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fef9782ea20>: Failed to establish a new connection: [Errno 61] Connection refused, but I have to Ctrl+C your script to see this error.
(I think the script keeps retrying again and again, but Google still bans you.)
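If the block really is rate-based, slowing down between scroll batches instead of retrying immediately sometimes helps. A sketch of capped exponential backoff (the numbers are illustrative, not tuned for Google Maps):

```python
# Delay schedule for retries: double each time, but never exceed the cap,
# so repeated refusals slow the scraper down instead of hammering the site.
def backoff_delays(base=2.0, cap=60.0, attempts=6):
    """Seconds to sleep before each retry: base * 2**i, capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

print(backoff_delays())  # [2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```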

Googlemaps business info

Hi dear Mattia @gaspa93, would you consider creating a library that pulls data on any business from Google Maps (business name, avg rating, opening hours, price range ($), etc.)?

Recent comment

Recent reviews won't work

Google changed the role names, so the most-recent function no longer works.
I tried to change the role names, but it seems that's not the only thing they have changed.
[screenshot]

selenium.common.exceptions.NoSuchElementException: Message: no such element:

[Review 0]
Traceback (most recent call last):
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/scraper.py", line 63, in <module>
reviews = scraper.get_reviews(n)
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/googlemaps.py", line 168, in get_reviews
self.__scroll()
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/googlemaps.py", line 278, in __scroll
scrollable_div = self.driver.find_element(By.CSS_SELECTOR, 'div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc')
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 857, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
self.error_handler.check_response(response)
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc"}
(Session info: headless chrome=103.0.5060.134)
Stacktrace:
0 chromedriver 0x0000000102472ef9 chromedriver + 4480761
1 chromedriver 0x00000001023fe5d3 chromedriver + 4003283
2 chromedriver 0x0000000102091338 chromedriver + 410424
3 chromedriver 0x00000001020c75bd chromedriver + 632253
4 chromedriver 0x00000001020c7841 chromedriver + 632897
5 chromedriver 0x00000001020f97d4 chromedriver + 837588
6 chromedriver 0x00000001020e4a8d chromedriver + 752269
7 chromedriver 0x00000001020f74f1 chromedriver + 828657
8 chromedriver 0x00000001020e4953 chromedriver + 751955
9 chromedriver 0x00000001020bacd5 chromedriver + 580821
10 chromedriver 0x00000001020bbd25 chromedriver + 584997
11 chromedriver 0x000000010244402d chromedriver + 4288557
12 chromedriver 0x00000001024491b3 chromedriver + 4309427
13 chromedriver 0x000000010244e23f chromedriver + 4330047
14 chromedriver 0x0000000102449dfa chromedriver + 4312570
15 chromedriver 0x0000000102422fef chromedriver + 4153327
16 chromedriver 0x0000000102463d78 chromedriver + 4418936
17 chromedriver 0x0000000102463eff chromedriver + 4419327
18 chromedriver 0x000000010247aab5 chromedriver + 4512437
19 libsystem_pthread.dylib 0x00007ff811a524f4 _pthread_start + 1
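Google's obfuscated class names (like 'siAUzd-neVct ... cYB2Ge-ti6hGc' above) change between releases, so any single hard-coded CSS selector rots quickly. One mitigation is to try a list of candidate selectors in order; this is a generic sketch (the helper and the selector strings are illustrative, not guaranteed to match the current Maps markup):

```python
def find_first(find_element, selectors):
    """Try candidate selectors in order; find_element(sel) should raise on a
    miss, as Selenium's driver.find_element does."""
    for sel in selectors:
        try:
            return find_element(sel)
        except Exception:
            continue
    raise LookupError("no candidate selector matched: %r" % (selectors,))

# demo against a fake DOM; a dict lookup raises KeyError on a miss,
# mimicking NoSuchElementException
fake_dom = {"div.m6QErb": "scrollbox"}
print(find_first(fake_dom.__getitem__, ["div.old-scrollbox", "div.m6QErb"]))
```

With a real driver the first argument would be something like `lambda sel: driver.find_element(By.CSS_SELECTOR, sel)`.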

Parallelism

Can we make __scroll and __expand_reviews run in parallel inside the get_reviews function to improve performance?
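One caveat: a single WebDriver session is not safe to drive from multiple threads, so __scroll and __expand_reviews on the same driver cannot run concurrently. What does parallelise well is one scraper (and therefore one driver) per place URL. A sketch of that shape, with a placeholder scrape function standing in for a per-URL GoogleMapsScraper instance:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_place(url):
    # placeholder: in practice this would create its own scraper/driver,
    # collect that place's reviews, and return them
    return (url, 0)

def scrape_all(urls, workers=4):
    """Scrape several place URLs concurrently, one driver per task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scrape_place, urls))

print(scrape_all(["url-a", "url-b"]))  # [('url-a', 0), ('url-b', 0)]
```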

Any chance of an update?

Hello guys and especially Gaspa!

This is just an amazing tool. I'm an amateur in programming, but I can see how advanced this tool is and how much it manages to do. If only there were an update to make it work. I've struggled with it for a few days, but unfortunately I'm just not at a level where I can fix it.
It would be amazing if someone could update it.
Thank you again!

webdriver_manager pointing to browser version instead of driver version?

I followed the README, including installing the dependencies into a new environment. When I try to run it in the terminal, I get the following error. I tried tracing it back, but I'm not familiar with the chromedriver manager.

Per the instructions, I downloaded chromedriver and placed it in the root dir of the scraper, just in case.

(google_maps_scrape) jg@J-MacBook-Pro googlemaps-scraper % python scraper.py --N 50 --i urls_1.txt
Traceback (most recent call last):
File "/Users/folder/googlemaps-scraper/scraper.py", line 43, in <module>
with GoogleMapsScraper(debug=args.debug) as scraper:
File "/Users/folder/googlemaps-scraper/googlemaps.py", line 31, in __init__
self.driver = self.__get_driver()
File "/Users/folder/googlemaps-scraper/googlemaps.py", line 377, in __get_driver
input_driver = webdriver.Chrome(executable_path=ChromeDriverManager(log_level=0).install(), options=options)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/chrome.py", line 32, in install
driver_path = self._get_driver_path(self.driver)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/manager.py", line 23, in _get_driver_path
driver_version = driver.get_version()
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/driver.py", line 41, in get_version
return self.get_latest_release_version()
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/driver.py", line 74, in get_latest_release_version
validate_response(resp)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/utils.py", line 80, in validate_response
raise ValueError("There is no such driver by url {}".format(resp.url))
ValueError: There is no such driver by url https://chromedriver.storage.googleapis.com/LATEST_RELEASE_118.0.5993
(google_maps_scrape) jg@J-MacBook-Pro googlemaps-scraper %

pip list:

Package Version


appnope 0.1.3
asttokens 2.4.1
backcall 0.2.0
beautifulsoup4 4.6.0
certifi 2022.12.7
charset-normalizer 2.0.12
colorama 0.4.5
comm 0.1.4
configparser 5.2.0
crayons 0.4.0
debugpy 1.8.0
decorator 5.1.1
exceptiongroup 1.1.3
executing 2.0.0
idna 3.3
ipykernel 6.26.0
ipython 8.16.1
jedi 0.19.1
jupyter_client 8.5.0
jupyter_core 5.4.0
matplotlib-inline 0.1.6
nest-asyncio 1.5.8
numpy 1.23.0
packaging 23.2
pandas 1.4.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pip 23.3
platformdirs 3.11.0
prompt-toolkit 3.0.39
psutil 5.9.6
ptyprocess 0.7.0
pure-eval 0.2.2
Pygments 2.16.1
pymongo 3.9.0
python-dateutil 2.8.2
python-dotenv 1.0.0
pytz 2022.1
pyzmq 25.1.1
requests 2.31.0
selenium 3.14.0
setuptools 68.0.0
six 1.16.0
stack-data 0.6.3
termcolor 1.1.0
tornado 6.3.3
traitlets 5.12.0
urllib3 2.0.7
wcwidth 0.2.8
webdriver-manager 3.5.2
wheel 0.41.2
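The likely cause: the legacy chromedriver.storage.googleapis.com bucket that webdriver-manager 3.5.2 queries stopped receiving releases at Chrome 114; drivers for Chrome 115+ moved to the Chrome-for-Testing endpoints, so the LATEST_RELEASE_118.0.5993 lookup 404s. The practical fix is to upgrade Selenium to >= 4.6 (whose bundled Selenium Manager resolves the driver itself) or webdriver-manager to a release that knows the new endpoints. This illustrative helper (my own simplification, not a webdriver-manager API) shows the split; the two URLs are the real endpoints:

```python
# Chrome <= 114: drivers on the legacy bucket; Chrome 115+: Chrome for Testing.
LEGACY = "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_{}"
CHROME_FOR_TESTING = ("https://googlechromelabs.github.io/"
                      "chrome-for-testing/latest-versions-per-milestone.json")

def driver_lookup_url(chrome_major):
    """Where a driver lookup for this Chrome major version has to go."""
    return LEGACY.format(chrome_major) if chrome_major <= 114 else CHROME_FOR_TESTING

print(driver_lookup_url(118))
```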

Fails to click sorting button

Hi @gaspa93, I was attempting to scrape the following URL using the command python scraper.py --N 50 --i urls.txt --debug:
https://www.google.com/maps/place/The+Kutaya/@-8.7131747,115.18668,13z/data=!4m11!3m10!1s0x2dd2441f302d3927:0x7fdd6aa714bc38e1!5m2!4m1!1i2!8m2!3d-8.7389695!4d115.1673844!9m1!1b1!16s%2Fg%2F11cn5x9s4l?entry=ttu

However, the log displays a warning message stating, "Failed to click sorting button" after the script finishes running. Is there any way to fix this issue? Thank you!

Hi Mattia

I am trying to use your script, but when I execute it from my cmd nothing appears in the data folder.
I just added one import (from http import cookies) because a SameSite error appeared, and this apparently solves it.
WARNING:
"A cookie associated with a cross-site resource at http://google.com/ was set without the SameSite attribute. A future release of Chrome will only deliver cookies with cross-site requests if they are set with SameSite=None and Secure. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.", source: https://www.google.it/maps/place/Pantheon/@41.8986108,12.4746842,17z/data=!3m1!4b1!4m7!3m6!1s0x132f604f678640a9:0xcad165fa2036ce2c!8m2!3d41.8986108!4d12.4768729!9m1!1b1 (0)

ERROR file gm-scraper.txt:
2020-03-20 14:22:17,098 - WARNING - Failed to click recent button
2020-03-20 14:22:27,451 - WARNING - Failed to click recent button
2020-03-20 14:22:37,877 - WARNING - Failed to click recent button

If you can help me I would appreciate it, because I need this data for my end-of-degree project.

Thank you so much.

Possible problem with the scrollable JS script

I am having the following issue when running the example command:

"selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.section-layout.section-scrollbox.scrollable-y.scrollable-show"}
(Session info: headless chrome=90.0.4430.93)"

I tried this URL = "https://www.google.com/maps/place/Nike+Alto+Palermo/@-34.5883936,-58.4098625,15z/data=!4m7!3m6!1s0x0:0x842bcbef147891ee!8m2!3d-34.588376!4d-58.4098447!9m1!1b1"

But the same happens when I try to use the URLs at urls.txt

I am using a Mac and installed chromedriver using brew.

"Uncaught RangeError: Maximum call stack size exceeded" error

I encountered the following error while scraping reviews of this business.

Any help is appreciated.

[0227/214023.992:INFO:CONSOLE(1560)] "Uncaught RangeError: Maximum call stack size exceeded", source: /maps/_/js/k=maps.m.en.FigERXCYMc0.2019.O/ck=maps.m.fQVt13g1oTE.L.W.O/m=vwr,vd,a,duc,owc,ob2,sp,en,smi,sc,vlg,smr,as,bpw,wrc/am=BsgEBA/rt=j/d=1/rs=ACT90oHs0cWOVxL_9t5x_yY1Y1NZDyb6qg/ed=1/exm=sc2,per,mo,lp,ti,ds,stx,pwd,dw,ppl,log,std,b (1560)

Support Initial place map url

Hello, is there a plan to support the initial place map URL rather than clicking on the reviews button?

The idea is that I want to automate the whole extraction operation without human intervention (clicking on reviews).

no use for MAX_SCROLL

In the current code MAX_SCROLL is never used, and I'm not sure why, but all I get is this:
[screenshot]
Please guide me on what needs to be done. This is the error message I get once I press Ctrl+C:
[screenshot]

Maximum call stack size exceeded

When I set --N to 10000, it scraped 1140 reviews, then threw this message and stopped:
[0724/211919.550:INFO:CONSOLE(1550)] "Uncaught RangeError: Maximum call stack size exceeded", source: /maps/_/js/k=maps.m.en.b7ZwJWQZkHM.2019.O/ck=maps.m.HgR1ySVFXik.L.W.O/m=vwr,vd,a,nrw,owc,ob2,sp,en,smi,sc,vlg,smr,as,wrc/am=BoDCIhAB/rt=j/d=1/rs=ACT90oEnDViSjerMr5DSozguPqRfPvO2Xg/ed=1/exm=sc2,per,mo,lp,ti,ds,stx,dwi,enr,pwd,dw,ppl,log,std,b (1550)

This is the url I used:
https://www.google.it/maps/place/Pantheon/@41.8986108,12.4768729,17z/data=!4m18!1m9!3m8!1s0x132f604f678640a9:0xcad165fa2036ce2c!2sPantheon!8m2!3d41.8986108!4d12.4768729!9m1!1b1!16zL20vMDF4emR6!3m7!1s0x132f604f678640a9:0xcad165fa2036ce2c!8m2!3d41.8986108!4d12.4768729!9m1!1b1!16zL20vMDF4emR6?entry=ttu

Any advice would be appreciated.

unexpected keyword argument 'log_level'

/googlemaps-scraper/googlemaps.py", line 314, in __get_driver
input_driver = webdriver.Chrome(executable_path=ChromeDriverManager(log_level=0).install(), options=options)
TypeError: __init__() got an unexpected keyword argument 'log_level'
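Newer webdriver-manager releases removed the log_level keyword from ChromeDriverManager, so the simplest fix is to call ChromeDriverManager().install() with no arguments. If one script has to run across several installed versions, a signature-tolerant shim (my own sketch, not a webdriver-manager API) can drop keywords the local version does not accept:

```python
import inspect

def make_manager(cls, **kwargs):
    """Instantiate cls, forwarding only the keyword arguments its
    __init__ actually accepts (e.g. drop log_level on new versions)."""
    accepted = inspect.signature(cls.__init__).parameters
    return cls(**{k: v for k, v in kwargs.items() if k in accepted})

class NewStyleManager:  # stand-in: no log_level parameter any more
    def __init__(self):
        pass

mgr = make_manager(NewStyleManager, log_level=0)  # log_level silently dropped
print(type(mgr).__name__)  # NewStyleManager
```

In the scraper this would read `make_manager(ChromeDriverManager, log_level=0).install()`.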

most_relevant

Hello, if I set --sort_by most_relevant then I get this error:
selenium.common.exceptions.ElementNotVisibleException: Message: element not interactable

scrape most relevant reviews

Hi Mattia,

Thanks for sharing your script, it works flawlessly!

I am now trying to re-adapt it to scrape 'most relevant reviews' rather than 'newest' ones.
However, if I change line 66 in googlemaps.py to pick the 'first element' rather than the 'second element', the __scroll function will not go through.

I was wondering whether you faced this difficulty before.

Thanks in advance.

Cheers,
Michele

`__expand_reviews` sometimes not working

The position of self.__scroll() in these lines is incorrect, causing self.__expand_reviews() to be executed before the 'expand more' buttons are loaded. I believe that self.__scroll() was intended to be placed immediately after the comment # scroll to load reviews.

# scroll to load reviews
# wait for other reviews to load (ajax)
time.sleep(4)
self.__scroll()
# expand review text
self.__expand_reviews()

Failed to click recent button

Hi, I was trying to execute the code on the default location in urls.txt, and I got the following in the log file with no output on the terminal.

2020-02-23 18:34:28,685 - WARNING - Failed to click recent button
2020-02-23 18:34:38,990 - WARNING - Failed to click recent button
2020-02-23 18:34:49,301 - WARNING - Failed to click recent button
2020-02-23 18:34:59,635 - WARNING - Failed to click recent button
2020-02-23 18:35:09,984 - WARNING - Failed to click recent button
2020-02-23 18:35:20,309 - WARNING - Failed to click recent button
2020-02-23 18:35:30,624 - WARNING - Failed to click recent button
2020-02-23 18:35:40,961 - WARNING - Failed to click recent button
2020-02-23 18:35:51,317 - WARNING - Failed to click recent button
2020-02-23 18:36:01,665 - WARNING - Failed to click recent button

Looking at googlemaps.py i found that the issue is within:

try:
    menu_bt = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'div.cYrDcjyGO77__container')))  # //button[@data-value=\'Sort\'] XPath with graphical interface
    menu_bt.click()
    clicked = True
    time.sleep(3)
except Exception as e:
    tries += 1
    self.logger.warn('Failed to click recent button')

Could you please explain what is happening and why it isn't working?

emails

Hi, any way you can add emails and ratings too, please?

Fails on [Review 0]

Returns the following error:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc"}

can't get original language of reviews.

Hi,
I get reviews on the CSV file but can't get the original text (not in English).
How can I fix it?
Eg:
(Translated by Google) When you come to Vietnam, visiting a beauty salon is a mandatory course (Original) 베트남에 오면 미장원 방문은 필수 코스 정답이네요
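One workaround, assuming Google honours the hl query parameter on Maps URLs (it controls the interface language, which also affects review auto-translation), is to request the page in the review's own language. A small URL helper to set that parameter:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def with_language(url, lang):
    """Return url with hl=<lang> set in its query string."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["hl"] = lang
    return urlunparse(parts._replace(query=urlencode(query)))

print(with_language("https://www.google.com/maps/place/X", "ko"))
```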

__parse_place function error

When running the scraper.py file, print(scraper.get_account(url)) on lines 41 and 42 is never executed. I commented out the if statement and found an error in the __parse_place function at lines 164 and 165.

Error:

File "googlemaps.py", line 165, in __parse_place
place['overall_rating'] = float(response.find('div', class_='gm2-display-2').text.replace(',', '.'))
AttributeError: 'NoneType' object has no attribute 'text'

I tried the required beautifulsoup and selenium versions and different versions of chromedriver. That did not solve the issue. What could be the problem?
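The AttributeError means response.find('div', class_='gm2-display-2') returned None, i.e. Google renamed that obfuscated class. Until the selector is updated, the parse can at least degrade gracefully instead of crashing; a defensive sketch (the helper name is illustrative, not from the repo):

```python
def safe_rating(tag):
    """tag: result of soup.find(...), which is None when the class name
    no longer matches. Return the rating as a float, or None."""
    if tag is None or not getattr(tag, "text", None):
        return None
    try:
        # Maps renders e.g. "4,6" in some locales; normalise the comma
        return float(tag.text.replace(",", "."))
    except ValueError:
        return None

class FakeTag:  # stand-in for a bs4 Tag in this demo
    text = "4,6"

print(safe_rating(FakeTag()))  # 4.6
print(safe_rating(None))       # None
```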
