
googlemaps-scraper's People

Contributors

dependabot[bot], gaspa93, gtesk, ryuuzake, saito828koki, samirarman


googlemaps-scraper's Issues

possible conflicting requirements

from requirements.txt:

beautifulsoup4==4.6.0
certifi==2022.12.7
charset-normalizer==2.0.12
colorama==0.4.5
configparser==5.2.0
crayons==0.4.0
idna==3.3
numpy==1.23.0
pandas==1.4.3
pymongo==3.9.0
python-dateutil==2.8.2
pytz==2022.1
requests==2.31.0
selenium==3.14.0
six==1.16.0
termcolor==1.1.0
webdriver-manager==3.5.2
pandas==0.25.2
numpy==1.22.0

pandas and numpy are each pinned twice (pandas==1.4.3 vs pandas==0.25.2, numpy==1.23.0 vs numpy==1.22.0), so pip cannot satisfy both pins.
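Duplicate pins like these can be caught mechanically before pip trips over them. A minimal sketch (the helper name is mine, not part of the repo):

```python
# Detect packages pinned more than once in a requirements.txt.
from collections import Counter

def find_duplicate_pins(lines):
    """Return package names that appear with more than one '==' pin."""
    names = [line.split("==")[0].strip().lower()
             for line in lines
             if "==" in line and not line.lstrip().startswith("#")]
    return sorted(name for name, count in Counter(names).items() if count > 1)

reqs = """beautifulsoup4==4.6.0
pandas==1.4.3
numpy==1.23.0
pandas==0.25.2
numpy==1.22.0""".splitlines()

print(find_duplicate_pins(reqs))  # ['numpy', 'pandas']
```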

Failed to click recent button while debug works just fine

Hi, I ran this script on my previous computer and it worked fine, but after I switched to my new computer it keeps giving me "failed to click recent button". I checked debug mode and it works just as it should. Could you give me some idea how to solve this?

Stale Element Reference Error

Hi @gaspa93,
I was trying to use the following URL: https://www.google.com/maps/place/Ellora+Caves/@20.025817,75.1779975,17z/data=!4m7!3m6!1s0x3bdb93bd138ae4bd:0x574c6482cf0b89cf!8m2!3d20.025817!4d75.1779975!9m1!1b1
for scraping N=1000 reviews sorted by most relevant when I got this error: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
Full error:

[Review 0]
Traceback (most recent call last):
File "scraper.py", line 63, in <module>
reviews = scraper.get_reviews(n)
File "/home/maunil/Desktop/googlemaps-scraper/googlemaps.py", line 172, in get_reviews
self.__expand_reviews()
File "/home/maunil/Desktop/googlemaps-scraper/googlemaps.py", line 298, in __expand_reviews
l.click()
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 80, in click
self._execute(Command.CLICK_ELEMENT)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webelement.py", line 628, in _execute
return self._parent.execute(command, params)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 320, in execute
self.error_handler.check_response(response)
File "/home/maunil/Desktop/venv/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: headless chrome=103.0.5060.114)

Autofocus processing was blocked because a document already has a focused element.

I'm unable to scrape the reviews; it showed me this:
DevTools listening on ws://127.0.0.1:57417/devtools/browser/4c960584-6cdf-450f-831f-a1175e7d6d6a
[0723/113647.067:INFO:CONSOLE(0)] "Autofocus processing was blocked because a document already has a focused element.", source: https://www.google.com/maps/place/Al+Salaam+Mall/@21.5078941,39.2233532,15z/data=!4m8!3m7!1s0x15c3ce6cdb182a97:0x29f6012ad865f128!8m2!3d21.5078941!4d39.2233532!9m1!1b1!16s%2Fg%2F11bvt4d9_v?entry=ttu (0)
[Review 0]

Any advice on how to solve it? It was working last week.

Limit to 900 reviews

Hi,
If I want to get all reviews from a big place (let's say a McDonald's), I only get about 900 reviews before Google bans us: urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fef9782ea20>: Failed to establish a new connection: [Errno 61] Connection refused, but I have to Ctrl+C your script to see this error.
(I think the script keeps retrying again and again, but Google still bans you.)
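If the block really is rate-based, slowing down between scroll batches instead of retrying immediately sometimes helps. A sketch of capped exponential backoff (the numbers are illustrative, not tuned for Google Maps):

```python
# Delay schedule for retries: double each time, but never exceed the cap,
# so repeated refusals slow the scraper down instead of hammering the site.
def backoff_delays(base=2.0, cap=60.0, attempts=6):
    """Seconds to sleep before each retry: base * 2**i, capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

print(backoff_delays())  # [2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```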

Googlemaps business info

Hi dear Mattia @gaspa93, would you consider creating a library that pulls data on any business from Google Maps (business name, avg rating, opening hours, price range ($), etc.)?

Recent comment

Recent reviews won't work

Google changed the role names, so the most-recent function no longer works.
I tried to change the role names, but it seems that's not the only thing they have changed.
[screenshot]

selenium.common.exceptions.NoSuchElementException: Message: no such element:

[Review 0]
Traceback (most recent call last):
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/scraper.py", line 63, in <module>
reviews = scraper.get_reviews(n)
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/googlemaps.py", line 168, in get_reviews
self.__scroll()
File "/Users/satyammishra/Desktop/sentiment_analysis/googlemaps-scraper/googlemaps.py", line 278, in __scroll
scrollable_div = self.driver.find_element(By.CSS_SELECTOR, 'div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc')
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 857, in find_element
return self.execute(Command.FIND_ELEMENT, {
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
self.error_handler.check_response(response)
File "/Users/satyammishra/opt/anaconda3/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc"}
(Session info: headless chrome=103.0.5060.134)
Stacktrace:
0 chromedriver 0x0000000102472ef9 chromedriver + 4480761
1 chromedriver 0x00000001023fe5d3 chromedriver + 4003283
2 chromedriver 0x0000000102091338 chromedriver + 410424
3 chromedriver 0x00000001020c75bd chromedriver + 632253
4 chromedriver 0x00000001020c7841 chromedriver + 632897
5 chromedriver 0x00000001020f97d4 chromedriver + 837588
6 chromedriver 0x00000001020e4a8d chromedriver + 752269
7 chromedriver 0x00000001020f74f1 chromedriver + 828657
8 chromedriver 0x00000001020e4953 chromedriver + 751955
9 chromedriver 0x00000001020bacd5 chromedriver + 580821
10 chromedriver 0x00000001020bbd25 chromedriver + 584997
11 chromedriver 0x000000010244402d chromedriver + 4288557
12 chromedriver 0x00000001024491b3 chromedriver + 4309427
13 chromedriver 0x000000010244e23f chromedriver + 4330047
14 chromedriver 0x0000000102449dfa chromedriver + 4312570
15 chromedriver 0x0000000102422fef chromedriver + 4153327
16 chromedriver 0x0000000102463d78 chromedriver + 4418936
17 chromedriver 0x0000000102463eff chromedriver + 4419327
18 chromedriver 0x000000010247aab5 chromedriver + 4512437
19 libsystem_pthread.dylib 0x00007ff811a524f4 _pthread_start + 1
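Google's obfuscated class names (like 'siAUzd-neVct ... cYB2Ge-ti6hGc' above) change between releases, so any single hard-coded CSS selector rots quickly. One mitigation is to try a list of candidate selectors in order; this is a generic sketch (the helper and the selector strings are illustrative, not guaranteed to match the current Maps markup):

```python
def find_first(find_element, selectors):
    """Try candidate selectors in order; find_element(sel) should raise on a
    miss, as Selenium's driver.find_element does."""
    for sel in selectors:
        try:
            return find_element(sel)
        except Exception:
            continue
    raise LookupError("no candidate selector matched: %r" % (selectors,))

# demo against a fake DOM; a dict lookup raises KeyError on a miss,
# mimicking NoSuchElementException
fake_dom = {"div.m6QErb": "scrollbox"}
print(find_first(fake_dom.__getitem__, ["div.old-scrollbox", "div.m6QErb"]))
```

With a real driver the first argument would be something like `lambda sel: driver.find_element(By.CSS_SELECTOR, sel)`.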

Parallelism

Can we make __scroll and __expand_reviews run in parallel inside the get_reviews function to improve performance?
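One caveat: a single WebDriver session is not safe to drive from multiple threads, so __scroll and __expand_reviews on the same driver cannot run concurrently. What does parallelise well is one scraper (and therefore one driver) per place URL. A sketch of that shape, with a placeholder scrape function standing in for a per-URL GoogleMapsScraper instance:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_place(url):
    # placeholder: in practice this would create its own scraper/driver,
    # collect that place's reviews, and return them
    return (url, 0)

def scrape_all(urls, workers=4):
    """Scrape several place URLs concurrently, one driver per task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scrape_place, urls))

print(scrape_all(["url-a", "url-b"]))  # [('url-a', 0), ('url-b', 0)]
```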

Any chance of an update?

Hello guys and especially Gaspa!

This is just an amazing tool. I'm an amateur in programming, but I can see how advanced this tool is and how much it manages to do. If only there were an update to make it work. I've struggled with it for a few days, but unfortunately I'm just not at a level where I can fix it.
It would be amazing if someone could update it.
Thank you again!

webdriver_manager pointing to browser version instead of driver version?

I followed the README, including installing the dependencies into a new environment. When I try to run it in the terminal, I get the following error. I tried tracing it back, but I'm not familiar with the chromedriver manager.

Per the instructions, I downloaded chromedriver and placed it in the root dir of the scraper, just in case.

(google_maps_scrape) jg@J-MacBook-Pro googlemaps-scraper % python scraper.py --N 50 --i urls_1.txt
Traceback (most recent call last):
File "/Users/folder/googlemaps-scraper/scraper.py", line 43, in <module>
with GoogleMapsScraper(debug=args.debug) as scraper:
File "/Users/folder/googlemaps-scraper/googlemaps.py", line 31, in __init__
self.driver = self.__get_driver()
File "/Users/folder/googlemaps-scraper/googlemaps.py", line 377, in __get_driver
input_driver = webdriver.Chrome(executable_path=ChromeDriverManager(log_level=0).install(), options=options)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/chrome.py", line 32, in install
driver_path = self._get_driver_path(self.driver)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/manager.py", line 23, in _get_driver_path
driver_version = driver.get_version()
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/driver.py", line 41, in get_version
return self.get_latest_release_version()
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/driver.py", line 74, in get_latest_release_version
validate_response(resp)
File "/usr/local/anaconda3/envs/google_maps_scrape/lib/python3.10/site-packages/webdriver_manager/utils.py", line 80, in validate_response
raise ValueError("There is no such driver by url {}".format(resp.url))
ValueError: There is no such driver by url https://chromedriver.storage.googleapis.com/LATEST_RELEASE_118.0.5993
(google_maps_scrape) jg@J-MacBook-Pro googlemaps-scraper %

pip list:

Package Version


appnope 0.1.3
asttokens 2.4.1
backcall 0.2.0
beautifulsoup4 4.6.0
certifi 2022.12.7
charset-normalizer 2.0.12
colorama 0.4.5
comm 0.1.4
configparser 5.2.0
crayons 0.4.0
debugpy 1.8.0
decorator 5.1.1
exceptiongroup 1.1.3
executing 2.0.0
idna 3.3
ipykernel 6.26.0
ipython 8.16.1
jedi 0.19.1
jupyter_client 8.5.0
jupyter_core 5.4.0
matplotlib-inline 0.1.6
nest-asyncio 1.5.8
numpy 1.23.0
packaging 23.2
pandas 1.4.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pip 23.3
platformdirs 3.11.0
prompt-toolkit 3.0.39
psutil 5.9.6
ptyprocess 0.7.0
pure-eval 0.2.2
Pygments 2.16.1
pymongo 3.9.0
python-dateutil 2.8.2
python-dotenv 1.0.0
pytz 2022.1
pyzmq 25.1.1
requests 2.31.0
selenium 3.14.0
setuptools 68.0.0
six 1.16.0
stack-data 0.6.3
termcolor 1.1.0
tornado 6.3.3
traitlets 5.12.0
urllib3 2.0.7
wcwidth 0.2.8
webdriver-manager 3.5.2
wheel 0.41.2
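The likely cause: the legacy chromedriver.storage.googleapis.com bucket that webdriver-manager 3.5.2 queries stopped receiving releases at Chrome 114; drivers for Chrome 115+ moved to the Chrome-for-Testing endpoints, so the LATEST_RELEASE_118.0.5993 lookup 404s. The practical fix is to upgrade Selenium to >= 4.6 (whose bundled Selenium Manager resolves the driver itself) or webdriver-manager to a release that knows the new endpoints. This illustrative helper (my own simplification, not a webdriver-manager API) shows the split; the two URLs are the real endpoints:

```python
# Chrome <= 114: drivers on the legacy bucket; Chrome 115+: Chrome for Testing.
LEGACY = "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_{}"
CHROME_FOR_TESTING = ("https://googlechromelabs.github.io/"
                      "chrome-for-testing/latest-versions-per-milestone.json")

def driver_lookup_url(chrome_major):
    """Where a driver lookup for this Chrome major version has to go."""
    return LEGACY.format(chrome_major) if chrome_major <= 114 else CHROME_FOR_TESTING

print(driver_lookup_url(118))
```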

Fails to click sorting button

Hi @gaspa93, I was attempting to scrape the following URL using the command python scraper.py --N 50 --i urls.txt --debug:
https://www.google.com/maps/place/The+Kutaya/@-8.7131747,115.18668,13z/data=!4m11!3m10!1s0x2dd2441f302d3927:0x7fdd6aa714bc38e1!5m2!4m1!1i2!8m2!3d-8.7389695!4d115.1673844!9m1!1b1!16s%2Fg%2F11cn5x9s4l?entry=ttu

However, the log displays a warning message stating, "Failed to click sorting button" after the script finishes running. Is there any way to fix this issue? Thank you!

Hi Mattia

I am trying to use your script, but when I execute it from my cmd nothing appears in the data folder.
I just added one import (from http import cookies) because a SameSite error appeared, and this apparently solves it.
WARNING:
"A cookie associated with a cross-site resource at http://google.com/ was set without the SameSite attribute. A future release of Chrome will only deliver cookies with cross-site requests if they are set with SameSite=None and Secure. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.", source: https://www.google.it/maps/place/Pantheon/@41.8986108,12.4746842,17z/data=!3m1!4b1!4m7!3m6!1s0x132f604f678640a9:0xcad165fa2036ce2c!8m2!3d41.8986108!4d12.4768729!9m1!1b1 (0)

ERROR file gm-scraper.txt:
2020-03-20 14:22:17,098 - WARNING - Failed to click recent button
2020-03-20 14:22:27,451 - WARNING - Failed to click recent button
2020-03-20 14:22:37,877 - WARNING - Failed to click recent button

If you can help me I would appreciate it, because I need this data for my end-of-degree project.

Thank you so much.

Possible problem with the scrollable JS script

I am having the following issue when running the example command:

"selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.section-layout.section-scrollbox.scrollable-y.scrollable-show"}
(Session info: headless chrome=90.0.4430.93)"

I tried this URL = "https://www.google.com/maps/place/Nike+Alto+Palermo/@-34.5883936,-58.4098625,15z/data=!4m7!3m6!1s0x0:0x842bcbef147891ee!8m2!3d-34.588376!4d-58.4098447!9m1!1b1"

But the same happens when I try to use the URLs at urls.txt

I am using a Mac and installed chromedriver using brew.

"Uncaught RangeError: Maximum call stack size exceeded" error

I encountered the following error while scraping reviews of this business.

Any help is appreciated.

[0227/214023.992:INFO:CONSOLE(1560)] "Uncaught RangeError: Maximum call stack size exceeded", source: /maps/_/js/k=maps.m.en.FigERXCYMc0.2019.O/ck=maps.m.fQVt13g1oTE.L.W.O/m=vwr,vd,a,duc,owc,ob2,sp,en,smi,sc,vlg,smr,as,bpw,wrc/am=BsgEBA/rt=j/d=1/rs=ACT90oHs0cWOVxL_9t5x_yY1Y1NZDyb6qg/ed=1/exm=sc2,per,mo,lp,ti,ds,stx,pwd,dw,ppl,log,std,b (1560)

Support Initial place map url

Hello, is there a plan to support the initial place map URL rather than clicking on the reviews button?

The idea is that I want to automate the whole extraction operation without human intervention (clicking on reviews).

no use for MAX_SCROLL

In the current code MAX_SCROLL is never used, and I'm not sure why, but all I get is this:
[screenshot]
Please guide me on what needs to be done. This is the error message I get once I press Ctrl+C:
[screenshot]

Maximum call stack size exceeded

When I set --N to 10000, it scraped 1140 reviews, then threw this message and stopped:
[0724/211919.550:INFO:CONSOLE(1550)] "Uncaught RangeError: Maximum call stack size exceeded", source: /maps/_/js/k=maps.m.en.b7ZwJWQZkHM.2019.O/ck=maps.m.HgR1ySVFXik.L.W.O/m=vwr,vd,a,nrw,owc,ob2,sp,en,smi,sc,vlg,smr,as,wrc/am=BoDCIhAB/rt=j/d=1/rs=ACT90oEnDViSjerMr5DSozguPqRfPvO2Xg/ed=1/exm=sc2,per,mo,lp,ti,ds,stx,dwi,enr,pwd,dw,ppl,log,std,b (1550)

This is the url I used:
https://www.google.it/maps/place/Pantheon/@41.8986108,12.4768729,17z/data=!4m18!1m9!3m8!1s0x132f604f678640a9:0xcad165fa2036ce2c!2sPantheon!8m2!3d41.8986108!4d12.4768729!9m1!1b1!16zL20vMDF4emR6!3m7!1s0x132f604f678640a9:0xcad165fa2036ce2c!8m2!3d41.8986108!4d12.4768729!9m1!1b1!16zL20vMDF4emR6?entry=ttu

Any advice would be appreciated.

unexpected keyword argument 'log_level'

/googlemaps-scraper/googlemaps.py", line 314, in __get_driver
input_driver = webdriver.Chrome(executable_path=ChromeDriverManager(log_level=0).install(), options=options)
TypeError: __init__() got an unexpected keyword argument 'log_level'
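Newer webdriver-manager releases removed the log_level keyword from ChromeDriverManager, so the simplest fix is to call ChromeDriverManager().install() with no arguments. If one script has to run across several installed versions, a signature-tolerant shim (my own sketch, not a webdriver-manager API) can drop keywords the local version does not accept:

```python
import inspect

def make_manager(cls, **kwargs):
    """Instantiate cls, forwarding only the keyword arguments its
    __init__ actually accepts (e.g. drop log_level on new versions)."""
    accepted = inspect.signature(cls.__init__).parameters
    return cls(**{k: v for k, v in kwargs.items() if k in accepted})

class NewStyleManager:  # stand-in: no log_level parameter any more
    def __init__(self):
        pass

mgr = make_manager(NewStyleManager, log_level=0)  # log_level silently dropped
print(type(mgr).__name__)  # NewStyleManager
```

In the scraper this would read `make_manager(ChromeDriverManager, log_level=0).install()`.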

most_relevant

Hello, if I set --sort_by most_relevant then I get this error:
selenium.common.exceptions.ElementNotVisibleException: Message: element not interactable

scrape most relevant reviews

Hi Mattia,

Thanks for sharing your script, it works flawlessly!

I am now trying to re-adapt it to scrape 'most relevant reviews' rather than 'newest' ones.
However, if I change line 66 in googlemaps.py to pick the 'first element' rather than the 'second element', the __scroll function will not go through.

I was wondering whether you faced this difficulty before.

Thanks in advance.

Cheers,
Michele

`__expand_reviews` sometimes not working

The position of self.__scroll() in these lines is incorrect, causing self.__expand_reviews() to be executed before the 'expand more' buttons are loaded. I believe that self.__scroll() was intended to be placed immediately after the comment # scroll to load reviews.

# scroll to load reviews
# wait for other reviews to load (ajax)
time.sleep(4)
self.__scroll()
# expand review text
self.__expand_reviews()

Failed to click recent button

Hi, I was trying to execute the code on the default location in urls.txt, and I got the following in the log file with no output on the terminal.

2020-02-23 18:34:28,685 - WARNING - Failed to click recent button
2020-02-23 18:34:38,990 - WARNING - Failed to click recent button
2020-02-23 18:34:49,301 - WARNING - Failed to click recent button
2020-02-23 18:34:59,635 - WARNING - Failed to click recent button
2020-02-23 18:35:09,984 - WARNING - Failed to click recent button
2020-02-23 18:35:20,309 - WARNING - Failed to click recent button
2020-02-23 18:35:30,624 - WARNING - Failed to click recent button
2020-02-23 18:35:40,961 - WARNING - Failed to click recent button
2020-02-23 18:35:51,317 - WARNING - Failed to click recent button
2020-02-23 18:36:01,665 - WARNING - Failed to click recent button

Looking at googlemaps.py i found that the issue is within:

try:
    menu_bt = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'div.cYrDcjyGO77__container')))  # //button[@data-value=\'Sort\'] XPath with graphical interface
    menu_bt.click()
    clicked = True
    time.sleep(3)
except Exception as e:
    tries += 1
    self.logger.warn('Failed to click recent button')

Could you please explain what is happening and why it isn't working?

emails

Hi, any way you can add emails and ratings too, please?

Fails on [Review 0]

Returns the following error:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"div.siAUzd-neVct.section-scrollbox.cYB2Ge-oHo7ed.cYB2Ge-ti6hGc"}

can't get original language of reviews.

Hi,
I get reviews on the CSV file but can't get the original text (not in English).
How can I fix it?
Eg:
(Translated by Google) When you come to Vietnam, visiting a beauty salon is a mandatory course (Original) 베트남에 오면 미장원 방문은 필수 코스 정답이네요
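One workaround, assuming Google honours the hl query parameter on Maps URLs (it controls the interface language, which also affects review auto-translation), is to request the page in the review's own language. A small URL helper to set that parameter:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def with_language(url, lang):
    """Return url with hl=<lang> set in its query string."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["hl"] = lang
    return urlunparse(parts._replace(query=urlencode(query)))

print(with_language("https://www.google.com/maps/place/X", "ko"))
```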

__parse_place function error

When running the scraper.py file, print(scraper.get_account(url)) on lines 41 and 42 is never executed. I commented out the if statement and found an error in the __parse_place function at lines 164 and 165.

Error:

File "googlemaps.py", line 165, in __parse_place
place['overall_rating'] = float(response.find('div', class_='gm2-display-2').text.replace(',', '.'))
AttributeError: 'NoneType' object has no attribute 'text'

I tried the required beautifulsoup and selenium versions and different versions of chromedriver. That did not solve the issue. What could be the problem?
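The AttributeError means response.find('div', class_='gm2-display-2') returned None, i.e. Google renamed that obfuscated class. Until the selector is updated, the parse can at least degrade gracefully instead of crashing; a defensive sketch (the helper name is illustrative, not from the repo):

```python
def safe_rating(tag):
    """tag: result of soup.find(...), which is None when the class name
    no longer matches. Return the rating as a float, or None."""
    if tag is None or not getattr(tag, "text", None):
        return None
    try:
        # Maps renders e.g. "4,6" in some locales; normalise the comma
        return float(tag.text.replace(",", "."))
    except ValueError:
        return None

class FakeTag:  # stand-in for a bs4 Tag in this demo
    text = "4,6"

print(safe_rating(FakeTag()))  # 4.6
print(safe_rating(None))       # None
```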
