Comments (29)
@dvijparekh1995 However we can take the class name and map it from there. But this will break when they change it again.
from justdial-scrapper.
JD uses a fixed series of span class names to render the phone number.
If we extract the class name of each span, we can recover mobile numbers easily.
The series is as follows:
Number - span class="icon-XX"
1 - icon-yz
2 - icon-wx
3 - icon-vu
4 - icon-ts
5 - icon-rq
6 - icon-po
7 - icon-nm
8 - icon-lk
9 - icon-ji
0 - icon-acb
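The series above can be turned into a small decoder. This is only a sketch: it uses Python's standard-library HTMLParser rather than whatever the scraper itself uses, the `mobilesv` class in the sample is taken from the spans quoted elsewhere in this thread, and the names `PhoneSpanParser` and `decode_phone` are made up for illustration. The mapping is the one reported here and will break whenever JD changes it.

```python
from html.parser import HTMLParser

# Class-to-digit map as reported in this thread; JD may change it at any time.
ICON_TO_CHAR = {
    "icon-acb": "0", "icon-yz": "1", "icon-wx": "2", "icon-vu": "3",
    "icon-ts": "4", "icon-rq": "5", "icon-po": "6", "icon-nm": "7",
    "icon-lk": "8", "icon-ji": "9",
}


class PhoneSpanParser(HTMLParser):
    """Collects one digit per <span> that carries a known icon class."""

    def __init__(self):
        super().__init__()
        self.chars = []    # decoded digits, in document order
        self.unknown = []  # icon-* classes missing from the map

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in ICON_TO_CHAR:
                self.chars.append(ICON_TO_CHAR[name])
            elif name.startswith("icon-"):
                self.unknown.append(name)


def decode_phone(html_fragment):
    """Return (decoded_digits, unknown_icon_classes) for an HTML fragment."""
    parser = PhoneSpanParser()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.chars), parser.unknown


# Hypothetical fragment, not copied from a real JD page.
sample = (
    '<span class="mobilesv icon-ji"></span>'
    '<span class="mobilesv icon-lk"></span>'
    '<span class="mobilesv icon-nm"></span>'
)
number, unknown = decode_phone(sample)
print(number, unknown)
```

Returning the unknown classes alongside the digits makes it easier to notice when the mapping has gone stale, instead of silently producing short numbers.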
from justdial-scrapper.
Thanks @Alankar0416 for sharing the code.
Here is the array mapping I used as a second pass on the CSV file. I used .find_all for the phone number.
- '<bound method Tag.find_all of ' => '',
- '>' => '',
- '<span class=""mobilesv icon-dc"">' => '',
- '<span class=""mobilesv icon-fe"">' => '',
- '<span class=""mobilesv icon-hg"">' => '',
- '<span class=""mobilesv icon-ba"">' => '-',
- '<span class=""mobilesv icon-acb"">' => '0',
- '<span class=""mobilesv icon-yz"">' => '1',
- '<span class=""mobilesv icon-wx"">' => '2',
- '<span class=""mobilesv icon-vu"">' => '3',
- '<span class=""mobilesv icon-ts"">' => '4',
- '<span class=""mobilesv icon-rq"">' => '5',
- '<span class=""mobilesv icon-po"">' => '6',
- '<span class=""mobilesv icon-nm"">' => '7',
- '<span class=""mobilesv icon-lk"">' => '8',
- '<span class=""mobilesv icon-ji"">' => '9',
- '<bound method Tag.find_all of ' => '',
- '>' => '',
Attached is my PHP code.
clean_csv.php.txt
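For anyone who prefers to stay in Python, the same second pass might look roughly like the sketch below. It is not a port of the attached PHP file (which is not shown here). Instead of chaining string replacements, it pulls the icon classes out of the dumped cell in order and looks each one up. The function names, the column index, and the sample row are assumptions for illustration only.

```python
import csv
import io
import re

# Same table as the replacement array above, keyed by icon class.
# icon-dc, icon-fe and icon-hg map to "" and icon-ba to "-", as in that array.
ICON_TO_CHAR = {
    "icon-dc": "", "icon-fe": "", "icon-hg": "", "icon-ba": "-",
    "icon-acb": "0", "icon-yz": "1", "icon-wx": "2", "icon-vu": "3",
    "icon-ts": "4", "icon-rq": "5", "icon-po": "6", "icon-nm": "7",
    "icon-lk": "8", "icon-ji": "9",
}

# The csv module turns the doubled quotes ("") back into single quotes,
# so the pattern matches the span as it appeared in the page.
SPAN_RE = re.compile(r'<span class="mobilesv (icon-[a-z]+)">')


def clean_cell(cell):
    """Decode a dumped find_all cell; leave cells without icon spans alone."""
    names = SPAN_RE.findall(cell)
    if not names:
        return cell  # e.g. a header cell or an already clean value
    return "".join(ICON_TO_CHAR.get(name, "") for name in names)


def clean_csv(text, phone_col):
    """Rewrite CSV text, decoding only the column at index phone_col."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if len(row) > phone_col:
            row[phone_col] = clean_cell(row[phone_col])
        writer.writerow(row)
    return out.getvalue()


# Made-up row in the shape described above, not real scraped data.
raw = (
    'Name,Phone\n'
    'Acme,"<bound method Tag.find_all of '
    '<span class=""mobilesv icon-dc""></span>'
    '<span class=""mobilesv icon-yz""></span>'
    '<span class=""mobilesv icon-ba""></span>'
    '<span class=""mobilesv icon-acb""></span>>"\n'
)
print(clean_csv(raw, phone_col=1))
```

Unknown icon classes are dropped silently here, which mirrors the replace-and-ignore behaviour of the array, so a changed mapping will show up as numbers that are too short.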
from justdial-scrapper.
I am getting urllib open timeout error. Is this code still working for anyone?
from justdial-scrapper.
I was able to earlier, but it seems they have started sending svg image instead of numbers.
from justdial-scrapper.
Yes, I had that in mind. But the issue is they can change the class name whenever they want and this will break then. Better to think of something concrete. The most foolproof solution is to use digit recognition on the image.
from justdial-scrapper.
Yes, I think the same, as they will surely change it.
from justdial-scrapper.
I'm not getting the phone numbers. Can you tell me how to get them?
from justdial-scrapper.
@krishnamalireddy JD is now using SVGs in place of actual numbers. That's why parsing fails. There are a couple of ways to get around this:
- Each SVG has a unique code which can be mapped to a digit. This will fail if they change the mapping again.
- Use digit recognition on the SVG.
Unfortunately I don't have time to develop this right now. Will pick it up whenever I have some bandwidth.
from justdial-scrapper.
@Alankar0416 Could you please demonstrate how we can decode the numbers from the SVGs in code?
from justdial-scrapper.
@Alankar0416 Could you please demonstrate how we can decode the numbers from the SVGs in code?
A simple solution: instead of using .string, use .find_all for the phone number.
You will get the seemingly random SVG class codes, which you then convert to digits.
from justdial-scrapper.
The issue is that we have to keep a map of SVG codes to numbers, but JD can change it anytime.
from justdial-scrapper.
Yes, they can change it any time. If they change it, we have to decode it again. By the way, they haven't changed it for a long time.
from justdial-scrapper.
Great work @ketanshah79
Haven't tried this code. Are you able to successfully map phone numbers with this additional script? If yes, I can add this into the original script to make things easy for everyone.
from justdial-scrapper.
Only 10 records are being retrieved.
from justdial-scrapper.
@Alankar0416 could you please post the code along with @ketanshah79 's changes?
Need to get justdial data for a college project.
Please guys, if either of you could do it, it'll be really helpful
Thanks!
from justdial-scrapper.
@Alankar0416 could you please post the code along with @ketanshah79 's changes?
Need to get justdial data for a college project.
Please guys, if either of you could do it, it'll be really helpful
Thanks!
@mps1305 Check my forked repo. I have made the changes accordingly and it's working. Just change the URL to whichever one you want.
from justdial-scrapper.
Hey @dvijparekh, it was working up until some time back, then I started getting this error. Any help in this regard would be highly appreciated!
"[WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond"
from justdial-scrapper.
It seems JustDial is blocking the scraper. Working on it.
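Until there is a proper fix, one thing that might be worth trying for the timeout error is an explicit timeout with a few retries and a browser-like User-Agent header. This is a generic urllib sketch, not something confirmed to get past JD's blocking. The `fetch` name, the header value, and the retry and backoff numbers are all assumptions, and the `opener` and `sleep` parameters exist only so the function can be exercised without touching the network.

```python
import time
import urllib.request


def fetch(url, retries=3, timeout=10, backoff=2.0,
          opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url and return the body as bytes, retrying on connection errors.

    URLError, HTTPError, socket timeouts and WinError 10060 are all
    subclasses of OSError, so a single except clause covers them.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    # Assumed header; it may or may not make a difference for JD.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    last_error = None
    for attempt in range(retries):
        try:
            with opener(request, timeout=timeout) as response:
                return response.read()
        except OSError as error:
            last_error = error
            if attempt < retries - 1:
                # Linear backoff: wait longer after each failed attempt.
                sleep(backoff * (attempt + 1))
    raise last_error


# Usage (performs a real request, so it is left commented out):
# html = fetch("https://www.justdial.com/", retries=3, timeout=15)
```

If every attempt fails, the last error is re-raised, so the original message is still visible.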
from justdial-scrapper.
Hey, I have written a script that will scrape phone numbers from any JustDial business page.
It uses the info in CSS stylesheet to create a mapping between the strings assigned to each number.
The mapping is done every time you load a page, therefore it works for every business.
Please try this:
https://github.com/SuhailSaify/Justdial-Scrapper
PS: it also scrapes other info along with Phone numbers.
(Working as of July 2019)
from justdial-scrapper.
Can anyone post the latest code here?
from justdial-scrapper.
I am about to solve this issue, can anyone help me with this error - https://stackoverflow.com/questions/60875316/typeerror-string-indices-must-be-integers-when-getting-class-fro-span-tag-using
from justdial-scrapper.
I am about to solve this issue, can anyone help me with this error - https://stackoverflow.com/questions/60875316/typeerror-string-indices-must-be-integers-when-getting-class-fro-span-tag-using
Please share the JustDial URL you are trying to scrape.
from justdial-scrapper.
I am about to solve this issue, can anyone help me with this error - https://stackoverflow.com/questions/60875316/typeerror-string-indices-must-be-integers-when-getting-class-fro-span-tag-using
Please share the JustDial URL you are trying to scrape.
Solved it brother. Thank you.
from justdial-scrapper.
There is another error though
AttributeError: 'NoneType' object has no attribute 'text'
on line return body.find('span', {'class':'mrehover'}).text.strip()
in get_address
from justdial-scrapper.
There is another error though
AttributeError: 'NoneType' object has no attribute 'text'
on line return body.find('span', {'class':'mrehover'}).text.strip()
in get_address
It means no span tag with class mrehover was found, so body.find is returning None, which has no text attribute.
Try the code below and let me know what you get:
tesVar = body.find('span', {'class':'mrehover'})
print(tesVar)
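If the span turns out to be missing on some listings, a guard along these lines would avoid the AttributeError. It is only a sketch: `text_or_default` is a made-up helper name, and the objects below are stand-ins for what body.find might return, not real BeautifulSoup tags or real listing data.

```python
from types import SimpleNamespace


def text_or_default(tag, default=""):
    """Return tag.text stripped, or default when the lookup returned None."""
    if tag is None:
        return default
    return tag.text.strip()


# Stand-ins for the result of body.find('span', {'class': 'mrehover'}).
found = SimpleNamespace(text="  Example Street, Example City  ")
missing = None

print(text_or_default(found))
print(text_or_default(missing, default="N/A"))
```

In get_address this would wrap the body.find(...) call, so a listing without the span yields the default instead of stopping the whole run.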
from justdial-scrapper.
Hey, use this method https://youtu.be/EkbF5JwuHqU
from justdial-scrapper.