
e-commerce-crawlers's Introduction

🚀 A collection of e-commerce website crawlers

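1. jd_phone

  • Overview: Crawls the specification data for every phone returned by JD.com (京东) search across the whole platform.
  • Use: Not limited to phones; the same approach can be extended to crawl information for any product category on JD.com.
  • Main libraries: selenium lxml requests json re
  • Sample output: (screenshot not included)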

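2. Tmall brand search

  • Overview: Retrieves every store returned by Tmall's brand search for a given keyword (store name, link, number of related products, total number of products, and so on).
  • Use: Shows at a glance what a keyword (usually a brand) covers, including which Tmall stores carry it and how many related products each store lists. The data is a useful reference for merchants planning to open a store on Tmall.
  • Main libraries: selenium
  • Sample output: (screenshot not included)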

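3. Tmall product review tags

  • Overview: Fetches, in bulk, the review tag keywords for individual Tmall products.
  • Use: Counting the tag words reveals each product's strengths and weaknesses as seen in buyer reviews, which helps merchants quickly fix poorly reviewed products and raise their DSR (Detailed Seller Ratings).
  • Main libraries: requests
  • Sample output: (screenshot not included)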
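The counting step described for project 3 can be sketched roughly as follows. This is not the repository's code: the record layout (`item_id`, `tag`, `count`, `positive`) and the sample data are assumptions made up for illustration, and the real crawler's response fields may differ.

```python
from collections import defaultdict


def summarize_tags(records):
    """Group tag counts per product into 'pros' and 'cons'.

    Each record is a hypothetical dict with the keys
    item_id, tag, count, and positive (bool).
    """
    summary = defaultdict(lambda: {"pros": {}, "cons": {}})
    for rec in records:
        side = "pros" if rec["positive"] else "cons"
        bucket = summary[rec["item_id"]][side]
        # Sum counts when the same tag appears more than once.
        bucket[rec["tag"]] = bucket.get(rec["tag"], 0) + rec["count"]
    return dict(summary)


# Made-up sample data, for illustration only.
sample = [
    {"item_id": "1001", "tag": "good quality", "count": 120, "positive": True},
    {"item_id": "1001", "tag": "slow delivery", "count": 15, "positive": False},
    {"item_id": "1001", "tag": "good quality", "count": 30, "positive": True},
]
result = summarize_tags(sample)
print(result["1001"])
```

Keeping pros and cons in separate dictionaries makes it straightforward to sort each side by count and see which complaints are most frequent for a product.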

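4. Simulated Taobao login

  • Overview: Logs in to Taobao with an account name and password.
  • Use: Once logged in to Taobao, the crawler can go on to fetch more information.
  • Main libraries: selenium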

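5. Tmall store full-catalog crawler (mobile site)

  • Overview: Crawls every product in a given Tmall store through the mobile site, including product ID, price, monthly sales, total sales, title, link, and main image link.
  • Note: Compared against the live pages, the sales figures do not quite match what the pages display. It is unclear whether Tmall deliberately returns wrong data to deter crawlers or whether the data is cached and shown with a delay.
  • Main libraries: requests json csv
  • Sample output: (screenshot not included)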
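As a rough illustration of the CSV export step in project 5, the sketch below writes product records with the fields listed in the overview. The column names in `FIELDS`, the `write_items` helper, and the sample record are hypothetical; the repository's actual column names and order are not shown in this page.

```python
import csv
import io

# Hypothetical column names matching the fields listed in the overview.
FIELDS = ["item_id", "price", "monthly_sales", "total_sales",
          "title", "url", "image_url"]


def write_items(items, fileobj):
    """Write product dicts to fileobj as CSV, one row per product."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        # Keep only the known columns; missing values become empty strings.
        writer.writerow({name: item.get(name, "") for name in FIELDS})


# Made-up sample record, for illustration only.
items = [{
    "item_id": "1001",
    "price": "99.00",
    "monthly_sales": 12,
    "total_sales": 340,
    "title": "Sample item",
    "url": "https://example.com/item/1001",
    "image_url": "https://example.com/img/1001.jpg",
    "extra": "dropped because it is not in FIELDS",
}]

buffer = io.StringIO()
write_items(items, buffer)
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(rows[0]["title"])
```

When writing to a real file instead of an in-memory buffer, open it with `newline=""` and an explicit encoding such as `utf-8-sig` so that Chinese titles open correctly in spreadsheet tools.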

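6. Tmall store full catalog, Scrapy version

  • Overview: A Scrapy spider that crawls basic information for every product in a given store on mobile Tmall.
  • Main libraries: scrapy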

e-commerce-crawlers's People

Contributors

hopetree

e-commerce-crawlers's Issues

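Collaboration inquiry

Could you share your contact details so we can talk privately? I would like to discuss possible applications.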

Startup error: FileNotFoundError: [Errno 2] No such file or directory: 'phantomjs': 'phantomjs'

System: osx 10.13.2
Chrome version: 63
Firefox version: 58
Python version: 3.6.3
pip version: 9.0.1
/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
    stdin=PIPE)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'phantomjs': 'phantomjs'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./crawler.py", line 200, in <module>
    jd = JDPhone(1000)
  File "./crawler.py", line 36, in __init__
    self.driver = webdriver.PhantomJS()
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/phantomjs/webdriver.py", line 56, in __init__
    self.service.start()
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'phantomjs' executable needs to be in PATH.
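
The traceback shows that `webdriver.PhantomJS()` failed because no `phantomjs` executable was found on PATH, and the warning recommends headless Chrome or Firefox instead. The sketch below is not a patch from the project; it is one possible way to check which driver executables are available and to create a headless Chrome driver, assuming a recent Selenium release that accepts the `options` keyword, plus Chrome installed locally. The helper names `pick_driver` and `make_headless_chrome` are hypothetical.

```python
import shutil


def pick_driver(candidates=("chromedriver", "geckodriver", "phantomjs"),
                which=shutil.which):
    """Return (name, path) of the first executable found on PATH, else None."""
    for name in candidates:
        path = which(name)
        if path:
            return name, path
    return None


def make_headless_chrome():
    """Create a headless Chrome driver (requires selenium and Chrome)."""
    # Imported lazily so the PATH check above works without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


found = pick_driver()
print(found if found else "no WebDriver executable found on PATH")
```

In `crawler.py`, the assignment `self.driver = webdriver.PhantomJS()` could then be replaced with a call such as `make_headless_chrome()`, or PhantomJS could be installed and added to PATH if the original behavior is preferred.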
