buyhouse's Introduction

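A Python Scrapy crawler that scrapes new housing listings for the Chengdu area from Lianjia (链家网) and visualizes them on a map using the Amap (高德) API.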

  • 1. Result screenshots: [4 images]

  • 2. The project already contains a rent.csv file from a previous crawl. You can delete it and run scrapy crawl fangjia -o rent.csv -t csv to regenerate the CSV file.

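  • 3. When the crawl finishes, run python -m SimpleHTTPServer 3000 (a Python 2 module; on Python 3 the equivalent is python -m http.server 3000), then open http://localhost:3000/, click demo.html, and import the rent.csv file generated above.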
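
Step 3 depends on SimpleHTTPServer, which exists only in Python 2. As a minimal sketch, assuming Python 3.7 or later, the same static file serving can be done with the standard http.server module. The function name make_server is hypothetical and not part of the project; the port 3000 and the directory holding demo.html come from the steps above.

```python
import functools
import http.server


def make_server(directory=".", port=3000):
    """Build a static file server rooted at `directory` (hypothetical helper)."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword since Python 3.7.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to the loopback address only, so the files are not exposed
    # to other machines on the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# To serve the project directory until interrupted with Ctrl+C:
#   server = make_server()
#   server.serve_forever()
```

Run from the directory that contains demo.html, this should make the page reachable at http://localhost:3000/demo.html.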

buyhouse's People

Contributors

happyte, zs511129

buyhouse's Issues

Runtime error

from fangjia.items import FangjiaItem
ImportError: No module named items

There is no module named items.

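Two problems:

  1. The project package and the spider module are both named fangjia, which triggers the exception below at run time. Under Python 2's implicit relative imports, from fangjia.items inside fangjia/spiders/fangjia.py resolves fangjia to the spider module itself, which has no items submodule. Renaming the project package buyhouse/fangjia -> buyhouse/fangjiaCD resolves it (fangjiaCD/settings.py and buyhouse/scrapy.cfg must be updated to match).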
    $ scrapy crawl fangjia -o rent.csv -t csv
    Traceback (most recent call last):
      File "/usr/local/bin/scrapy", line 11, in <module>
        sys.exit(execute())
      File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 148, in execute
        cmd.crawler_process = CrawlerProcess(settings)
      File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 243, in __init__
        super(CrawlerProcess, self).__init__(settings)
      File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 134, in __init__
        self.spider_loader = _get_spider_loader(settings)
      File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 330, in _get_spider_loader
        return loader_cls.from_settings(settings.frozencopy())
      File "/usr/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 61, in from_settings
        return cls(settings)
      File "/usr/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 25, in __init__
        self._load_all_spiders()
      File "/usr/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
        for module in walk_modules(name):
      File "/usr/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
        submod = import_module(fullpath)
      File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
        __import__(name)
      File "/Users/shidonghua/git-project/buyhouse/fangjia/spiders/fangjia.py", line 3, in <module>
        from fangjia.items import FangjiaItem
    ImportError: No module named items

  2. Parsing the address raises an exception:
    address = response.xpath('//p[@class="where"]/span/@title').extract()[0]
    IndexError: list index out of range

Fix: change the line in fangjia.py to:
address = response.xpath('//p[@class="where manager" or @class="where "]/span/@title').extract()[0]
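
The IndexError in problem 2 comes from indexing an empty list: no p element carried exactly the class string being matched. As a minimal sketch of a more defensive variant, the expression can match the class token where instead of the full attribute string, and the first result can be read with a fallback. The names ADDRESS_XPATH and first_or_default are hypothetical and do not appear in the project; whether the page always marks the address with a where class is an assumption.

```python
# Hypothetical names; not part of the original spider.

# Standard XPath 1.0 idiom for "the class attribute contains the token
# 'where'", regardless of other classes or surrounding whitespace.
ADDRESS_XPATH = (
    '//p[contains(concat(" ", normalize-space(@class), " "), " where ")]'
    '/span/@title'
)


def first_or_default(values, default=""):
    """Return the first non-blank string in `values`, stripped, else `default`."""
    for value in values:
        text = value.strip()
        if text:
            return text
    return default


# Intended use inside the spider's parse method (assumes a Scrapy response):
#   address = first_or_default(response.xpath(ADDRESS_XPATH).extract())
```

With this shape, a listing page without an address yields an empty string rather than aborting the item.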

Spider fails to run

Running this in the buyhouse directory:

scrapy crawl fangjia -o rent1.csv -t csv

produces the error:
File "/usr/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 51, in load
raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: fangjia'
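
The report does not identify the cause of this KeyError. One thing worth checking, assuming the standard Scrapy layout in which scrapy.cfg names the settings module under [settings] default, is whether that module's package directory still exists next to scrapy.cfg (for example after a package rename). The sketch below uses only the standard library; the function names are hypothetical.

```python
import configparser
import os


def settings_module(cfg_path="scrapy.cfg"):
    """Return the dotted settings module named in scrapy.cfg, or None."""
    parser = configparser.ConfigParser()
    if not parser.read(cfg_path):
        return None  # file missing or unreadable
    return parser.get("settings", "default", fallback=None)


def package_dir_exists(module_name, project_dir="."):
    """Check that the top-level package of `module_name` is a directory."""
    package = module_name.split(".")[0]
    return os.path.isdir(os.path.join(project_dir, package))


# Example check, run from the directory containing scrapy.cfg:
#   name = settings_module()
#   print(name, package_dir_exists(name) if name else "no settings entry")
```

If the directory check fails, scrapy.cfg and the package name are out of sync. If it passes, the spider's name attribute and the SPIDER_MODULES setting are the next places to look.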
