house-renting-spider

A crawler for Shanghai rental listings posted in Douban groups.

System Requirements:

To start

# Clone the repo
$ git clone https://github.com/PeggyZWY/house-renting-spider
$ cd house-renting-spider

# Install requirements
$ pip install -r requirements.txt  

# Modify config.ini
$ vim config.ini

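Configure and save the following in config.ini:

  1. key_search_word_list: the keywords to search for. Separate multiple keywords with ASCII (half-width) commas.
  2. custom_black_list: a blacklist of keywords to reject. Again, separate multiple keywords with ASCII commas.
  3. start_time: only posts made after this date are searched. Write the date in the form 2016-05-01.
  4. Under the [douban] section, douban_cookie and douban_sleep_time do not need to be changed. The program sets the cookie automatically, and a douban_sleep_time of 1 second is a reasonable value that helps keep Douban's anti-crawler measures from banning the account.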

For example:
[Screenshot: example config.ini]
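
As a rough illustration of how these settings could be read and applied, here is a minimal sketch using Python's standard `configparser`. The key names and the `[douban]` section come from the description above; the `[config]` section name, the sample values, and the helper functions `split_words`, `load_settings`, and `keep_post` are assumptions made for this example and may not match what houseRentingSpider.py actually does.

```python
import configparser
from datetime import datetime

# Hypothetical config.ini contents. The key names and the [douban] section
# follow the README; the [config] section name and all values are assumptions.
SAMPLE_INI = """
[config]
key_search_word_list = 徐家汇,静安寺
custom_black_list = 求租,合租
start_time = 2016-05-01

[douban]
douban_cookie =
douban_sleep_time = 1
"""


def split_words(raw):
    """Split a comma-separated value into a list of non-empty, stripped words."""
    return [word.strip() for word in raw.split(",") if word.strip()]


def load_settings(ini_text):
    """Parse the INI text into a plain dict of typed settings."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["config"]
    return {
        "keywords": split_words(section["key_search_word_list"]),
        "blacklist": split_words(section["custom_black_list"]),
        "start_time": datetime.strptime(section["start_time"], "%Y-%m-%d"),
        "sleep_time": parser.getfloat("douban", "douban_sleep_time"),
    }


def keep_post(title, posted_at, settings):
    """Keep a post if it is not older than start_time and has no blacklisted word."""
    if posted_at < settings["start_time"]:
        return False
    return not any(word in title for word in settings["blacklist"])


settings = load_settings(SAMPLE_INI)
```

With these sample values, `keep_post("徐家汇 一室户", datetime(2016, 6, 1), settings)` keeps the post, while a title containing a blacklisted word or a date before `start_time` is rejected.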

Once the configuration is saved, continue in the terminal:

$ python houseRentingSpider.py  

Then wait while the crawler does its work.

When it finishes, a message is printed on the command line. For example:
[Screenshot: command-line message]

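Open the HTML file named in that message to see the results. For example (the screenshot shows only part of the results):
[Screenshot: results page]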

The color scheme is borrowed from Douban :)

Others

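houseRentingSpider.py currently has the following groups configured:

[Screenshot: group settings in houseRentingSpider.py]

The group value in the query parameters of each URL in the douban_url array must correspond one-to-one, in the same order, with the group names in the douban_url_name array.

In other words, as long as you are crawling Douban groups by keyword, you can build your own custom crawler by setting the groups here and the keywords in config.ini.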
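
The one-to-one pairing between the two arrays can be sketched as follows. Only the names `douban_url` and `douban_url_name` come from the script; the URL shape, the group IDs, the group names, and the helpers `group_id` and `pair_groups` are placeholders assumed for illustration, not the values hard-coded in houseRentingSpider.py.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical entries: the URL shape, group IDs, and names are placeholders.
douban_url = [
    "https://www.douban.com/group/search?cat=1013&group=111111&sort=time&q=",
    "https://www.douban.com/group/search?cat=1013&group=222222&sort=time&q=",
]
douban_url_name = ["Example group A", "Example group B"]


def group_id(url):
    """Return the value of the `group` query parameter in a search URL."""
    return parse_qs(urlparse(url).query)["group"][0]


def pair_groups(urls, names):
    """Pair each URL's group ID with the name at the same index."""
    if len(urls) != len(names):
        raise ValueError("douban_url and douban_url_name must have the same length")
    return [(group_id(url), name) for url, name in zip(urls, names)]


groups = pair_groups(douban_url, douban_url_name)
```

Checking the lengths up front catches the most likely mistake when adding a group: updating one array and forgetting the other.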

Contributors

peggyzwy
