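Note: Lagou has recently deployed anti-crawler measures, so running this code may get your IP address banned by Lagou. Please use it together with a proxy.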
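One common way to route Scrapy requests through proxies is a small downloader middleware. The sketch below is an assumption, not code from this project: the class name RandomProxyMiddleware and the PROXY_LIST setting are hypothetical. It relies only on the fact that Scrapy's built-in HttpProxyMiddleware honours request.meta["proxy"].

```python
import random


class RandomProxyMiddleware(object):
    """Hypothetical downloader middleware that picks a random proxy per request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical setting name, e.g.
        # PROXY_LIST = ["http://127.0.0.1:8888", "http://10.0.0.2:3128"]
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Keep any proxy that was already set explicitly on this request.
        if "proxy" not in request.meta:
            request.meta["proxy"] = random.choice(self.proxies)
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

To enable it, register it in DOWNLOADER_MIDDLEWARES with an order number below 750 (the default position of HttpProxyMiddleware), for example 350, so the proxy is set before the built-in middleware reads it. The module path you register depends on where you put the class.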

lagou_crawler: a job listing crawler for Lagou

1. Overview

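Uses the scrapy framework to crawl job postings from Lagou and stores the data in mongodb. The data is then analyzed further and exported as JSON, and fabric plus a scheduled task uploads the updated data to the server, completing automatic deployment.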
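The "store in mongodb" step is typically done with a Scrapy item pipeline. The following is a minimal sketch under stated assumptions, not this project's actual pipeline: the class name MongoPipeline, the MONGO_URI / MONGO_DATABASE setting names and the "jobs" collection name are hypothetical. The pymongo calls used (MongoClient, insert_one, close) are standard.

```python
class MongoPipeline(object):
    """Hypothetical item pipeline that writes each scraped job to MongoDB."""

    def __init__(self, mongo_uri, mongo_db, collection_name="jobs"):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI / MONGO_DATABASE are assumed setting names.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "lagou"),
        )

    def open_spider(self, spider):
        # Imported lazily so the class can be loaded without the driver installed.
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # insert_one adds an "_id" key to the document it receives,
        # so store a plain dict copy and leave the original item untouched.
        self.collection.insert_one(dict(item))
        return item
```

Returning the item from process_item lets any later pipelines in ITEM_PIPELINES see it as well.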

Because scrapy does not yet fully support py3, this project has only been run and tested under py2.7.

Example data display site: http://107.170.207.236/job_analysis/

Data display project: https://github.com/namco1992/job_analysis

2. Modules

  1. Crawler module
  2. Data analysis, with the results exported as JSON
  3. Automatic deployment
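The analysis module's job of turning raw postings into JSON could look roughly like the sketch below. It is illustrative only: the field names "city" and "salary" and the "10k-20k" salary format are assumptions about the scraped data, and summarize_by_city / to_json are hypothetical helpers, not functions from analysis/analyze.py.

```python
import json
import re
from collections import Counter, defaultdict

# Assumed salary format such as "10k-20k" (case-insensitive); anything else is skipped.
SALARY_RE = re.compile(r"^\s*(\d+)k\s*-\s*(\d+)k\s*$", re.IGNORECASE)


def salary_midpoint(text):
    """Return the midpoint of an 'Nk-Mk' range in thousands, or None if unparseable."""
    match = SALARY_RE.match(text or "")
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return (low + high) / 2.0


def summarize_by_city(jobs):
    """Aggregate job dicts into {city: {"count": n, "avg_salary_k": x or None}}."""
    counts = Counter()
    salaries = defaultdict(list)
    for job in jobs:
        city = job.get("city", "unknown")
        counts[city] += 1
        mid = salary_midpoint(job.get("salary"))
        if mid is not None:
            salaries[city].append(mid)
    summary = {}
    for city, count in counts.items():
        values = salaries[city]
        summary[city] = {
            "count": count,
            "avg_salary_k": round(sum(values) / len(values), 1) if values else None,
        }
    return summary


def to_json(summary):
    # ensure_ascii=False keeps Chinese city names readable in the output file.
    return json.dumps(summary, ensure_ascii=False, indent=2, sort_keys=True)
```

With MongoDB as the source, the jobs iterable could be the cursor returned by pymongo's collection.find(), since each document it yields is a dict.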

3. Usage

First, create settings.py using settings.py.example as a reference.
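The actual keys to fill in are the ones listed in settings.py.example. Purely as a hypothetical illustration, a settings.py that slows the crawler down to reduce the risk of an IP ban might contain entries like these. The throttling names are real Scrapy settings; MONGO_URI and MONGO_DATABASE are assumed names that only matter if your own pipeline code reads them.

```python
# Hypothetical excerpt of settings.py; settings.py.example is the authoritative list.

# Politeness / anti-ban settings (real Scrapy setting names).
DOWNLOAD_DELAY = 2                  # base delay in seconds between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True     # wait between 0.5x and 1.5x DOWNLOAD_DELAY
CONCURRENT_REQUESTS_PER_DOMAIN = 2  # limit parallel requests to Lagou
COOKIES_ENABLED = False             # do not send cookies back to the site
AUTOTHROTTLE_ENABLED = True         # adapt the delay to observed server latency
AUTOTHROTTLE_START_DELAY = 2
AUTOTHROTTLE_MAX_DELAY = 30

# Storage settings: assumed names, read by your own item pipeline.
MONGO_URI = "mongodb://localhost:27017"
MONGO_DATABASE = "lagou"
```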

  1. Crawler
scrapy crawl lagou
  2. Data analysis
python analysis/analyze.py
  3. Automatic deployment
fab automatic_deploy
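Because the overview mentions running deployment from a scheduled task, the three steps above could be chained in a small wrapper that cron calls, stopping as soon as one step fails. This is a hypothetical sketch: the run_steps / main names are not part of the project, and it runs analyze.py with the current interpreter (sys.executable) instead of a bare python.

```python
import subprocess
import sys

# The three usage steps, in order (the crawler must run before analysis and deploy).
STEPS = [
    ["scrapy", "crawl", "lagou"],
    [sys.executable, "analysis/analyze.py"],
    ["fab", "automatic_deploy"],
]


def run_steps(steps, cwd=None):
    """Run each command in sequence and return how many succeeded before a failure."""
    done = 0
    for cmd in steps:
        print("running: " + " ".join(cmd))
        try:
            subprocess.check_call(cmd, cwd=cwd)
        except (subprocess.CalledProcessError, OSError) as exc:
            # CalledProcessError: non-zero exit; OSError: command not found.
            print("step failed: %s" % exc)
            break
        done += 1
    return done


def main():
    completed = run_steps(STEPS)
    # A non-zero exit code lets cron or a monitoring tool notice the failure.
    sys.exit(0 if completed == len(STEPS) else 1)
```

Saved as a script that calls main() under an if __name__ == "__main__": guard, it can be scheduled with a crontab entry that first changes into the project directory, so the relative path analysis/analyze.py resolves correctly.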

4. Powered by

  • scrapy
  • mongodb
  • fabric

5. LICENCE

MIT
