lagou-jobs

Scrapes and filters lagou job postings. Works out of the box, fast and concurrent.

Feature

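  • Highly concurrent crawler implementation
  • Practical, simplified rules: search by city, keywords, etc., then filter by salary, keywords, etc.
  • Practical, friendly filtering mechanism
  • Simple, easy-to-use command-line tool
  • More features listed in the Roadmap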

Install

go get -u github.com/WindomZ/lagou-jobs

Usage

Config

Fill in the blank config.json file provided in the project, using the annotated template below as a guide:

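{
  "userAgent": "",       // User Agent
  "requestTimeout": 15,  // request timeout
  "requestInterval": 0,  // interval between requests
  "search": {
    "city": "",          // city to search, required
    "keywords": [        // search keywords
      ""                 // required
    ],
    "company": {
      "excludeID": [     // company IDs to exclude; can be taken from earlier search results
        0                // may be 0
      ]
    },
    "position": {
      "excludeID": [     // position IDs to exclude; can be taken from earlier search results
        0                // may be 0
      ],
      "filter": {        // position information filter
        "include": [     // keywords that must be present; if empty, everything passes
          ""             // may be empty
        ],
        "exclude": [     // keywords that must be absent; may be empty
          ""             // may be empty
        ]
      },
      "salary": {        // salary filter
        "min": 0,        // minimum salary; 0 disables salary filtering
        "max": 0         // maximum salary; 0 filters on the minimum only
      }
    }
  },
  "output": {            // output format; fill in one
    "files": {           // file output
      "json": ""         // output JSON file
    },
    "http": {            // HTML output (not yet implemented)
      "port": 0          // web port (not yet implemented)
    }
  }
}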
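
To make the "filter" and "salary" rules above more concrete, here is a minimal Go sketch of how they could be evaluated. It is not the project's actual code: the type and function names (Filter, Salary, Position, keywordOK, salaryOK) are hypothetical, and several semantics are assumptions, namely that every "include" keyword must match, that matching is case-insensitive, and that a posting's salary is a low/high range in the same unit as the config. The sample JSON is comment-free because Go's encoding/json does not accept comments.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Filter mirrors search.position.filter (hypothetical type name).
type Filter struct {
	Include []string `json:"include"`
	Exclude []string `json:"exclude"`
}

// Salary mirrors search.position.salary (hypothetical type name).
type Salary struct {
	Min int `json:"min"`
	Max int `json:"max"`
}

// Position holds only the parts of search.position used in this sketch.
type Position struct {
	Filter Filter `json:"filter"`
	Salary Salary `json:"salary"`
}

// keywordOK reports whether text passes the keyword filter.
// Assumptions: all include keywords must appear, no exclude keyword
// may appear, matching ignores case, and "" placeholders are skipped.
func keywordOK(f Filter, text string) bool {
	lower := strings.ToLower(text)
	for _, w := range f.Include {
		if w != "" && !strings.Contains(lower, strings.ToLower(w)) {
			return false
		}
	}
	for _, w := range f.Exclude {
		if w != "" && strings.Contains(lower, strings.ToLower(w)) {
			return false
		}
	}
	return true
}

// salaryOK reports whether a posting's salary range [lo, hi] passes.
// min == 0 disables the filter; max == 0 checks the minimum only.
// Treating the check as a range overlap is an assumption.
func salaryOK(s Salary, lo, hi int) bool {
	if s.Min == 0 {
		return true
	}
	if s.Max == 0 {
		return hi >= s.Min
	}
	return hi >= s.Min && lo <= s.Max
}

func main() {
	raw := `{"filter":{"include":["Go"],"exclude":["outsourcing"]},"salary":{"min":15,"max":0}}`
	var p Position
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		panic(err)
	}
	fmt.Println(keywordOK(p.Filter, "Senior Go engineer"))
	fmt.Println(salaryOK(p.Salary, 10, 20))
}
```

Skipping empty strings lets the blank template (with its "" placeholders) behave as "no keyword filter", which matches the comment that an empty include list lets everything pass.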

Execute

From the project directory, run the following command in a terminal:

lagou-jobs config.json

The results are produced in the form specified under "output" in the configuration.

Contributing

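PRs (pull requests) are welcome, as are bug reports, suggestions, and discussion on the issues page.

If you find this project interesting, please click ⭐Star above to show your support.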

Roadmap

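  • Overall crawler framework
  • Highly concurrent requests and filtering
  • Configurable request interval
  • Measures against being blocked by the site
  • Categorized company and position information
  • Friendlier salary filtering (a single value or a range)
  • Education filtering (most postings ask for a bachelor's degree, and a filtering mechanism already supports this; to be decided)
  • JSON output
  • HTML output with web visualization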

License

MIT
