job-offers-spider's Introduction

Job Offer Spider

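job-offers-spider is a learning project for web crawling. It mainly uses requests to send HTTP requests and pyquery to parse pages.

The project was written with reference to Cui Qingcai's book 《Python3网络爬虫开发实战》 (Python 3 Web Crawler Development in Practice).

By design the project is split into two submodules: the proxy pool module, pool, and the crawler module, spider.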

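Proxy pool module

The proxy pool module fetches proxy nodes, tests them, and serves them through an API. It is driven by schedule.

getter.py fetches proxy node information from third-party services. It calls the Crawler class, and every Crawler method whose name starts with 'crawl_' is executed in order.
db.py stores the fetched proxy node information in Redis.
tester.py tests the proxy nodes stored in Redis and deletes a node once it has failed the test several times.
api.py exposes an API service to the crawler module. It returns a node picked at random from those with the highest score in Redis.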
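
The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the in-memory ProxyPool class stands in for the Redis-backed store in db.py, and the score constants, the method names (add, mark_ok, mark_failed, random_best), and the two crawl_ sources are all hypothetical.

```python
import random

# Assumed scoring scheme; the real values used by the project are not documented here.
MAX_SCORE = 100      # score given to a node that passes a test
INITIAL_SCORE = 10   # score given to a freshly crawled node
MIN_SCORE = 0        # a node at or below this score is deleted


class Crawler:
    """Each method whose name starts with 'crawl_' is treated as one proxy source."""

    def crawl_source_a(self):
        # Hypothetical source; a real method would request a third-party page.
        yield "10.0.0.1:8080"

    def crawl_source_b(self):
        # Second hypothetical source.
        yield "10.0.0.2:3128"

    def get_proxies(self):
        # Discover the crawl_ methods by name and run them one after another.
        names = sorted(n for n in dir(self) if n.startswith("crawl_"))
        for name in names:
            yield from getattr(self, name)()


class ProxyPool:
    """In-memory stand-in for the Redis store: maps 'host:port' to a score."""

    def __init__(self):
        self.scores = {}

    def add(self, proxy):
        # Keep the existing score if the node is already known.
        self.scores.setdefault(proxy, INITIAL_SCORE)

    def mark_ok(self, proxy):
        self.scores[proxy] = MAX_SCORE

    def mark_failed(self, proxy):
        if proxy not in self.scores:
            return
        self.scores[proxy] -= 1
        if self.scores[proxy] <= MIN_SCORE:
            # Repeated failures eventually remove the node.
            del self.scores[proxy]

    def random_best(self):
        if not self.scores:
            raise LookupError("proxy pool is empty")
        best = max(self.scores.values())
        # Pick at random among the nodes that share the highest score.
        return random.choice([p for p, s in self.scores.items() if s == best])


if __name__ == "__main__":
    pool = ProxyPool()
    for proxy in Crawler().get_proxies():
        pool.add(proxy)
    pool.mark_ok("10.0.0.2:3128")
    print(pool.random_best())
```

With Redis, a sorted set keyed by score would be a natural fit for the same operations, but that mapping is a suggestion rather than a description of db.py.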

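Crawler module

The crawler module fetches and parses job postings from the BOSS直聘 (BOSS Zhipin) website. It is driven by schedule.

myproxy.py obtains a proxy node from the proxy pool API service.
getter.py sends the requests.
parser.py parses the responses.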
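
The three steps above might look roughly like the following. Treat it as a hedged sketch: the function names (build_proxies, fetch_page, parse_jobs), the 'host:port' proxy format, and the CSS selectors .job-item and .job-title are assumptions made for illustration and are not taken from the project or from the target site's markup.

```python
from typing import Dict, List, Optional


def build_proxies(node: str) -> Dict[str, str]:
    """Turn a 'host:port' string into the mapping accepted by requests' `proxies` argument."""
    return {"http": f"http://{node}", "https": f"http://{node}"}


def fetch_page(url: str, proxy_node: Optional[str] = None,
               timeout: float = 10.0) -> Optional[str]:
    """Request a page, optionally through a proxy; return the HTML or None on failure."""
    # Third-party dependency, imported here so build_proxies stays usable without it.
    import requests

    proxies = build_proxies(proxy_node) if proxy_node else None
    try:
        resp = requests.get(url, proxies=proxies, timeout=timeout)
    except requests.RequestException:
        # Connection errors and timeouts are common with free proxies.
        return None
    return resp.text if resp.status_code == 200 else None


def parse_jobs(html: str) -> List[Dict[str, str]]:
    """Extract job titles from a listing page using hypothetical selectors."""
    # Third-party dependency, imported lazily for the same reason as above.
    from pyquery import PyQuery as pq

    doc = pq(html)
    return [
        {"title": item.find(".job-title").text()}
        for item in doc(".job-item").items()
    ]
```

A caller would first ask the proxy pool API for a node, pass it to fetch_page, and hand the returned HTML to parse_jobs. When fetch_page returns None, one option is to request another node and retry.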

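Usage

First, replace the proxy-fetching methods in crawler.py under the proxy pool module with ones that currently work, and set up Redis and a MySQL database in your environment.

Next, run the proxy pool module's scheduler script and wait a few minutes so that the pool can collect enough proxy nodes.

Finally, run the crawler module's schedule script and wait for the crawler to finish.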
