edu_lagou's Introduction

Overview

Crawls the course columns available to a Lagou Education (拉勾教育) VIP account.

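Usage

  1. Log in to Lagou Education.

  2. Copy the cookies from the logged-in session.

  3. Crawl:

    3.1 One-click subscribe: run crawl/crawl_list.py to subscribe and record the IDs of the columns to download in downloads.txt.
    3.2 Full crawl: run the spider.crawl_all() method in crawl/crawl_content.py.
    3.3 Incremental crawl: run the spider.cral_increase() method in crawl/crawl_content.py.
    3.4 Convert to PDF: run htmltopdf.py.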
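
Steps 2 and 3 rely on a cookie string copied from the browser. As a minimal sketch (not the project's actual code), the helper below turns such a string into a dictionary that could be passed to an HTTP client, for example through the `cookies=` argument of `requests.get`. The function name `parse_cookie_header` and the sample cookie names are hypothetical.

```python
from typing import Dict


def parse_cookie_header(raw: str) -> Dict[str, str]:
    """Convert a 'name1=value1; name2=value2' string into a dict.

    Hypothetical helper: the project may load cookies differently.
    """
    cookies: Dict[str, str] = {}
    for part in raw.split(";"):
        part = part.strip()
        # Skip empty fragments and fragments without a name=value shape.
        if not part or "=" not in part:
            continue
        # Split on the first '=' only, since values may contain '='.
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    # Placeholder cookie string, not real Lagou cookie names.
    print(parse_cookie_header("session_id=abc123; token=x=y"))
```

Splitting on the first `=` only keeps values intact when they contain `=` themselves, which is common for base64-style tokens.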

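Project notes

  1. Use the full crawl on the first run. When Lagou later publishes updates, the project records the columns that have not been downloaded yet and the columns that are still being updated.
  2. The incremental crawl updates the columns that were not yet complete.
  3. The PDFs currently have to be maintained by hand on Baidu Netdisk.
  4. During an incremental update, watch the log and edit the folder list used for PDF conversion, pdf_paths = []. For each updated id in the log, open https://kaiwu.lagou.com/course/courseInfo.htm?courseId=#{id} to identify the column, then add the id to the folder that needs updating.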
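
The lookup in note 4 can be sped up by generating the course info URLs for the updated ids in one go. The sketch below assumes you have already collected the ids from the log by hand; the log format and the mapping from id to PDF folder are not specified here, so neither is modeled. The name `course_info_urls` is hypothetical.

```python
from typing import Dict, Iterable

# URL pattern taken from the project notes; {course_id} replaces the #{id} placeholder.
COURSE_INFO_URL = "https://kaiwu.lagou.com/course/courseInfo.htm?courseId={course_id}"


def course_info_urls(updated_ids: Iterable[int]) -> Dict[int, str]:
    """Map each updated column id to its course info page URL."""
    return {cid: COURSE_INFO_URL.format(course_id=cid) for cid in updated_ids}


if __name__ == "__main__":
    # Example ids only; use the ids reported in your own crawl log.
    for cid, url in course_info_urls([1, 2]).items():
        print(cid, url)
```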

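Project status

  • Crawl Lagou courses
  • Generate PDFs
  • One-click subscription to all VIP columns
  • One-click download of all columns
  • Multithreaded column crawling
  • Full column crawl
  • Incremental column crawl
  • Update columns that are still in progress, and record the columns that change from in progress to complete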
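
For readers curious how multithreaded column crawling is typically structured, here is a minimal sketch using the standard library thread pool. It is an illustration under assumptions, not the project's implementation: `crawl_columns` and the `fetch_column` callback are hypothetical names, and the real per-column download logic lives in the crawl/ scripts.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, Tuple


def crawl_columns(
    column_ids: Iterable[int],
    fetch_column: Callable[[int], Any],
    max_workers: int = 4,
) -> Tuple[Dict[int, Any], Dict[int, Exception]]:
    """Run fetch_column for every id in parallel threads.

    Returns (results, failures) so that failed ids can be retried
    in a later incremental run.
    """
    results: Dict[int, Any] = {}
    failures: Dict[int, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which future belongs to which column id.
        futures = {pool.submit(fetch_column, cid): cid for cid in column_ids}
        for future in as_completed(futures):
            cid = futures[future]
            try:
                results[cid] = future.result()
            except Exception as exc:  # one failing column must not stop the rest
                failures[cid] = exc
    return results, failures


if __name__ == "__main__":
    # Stand-in fetcher; a real one would download the column's lessons.
    done, failed = crawl_columns([1, 2, 3], lambda cid: f"column-{cid}")
    print(done, failed)
```

Collecting failures separately instead of raising on the first error fits the full-then-incremental workflow described above, since unfinished ids can simply be fed into the next run.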

Example run

edu.gif

Membership purchase link

购买.image
