caipanwenshu

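Feature overview

Core functionality

  1. new_wenshu.py contains the main logic for crawling detail pages; it is also the project's entry point.
  2. wenshu_keyword.py contains the main logic for crawling keywords and can be run on its own.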

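Other functionality

  1. wenshu_method.py mainly contains shared helper methods.
  2. random_prua.py contains the methods for obtaining proxy IPs and User-Agent (UA) strings (now merged into redis_ip_pool and wenshu_method).
  3. redis_ip_pool.py contains the methods for reusing proxy IPs.
  4. wenshu_setting.py is the configuration file.
  5. my_logger.py is the logging module.
  6. .*.js files are the JavaScript code that has to be run for decryption.
  7. docid.py is also code used for the JavaScript decryption.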
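The README does not show how redis_ip_pool.py reuses proxy IPs. Purely as an illustration of the idea, the sketch below rotates through a small in-memory pool and drops a proxy after repeated failures. The class name `ProxyPool`, its methods, the `max_failures` threshold, and the sample addresses are all hypothetical; the real module stores its proxies in Redis and may work quite differently.

```python
from collections import deque


class ProxyPool:
    """In-memory stand-in for a shared proxy store (hypothetical, not the project's API)."""

    def __init__(self, proxies, max_failures=2):
        # dict.fromkeys removes duplicates while keeping the original order.
        self._queue = deque(dict.fromkeys(proxies))
        self._failures = {proxy: 0 for proxy in self._queue}
        self._max_failures = max_failures

    def get(self):
        """Return the next proxy and move it to the back so it can be reused later."""
        if not self._queue:
            raise LookupError("proxy pool is empty")
        proxy = self._queue[0]
        self._queue.rotate(-1)
        return proxy

    def report_failure(self, proxy):
        """Count a failure and drop the proxy once it reaches max_failures."""
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            self._queue.remove(proxy)
            del self._failures[proxy]


pool = ProxyPool(["10.0.0.1:8080", "10.0.0.2:8080"])
first = pool.get()
second = pool.get()
third = pool.get()  # the pool wraps around, so the first proxy is reused
```

Keeping working proxies in rotation rather than fetching a new one per request is the point of reuse; a Redis-backed version would presumably let several crawler processes share the same pool.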

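Database table fields

Table: wenshu

Stores the detail-page data.

  ['content'] : detail-page content, with the HTML formatting preserved
  ['sid'] : the docid, the unique ID of each document
  ['src'] : detail-page URL
  ['category'] : query keyword, e.g. "一级案由:刑事案由" (first-level cause of action: criminal)
  ['title'] : document title
  ['court'] : court name
  ['pdate'] : judgment date, i.e. the date the court issued its ruling
  ['writ'] : case number, e.g. "(2013)行提字第2号"
  ['reason'] : reasoning for the judgment
  ['sync'] : defaults to 0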
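To make the shape of one wenshu row concrete, here is a minimal sketch that models it as a Python dataclass. The field names come from the list above; the class name `WenshuRecord`, the Python types (for example storing `pdate` as a string), and every sample value except the `category` and `writ` examples are assumptions, since the README does not state column types or which database is used.

```python
from dataclasses import dataclass, asdict


@dataclass
class WenshuRecord:
    """One row of the wenshu table (types are assumed, not taken from the project)."""
    content: str   # detail-page HTML
    sid: str       # docid, unique per document
    src: str       # detail-page URL
    category: str  # query keyword
    title: str
    court: str
    pdate: str     # judgment date; the string format is an assumption
    writ: str      # case number
    reason: str
    sync: int = 0  # defaults to 0, as documented


record = WenshuRecord(
    content="<div>placeholder</div>",
    sid="example-docid",
    src="https://example.invalid/detail?docid=example-docid",
    category="一级案由:刑事案由",
    title="Example title",
    court="Example court",
    pdate="2013-01-01",
    writ="(2013)行提字第2号",
    reason="Example reasoning",
)

# A plain dict is convenient for handing the row to a database driver.
row = asdict(record)
```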

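Table: wenshu_court1

Stores the keywords.

  ['type'] : keyword type, e.g. 一级案由
  ['name'] : keyword, e.g. 刑事案由
  ['pname'] : name of the parent keyword
  ['level'] : depth in the hierarchy, starting at 1 and increasing by 1 for each level down
  ['id'] : the row's own ID, auto-incrementing from 0
  ['pid'] : ID of the parent keyword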
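The id, pid, and level fields describe a tree of keywords. The sketch below shows one way such rows could be walked from a child back to its root and turned into "type:name" strings like the category example. The second row, the use of `None` as the root's pid, and the helper names `to_keyword` and `path_from_root` are assumptions made for illustration; the README does not say how the root's pid is stored.

```python
# Illustrative rows shaped like wenshu_court1. The second row and the
# None pid on the root row are assumptions made for this example.
rows = [
    {"id": 0, "pid": None, "level": 1, "type": "一级案由",
     "name": "刑事案由", "pname": None},
    {"id": 1, "pid": 0, "level": 2, "type": "二级案由",
     "name": "危害公共安全罪", "pname": "刑事案由"},
]

rows_by_id = {row["id"]: row for row in rows}


def to_keyword(row):
    """Join type and name in the "type:name" form of the keyword examples."""
    return f"{row['type']}:{row['name']}"


def path_from_root(row_id, rows_by_id):
    """Follow pid links upward, then return the keywords ordered root first."""
    path = []
    current = rows_by_id.get(row_id)
    while current is not None:
        path.append(to_keyword(current))
        current = rows_by_id.get(current["pid"])
    path.reverse()
    return path


print(path_from_root(1, rows_by_id))
```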

Table: wenshu_court_id

Stores the keyword IDs.

  ['id'] : auto-incrementing; links to the keyword table

Table: wenshu_used_keyword

Stores the keywords that have already been used.

  ['keyword'] : keyword, e.g. "一级案由:刑事案由"

Table: wenshu_failed_keyword

Stores the keywords that failed.

  ['keyword'] : keyword, e.g. "一级案由:刑事案由"
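The README does not describe how the crawler consults the used and failed keyword tables. One plausible use, sketched below under that caveat, is to skip keywords that were already used and to queue failed ones for a retry after the untouched ones. The function name `plan_keywords`, the retry-last ordering, and the sample keywords other than "一级案由:刑事案由" are assumptions.

```python
def plan_keywords(all_keywords, used, failed):
    """Return keywords still to crawl: untouched ones first, then retries."""
    used, failed = set(used), set(failed)
    fresh = [k for k in all_keywords if k not in used and k not in failed]
    retry = [k for k in all_keywords if k in failed]
    return fresh + retry


# Sample data; only the first keyword appears in the README.
all_keywords = ["一级案由:刑事案由", "一级案由:民事案由", "一级案由:行政案由"]
used = ["一级案由:刑事案由"]      # contents of wenshu_used_keyword
failed = ["一级案由:民事案由"]    # contents of wenshu_failed_keyword

queue = plan_keywords(all_keywords, used, failed)
```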

Contributors

  monster2848