
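A few days ago someone suggested digging into the 心灵 (Xinling) board to see who has been stirring things up there. Many days have passed and nobody has produced anything, so I figured I would make the news myself, haha. This is adapted from the cc98 crawler at https://github.com/Puhao/. It scrapes the posts on the first n pages of 心灵, records them to a file, and uses jieba for word-frequency analysis; the final result is written to cut.txt. Compared with the original, it does not need MongoDB, and there is no multithreading either, heh. The main program is go.py.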

CC98

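#Crawler

Set the board ID, and the crawler then follows that board: for every thread on it, it stores the poster, posting time, floor number, and content of each floor, together with that thread's information, in a MongoDB database.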
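The per-floor storage step can be sketched without MongoDB, in the spirit of the file-based fork: each floor becomes one record appended as a JSON line to a local file. This is a minimal sketch under assumptions; the `FloorRecord` fields mirror the list above, but the class, the `board_id`/`thread_id` fields, the `save_records`/`load_records` helpers, and the JSON-lines format are hypothetical, not the project's actual code or schema.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class FloorRecord:
    """One floor (reply) of a thread; all field names are hypothetical."""
    board_id: int
    thread_id: int
    floor: int
    author: str
    posted_at: str  # kept as the raw timestamp string shown on the page
    content: str


def save_records(records, path):
    """Append records to a JSON-lines file (one JSON object per line)."""
    with Path(path).open("a", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps Chinese text human-readable in the file
            f.write(json.dumps(asdict(rec), ensure_ascii=False) + "\n")


def load_records(path):
    """Read every record back from a JSON-lines file, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [FloorRecord(**json.loads(line)) for line in f if line.strip()]
```

Appending rather than overwriting lets a crawl be resumed page by page; a database such as MongoDB adds indexing and de-duplication that this sketch does not attempt.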

#Hot-word statistics

Use the jieba library to find the top 20 hot words.
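Finding the top 20 hot words amounts to segmenting every post into words and tallying them. A minimal sketch, assuming the posts are already loaded as strings: `top_words`, `write_report`, the rule of dropping tokens shorter than two characters, and the tab-separated output format are illustrative choices, not the project's own. The tokenizer is passed in as a parameter, so `jieba.lcut` can be used in practice while any callable works for testing.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def top_words(
    posts: Iterable[str],
    tokenize: Callable[[str], List[str]],
    n: int = 20,
    min_len: int = 2,
) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens across all posts.

    Tokens shorter than min_len characters are skipped, which drops most
    single-character particles and punctuation (an assumed filtering rule).
    """
    counts = Counter()
    for post in posts:
        for tok in tokenize(post):
            tok = tok.strip()
            if len(tok) >= min_len:
                counts[tok] += 1
    return counts.most_common(n)


def write_report(pairs, path="cut.txt"):
    """Write 'word<TAB>count' lines, most frequent first (format assumed)."""
    with open(path, "w", encoding="utf-8") as f:
        for word, count in pairs:
            f.write(f"{word}\t{count}\n")
```

With jieba installed, `write_report(top_words(posts, jieba.lcut))` segments the Chinese text and writes the 20 most frequent words; `jieba.lcut` returns a list of strings, which is exactly what `tokenize` expects.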

#Dependencies

  1. BeautifulSoup4
    Parses HTML pages to locate and extract the information to be stored.

    pip install beautifulsoup4
    
  2. lxml
    Third-party parser used by BeautifulSoup

    pip install lxml
    
  3. jieba
    Used for Chinese word segmentation

    pip install jieba

#Contributors

puhao, wanghsinche

