This project forked from wangshunping/weibo_spider

weibo_spider

A graduation project: a Weibo spider that looks for interesting findings, such as whether people on social networks tend to be happy or sad.

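On second thought, let's not do this in English.

Hi, I'm a senior graduating this year, and I'd guess you're a young guy about my age. (If you're a girl, please get in touch, thanks.)

I don't know how you found your way to my project. If you're just getting started with Python, or don't know it at all, come on, let's hack on something and have some fun.

Python is a very concise language, and most importantly, Python's source code is open to read; it is the kind of language a true open-source project should embrace. Everything I used in this project (crawling, database access, multithreading, plotting, statistics, machine learning) is implemented in Python.

If you use Python, you're not fighting alone.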

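Where the project stands now: download the source, change the school name, entry user id, and cookie in config.ini, and you can plot Weibo usage statistics for your own school. I'd love for you to push the resulting charts to me, and I'm happy to help or to work through the earlier steps with you (please star the repo first, I need your support). I hope my source code helps you learn or get comfortable with Python; then tell me about whatever awesome idea you come up with, and we'll build it together.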

After all,

Talk is cheap, show me your code.

The project is still being updated; forks and stars are welcome.

My write-up of the project is here: http://wangshunping.github.io/%E9%A1%B9%E7%9B%AE/graduation-project/

Contact

[email protected]

Usage

What you need to install

  1. Python 2.7 (maybe moving to Python 3 would be cooler)
  2. BeautifulSoup (HTML parser)
  3. requests (HTTP for Humans)
  4. MySQL
  5. matplotlib (plotting library; apt-get install python-matplotlib)

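Running

  1. First, install everything listed above (email me if you run into problems).

  2. Edit config.ini: mainly the database connection parameters, the cookie (for how to obtain one, see http://www.douban.com/note/265628450), the school you want to crawl, and the entry user id.

  3. Run init.sql to create the database.

  4. Run python util.py to test whether the crawler works. On success, it returns the entry user's profile information and inserts the first record into the database. [Figure: sample output]

  5. Run python NameListFactory.py and crawl away.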
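Step 2 can be pictured with a small sketch of reading config.ini. The section and key names used here (`mysql`, `weibo`, `school`, `entry_uid`, `cookie`) and the `load_settings` / `build_headers` helpers are assumptions for illustration only; check the config.ini shipped with the repository for the real layout. The sketch uses Python 3's `configparser`; on Python 2.7 the module is named `ConfigParser`.

```python
import configparser

# Hypothetical config.ini contents; the real file's sections and keys may differ.
SAMPLE_CONFIG = """
[mysql]
host = localhost
user = root
password = secret
database = weibo

[weibo]
school = Example University
entry_uid = 1234567890
cookie = SUB=placeholder; SUBP=placeholder
"""


def load_settings(text):
    """Parse INI text and return the settings a crawler would need."""
    # interpolation=None keeps '%' characters in cookie values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "db": dict(parser["mysql"]),
        "school": parser["weibo"]["school"],
        "entry_uid": parser["weibo"]["entry_uid"],
        "cookie": parser["weibo"]["cookie"],
    }


def build_headers(cookie):
    """Build request headers that carry the logged-in session cookie."""
    return {"Cookie": cookie, "User-Agent": "Mozilla/5.0"}


settings = load_settings(SAMPLE_CONFIG)
headers = build_headers(settings["cookie"])
print(settings["school"], settings["entry_uid"])
```

A headers dictionary like this could then be passed to `requests.get(url, headers=headers)` so that the request is made as the logged-in account.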

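Notes

  1. The crawler currently runs 3 threads. Crawl long enough and the account will still get blocked, though it is unblocked again after a few hours. The workaround is to register several accounts.

If the account is temporarily frozen, it ends up looking like this...

[Figure: screenshot of a frozen account]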
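The note above (three threads, several accounts) can be sketched as a small worker pool that rotates cookies between requests. This is not the project's actual code: `CookiePool`, `crawl`, the `fetch` callable, and the `delay` parameter are hypothetical names, and `fake_fetch` stands in for a real HTTP request so the sketch runs without network access.

```python
import itertools
import queue
import threading
import time

NUM_WORKERS = 3  # matches the thread count mentioned in the note


class CookiePool:
    """Hand out cookies from several accounts in round-robin order."""

    def __init__(self, cookies):
        self._cycle = itertools.cycle(cookies)
        self._lock = threading.Lock()

    def next_cookie(self):
        with self._lock:
            return next(self._cycle)


def crawl(user_ids, fetch, cookie_pool, delay=1.0):
    """Fetch every user id with NUM_WORKERS threads, pausing between requests.

    `fetch(user_id, cookie)` is a hypothetical callable supplied by the caller.
    """
    tasks = queue.Queue()
    for uid in user_ids:
        tasks.put(uid)
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                uid = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            data = fetch(uid, cookie_pool.next_cookie())
            with results_lock:
                results[uid] = data
            time.sleep(delay)  # throttle to lower the chance of a block

    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_fetch(user_id, cookie):
    """Stand-in for a real HTTP request; performs no network access."""
    return "profile of %s via %s" % (user_id, cookie)


pool = CookiePool(["cookie_a", "cookie_b"])
print(crawl(["u1", "u2", "u3", "u4"], fake_fetch, pool, delay=0.01))
```

Spreading requests over several cookies and sleeping between them only reduces the request rate per account; whether that is enough to avoid a block is not something this sketch can promise.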

Update 2015.03.27

These results are based on nearly 500 crawled Weibo accounts from Anhui Medical University (安医大).

[Figures: three statistics charts]
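As a rough idea of how charts like these could be produced, here is a minimal sketch that counts crawled records by one field and saves a bar chart with matplotlib. The README does not say which statistics the charts show, so the `gender` field, the `count_by` and `plot_counts` helpers, and the sample rows are all made up for illustration; they are not the project's schema or data.

```python
from collections import Counter


def count_by(records, field):
    """Count how many records share each value of `field`."""
    return Counter(r[field] for r in records if r.get(field) is not None)


def plot_counts(counts, title, path):
    """Save a bar chart of the counts to `path`; needs matplotlib installed."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    labels = sorted(counts)
    positions = list(range(len(labels)))
    fig, ax = plt.subplots()
    ax.bar(positions, [counts[k] for k in labels])
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_title(title)
    fig.savefig(path)
    plt.close(fig)


# Made-up rows for illustration only, not data from the project.
sample = [
    {"uid": "1", "gender": "f"},
    {"uid": "2", "gender": "m"},
    {"uid": "3", "gender": "f"},
]
print(count_by(sample, "gender"))
```

Calling `plot_counts(count_by(sample, "gender"), "Accounts by gender", "gender.png")` would write the chart to a file; in the real project the records would come from the MySQL database rather than a hard-coded list.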

Contributors

wangshunping, skyleft, xczswt1993