weibo's Introduction

weibo

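For convenience, the various files have been merged into Lweibo.py, which also supports Python 3. If you are completely new to this, edit config.ini, then open example.py and run the parts you need. To run a crawl, download the entire repository; do not download a single .py file and then ask why it fails.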

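About RuntimeError: 20019 repeat content! This comes from a recent change on Sina's side: the error is raised when the interval between requests is too short, and I have no way around it.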
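Since the error is tied to the request interval, one mitigation is to space requests out on the client side. The following is only a minimal sketch and not part of Lweibo.py: the `Throttle` class and its `min_interval` parameter are hypothetical, and the interval Sina actually tolerates is not known here.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; the right value is an assumption
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous call: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each request keeps consecutive requests at least `min_interval` seconds apart. It does not guarantee the error goes away.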

Lweibo.py provides both an API mode and a simulated-login mode. If you run into problems, email me.

Crawling Sina Weibo with Python

This crawler uses @lxyu's SDK, https://github.com/lxyu/weibo. Thanks to him for his earlier work.

Update, 2015-08-05

Python 3 is now supported!

TODO

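1. Simulate login and fetch a given page

2. Parse the page

3. Scheduled tasks (finished; originally to be released after graduation). Will not be released: the design is a generation out of date, so there is little point.

4. Distributed storage on HBase (finished; originally to be released after graduation). Will not be released, for the same reason.

5. Fetch active user IDs through the API, to avoid the zombie-account data produced by enumerating auto-incremented IDs (finished; originally to be released after graduation). Will not be released, for the same reason.

The architecture behind items 3, 4, and 5 changed long ago, and I think anyone can write these parts themselves. Today Elasticsearch can replace HBase and is easier to write against. The front end is easy too: Kibana or Grafana can display the data directly, which is far less work than it was a few years ago.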

weibo's People

Contributors

liuzheng

weibo's Issues

Simulated login keeps failing with an error

This is the req_login data I got back from the simulated login:

    <html>
        <head>
        <title>新浪通行证</title>
        <meta http-equiv="refresh" content="0; url=&#39;http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&retcode=4049&reason=%CE%AA%C1%CB%C4%FA%B5%C4%D5%CA%BA%C5%B0%B2%C8%AB%A3%AC%C7%EB%CA%E4%C8%EB%D1%E9%D6%A4%C2%EB&#39;"/>
        <meta http-equiv="Content-Type" content="text/html; charset=GBK" />
        </head>
        <body bgcolor="#ffffff" text="#000000" link="#0000cc" vlink="#551a8b" alink="#ff0000">
        <script type="text/javascript" language="javascript">
        location.replace("http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&retcode=4049&reason=%CE%AA%C1%CB%C4%FA%B5%C4%D5%CA%BA%C5%B0%B2%C8%AB%A3%AC%C7%EB%CA%E4%C8%EB%D1%E9%D6%A4%C2%EB");
        </script>
        </body>
        </html>
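The redirect URL in this response carries a `retcode` and a percent-encoded `reason`. Because the page declares `charset=GBK`, the reason can be decoded as GBK. A minimal sketch using only the standard library, with the URL copied from the response above:

```python
from urllib.parse import urlsplit, parse_qs

# Redirect target copied from the login response above.
redirect_url = (
    "http://weibo.com/ajaxlogin.php?framelogin=1"
    "&callback=parent.sinaSSOController.feedBackUrlCallBack"
    "&retcode=4049"
    "&reason=%CE%AA%C1%CB%C4%FA%B5%C4%D5%CA%BA%C5%B0%B2%C8%AB"
    "%A3%AC%C7%EB%CA%E4%C8%EB%D1%E9%D6%A4%C2%EB"
)

# The page declares charset=GBK, so the percent-escapes are decoded as GBK.
params = parse_qs(urlsplit(redirect_url).query, encoding="gbk")
retcode = params["retcode"][0]
reason = params["reason"][0]
print(retcode, reason)
```

The decoded reason is a message asking for a verification code (captcha) for account security. This suggests the server rejected the login attempt rather than returning the usual success redirect.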

This is the error:

    Traceback (most recent call last):
      File "example.py", line 24, in <module>
        simuLogin()
      File "example.py", line 19, in simuLogin
        simu = Lweibo.simu()
      File "/home/masong/Coding/weibo/Lweibo.py", line 420, in __init__
        self.simu = weibo_login(username, pwd)
      File "/home/masong/Coding/weibo/Lweibo.py", line 163, in __init__
        self.login(username, pwd, cookie_file)
      File "/home/masong/Coding/weibo/Lweibo.py", line 221, in login
        return self.do_login(username, pwd, cookie_file)
      File "/home/masong/Coding/weibo/Lweibo.py", line 288, in do_login
        login_url = p.search(text).group(1)
    AttributeError: 'NoneType' object has no attribute 'group'

The strange thing is that login worked fine all last night, and it only broke when I opened it today.
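The `AttributeError` means `p.search(text)` returned `None`, so the pattern did not match the response body. The pattern used in Lweibo.py is not shown in the traceback, so the sketch below is only a guess at a more defensive version. The names `extract_login_url`, `LoginError`, and both regular expressions are hypothetical, and treating `retcode=0` as success is an assumption.

```python
import re

# Assumed patterns; the real regex in Lweibo.py is not shown in the traceback.
REDIRECT_RE = re.compile(r"location\.replace\(\s*['\"](.*?)['\"]\s*\)")
RETCODE_RE = re.compile(r"[?&]retcode=(\d+)")


class LoginError(RuntimeError):
    """Raised when the login response does not look like a success redirect."""


def extract_login_url(text):
    match = REDIRECT_RE.search(text)
    if match is None:
        # Fail with a readable message instead of AttributeError on None.
        raise LoginError("no location.replace(...) redirect found in login response")
    url = match.group(1)
    retcode = RETCODE_RE.search(url)
    # Assumption: a retcode other than 0 means the server rejected the login.
    if retcode is not None and retcode.group(1) != "0":
        raise LoginError("login rejected by server, retcode=" + retcode.group(1))
    return url
```

With the response shown in this issue, a check like this would report `retcode=4049` directly instead of failing inside the regex call.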
