

WeiboSpider

This is a Sina Weibo spider built with Scrapy.

Update 2018/7/28

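Check these out: Weibo spider, tens of millions of records per day on a single machine && A painstakingly compiled summary of Weibo crawling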

Update 2018/7/27

This spider first requires you to log in to obtain Weibo cookies; only then do you run the spider.
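If you log in manually in a browser, the cookies are often copied out as a single `Cookie` header string. Below is a minimal sketch of turning that string into a dict, which is the form Scrapy's `scrapy.Request(url, cookies=...)` accepts. The helper name `cookie_header_to_dict` and the sample cookie values are hypothetical placeholders, and this is not necessarily how this project's `cookies.py` handles cookies.

```python
from typing import Dict


def cookie_header_to_dict(header: str) -> Dict[str, str]:
    """Turn a raw Cookie header value (e.g. "SUB=abc; SUHB=xyz") into a
    name -> value dict. Hypothetical helper, not part of this project."""
    cookies = {}
    for pair in header.split(";"):
        pair = pair.strip()
        if not pair or "=" not in pair:
            continue  # skip empty fragments and entries without a value
        # partition on the FIRST "=" so values that contain "=" stay intact
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    raw = "SUB=_2A25abc; SUHB=0xyz; _T_WM=12345"  # placeholder values
    print(cookie_header_to_dict(raw))
    # {'SUB': '_2A25abc', 'SUHB': '0xyz', '_T_WM': '12345'}
```

The resulting dict can then be passed as the `cookies` argument of each request the spider yields.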

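If your account was purchased, Weibo may flag it as abnormal and show a slide-through-the-grid CAPTCHA; in that case the cookie-fetching approach used in this project no longer works. For details, see this article.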

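To build a large-scale Weibo crawling system on top of this project, all you need to do is buy a large number of Weibo accounts and maintain an account pool.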
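A minimal in-memory sketch of such an account pool, assuming each account is represented by its cookie dict. The class name `AccountPool` and its methods are hypothetical; a real deployment (such as the Redis-backed setup described below) would keep this state in shared storage instead. In Scrapy, a downloader middleware could call `pick()` for each outgoing request and call `ban()` when an account starts hitting login or CAPTCHA pages.

```python
import random
from typing import Dict, Optional, Tuple


class AccountPool:
    """Hypothetical in-memory pool of logged-in Weibo accounts.

    Maps username -> cookie dict. Accounts that stop working are moved
    out of rotation so later requests no longer use them.
    """

    def __init__(self, accounts: Dict[str, Dict[str, str]],
                 seed: Optional[int] = None) -> None:
        self._active = dict(accounts)                    # usable accounts
        self._banned: Dict[str, Dict[str, str]] = {}     # retired accounts
        self._rng = random.Random(seed)                  # seedable for tests

    def pick(self) -> Tuple[str, Dict[str, str]]:
        """Return a random (username, cookies) pair from the active accounts."""
        if not self._active:
            raise RuntimeError("account pool exhausted: no usable accounts left")
        # sort so the choice is reproducible for a given seed
        username = self._rng.choice(sorted(self._active))
        return username, self._active[username]

    def ban(self, username: str) -> None:
        """Take an account out of rotation (no-op if it is already banned)."""
        cookies = self._active.pop(username, None)
        if cookies is not None:
            self._banned[username] = cookies

    def __len__(self) -> int:
        return len(self._active)


if __name__ == "__main__":
    pool = AccountPool({"user_a": {"SUB": "aaa"}, "user_b": {"SUB": "bbb"}}, seed=0)
    name, cookies = pool.pick()
    pool.ban(name)       # e.g. this account was served a CAPTCHA
    print(len(pool))     # 1
```

Picking uniformly at random spreads requests across accounts, which keeps the per-account request rate low.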

Weibo accounts can be bought here; the site is blocked in mainland China, so accessing it requires a VPN.

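I currently maintain a pool of 200+ accounts and run the crawl in a distributed fashion using Redis. As shown in the figure above, it collects about 8,000 records per minute, or about 11 million records per day.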
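As a rough consistency check of these figures, assuming the per-minute rate is sustained around the clock and the load is spread evenly across 200 accounts:

```latex
\begin{aligned}
8000~\text{records/min} \times 60~\text{min/h} \times 24~\text{h/day}
  &= 11{,}520{,}000~\text{records/day} \approx 1.1 \times 10^{7} \\
\frac{8000~\text{records/min}}{200~\text{accounts}}
  &= 40~\text{records/min per account}
\end{aligned}
```

With more than 200 accounts in the pool, each account handles at most about 40 records per minute.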

I paid for this account pool myself, so I won't be sharing it.

If you genuinely need crawled data, feel free to contact me. Email: [email protected]

Using this project

Python version: Python 3.6

git clone https://github.com/SimpleBrightMan/WeiboSpider.git
cd WeiboSpider
# First, obtain the cookies and save them to the database
python cookies.py
# Then run the spider
python run.py
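The README does not say which database `cookies.py` writes to. The sketch below uses the standard-library `sqlite3` module purely as a stand-in to show the save-then-load pattern: one row per account, with the cookie dict serialized as JSON. The table name `accounts` and the functions `save_cookies` / `load_cookies` are hypothetical and do not describe this project's actual storage.

```python
import json
import sqlite3
from typing import Dict


def _ensure_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per account, cookies stored as JSON text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        "username TEXT PRIMARY KEY, cookies TEXT NOT NULL)"
    )


def save_cookies(conn: sqlite3.Connection, username: str,
                 cookies: Dict[str, str]) -> None:
    """Insert or overwrite one account's cookies."""
    _ensure_table(conn)
    conn.execute(
        "INSERT OR REPLACE INTO accounts (username, cookies) VALUES (?, ?)",
        (username, json.dumps(cookies)),
    )
    conn.commit()


def load_cookies(conn: sqlite3.Connection) -> Dict[str, Dict[str, str]]:
    """Return every stored account as username -> cookie dict."""
    _ensure_table(conn)
    rows = conn.execute("SELECT username, cookies FROM accounts ORDER BY username")
    return {username: json.loads(raw) for username, raw in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # pass a file path to persist between runs
    save_cookies(conn, "user_a", {"SUB": "aaa"})
    save_cookies(conn, "user_a", {"SUB": "refreshed"})  # replaces the old row
    print(load_cookies(conn))  # {'user_a': {'SUB': 'refreshed'}}
```

Keeping the login step separate from the crawl means the spider only has to read ready-made cookies at startup.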


Contributors: nghuyong

