
lixiang0 / weibospider


Stores Weibo data locally, automatically downloads images, videos and posts, and provides a web front end for browsing them.

Home Page: http://1.14.73.45:18089/

License: BSD 3-Clause "New" or "Revised" License

Dockerfile 0.75% Python 67.83% CSS 3.92% JavaScript 0.89% HTML 26.62%
html ubuntu web weibo weibo-crawler weibo-spider

weibospider's Introduction

悠然微博 (Youran Weibo): a Weibo crawler and self-hosted Weibo deployment

[The documentation is incomplete and is still being expanded.]

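2023-10-07 update (v0.2.0)

  • UI update: the About page now shows the number of posts and bloggers collected each day
  • Improved crawling logic:
    • Randomly crawl the first 5 pages of every user on the site
    • Periodically fetch posts from the bloggers I follow (requires a cookie)
    • Periodically refresh the trending topics (hot search)
    • Collect site-wide user information from comments, so that the crawled users are active ones
    • Fetch the latest proxy IPs once a day
    • Compute user and post statistics once a day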
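The scheduled jobs in the v0.2.0 list can be pictured as a simple "which tasks are due?" check plus a random crawl plan. This is a minimal sketch, not the project's code: the task names, the helpers `due_tasks` and `random_crawl_plan`, and the intervals for trending topics and followed bloggers are hypothetical. Only the daily proxy refresh, the daily statistics and the five-page limit come from the list above.

```python
from __future__ import annotations

import random
from datetime import datetime, timedelta

# Hypothetical intervals. The README states only that proxy IPs and statistics
# are refreshed daily; the other two periods are illustrative assumptions.
INTERVALS = {
    "hot_search": timedelta(minutes=10),      # assumption: period not specified
    "followed_bloggers": timedelta(hours=1),  # assumption: needs the saved cookie
    "proxy_ips": timedelta(days=1),           # README: once a day
    "statistics": timedelta(days=1),          # README: once a day
}

PAGES_PER_USER = 5  # README: first 5 pages of each randomly chosen user


def due_tasks(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return tasks whose interval has elapsed; tasks that never ran are due."""
    due = []
    for name, interval in INTERVALS.items():
        last = last_run.get(name)
        if last is None or now - last >= interval:
            due.append(name)
    return due


def random_crawl_plan(user_ids: list[str], k: int, rng: random.Random) -> list[tuple[str, int]]:
    """Sample up to k users and list (user_id, page) pairs for pages 1..PAGES_PER_USER."""
    chosen = rng.sample(user_ids, min(k, len(user_ids)))
    return [(uid, page) for uid in chosen for page in range(1, PAGES_PER_USER + 1)]


if __name__ == "__main__":
    now = datetime(2023, 10, 7, 12, 0)
    last = {"hot_search": now - timedelta(minutes=5), "proxy_ips": now - timedelta(days=2)}
    print(due_tasks(last, now))  # ['followed_bloggers', 'proxy_ips', 'statistics']
    print(random_crawl_plan(["u1", "u2", "u3"], k=2, rng=random.Random(0)))
```

A long-running worker could call `due_tasks` in a loop (or the jobs could be driven by cron) and record each task's last run time somewhere persistent; where the real project keeps that state is not documented here.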

2022-10-15 update

  • Updated the UI
  • Improved the crawling logic

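Main features:

  • Crawl posts from across the whole site
  • Collect blogger profiles from across the site
  • Track site-wide trending topics (hot search) in real time
  • Self-hosted local deployment of Weibo
  • Follow bloggers / search for bloggers / bookmark posts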

Feature showcase

  • User home page
  • Personal home page
  • Following page
  • Post page
  • Search for bloggers
  • Random posts

todo

  • Complete the documentation

Features

Docker deployment

git clone https://github.com/lixiang0/WeiboSpider
cd WeiboSpider/

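# 1. MinIO
# Note: MinIO refuses to start if MINIO_ROOT_PASSWORD is shorter than 8 characters,
# so choose a longer password below and keep the crawler's MinIO credentials in sync.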
docker run \
  -p 9000:9000 \
  -p 9001:9001 \
  --name minio1 \
  -e "MINIO_ROOT_USER=minio" \
  -e "MINIO_ROOT_PASSWORD=minio" \
  -v /mnt/data:/data \
  quay.io/minio/minio server /data --console-address ":9001"

# 2. [Optional] Cookie
# Export the browser cookie with https://github.com/moonD4rk/HackBrowserData
# and save it in the results directory

# 3. Deploy
# Check the MongoDB and MinIO addresses in docker-compose.yml first
sudo docker-compose up -d --build
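For step 2, an HTTP client usually needs the exported cookie as a single `Cookie` header value. The sketch below is a hypothetical helper, `load_weibo_cookie`, not part of this repository, and the project may read the cookie differently. It assumes the export is a JSON array of objects with `Host`, `KeyName` and `Value` keys, which is believed to match HackBrowserData's JSON output; check the keys against your own file before relying on it.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_weibo_cookie(path: str | Path, domain_suffix: str = "weibo.com") -> str:
    """Build a Cookie header value from a browser-cookie JSON export.

    Assumption: the file holds a JSON array of objects with "Host", "KeyName"
    and "Value" keys; rename the keys below if your export differs.
    """
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    pairs: dict[str, str] = {}
    for entry in entries:
        host = entry.get("Host", "").lstrip(".")
        # Keep cookies for the domain itself and its subdomains only.
        if host == domain_suffix or host.endswith("." + domain_suffix):
            pairs[entry["KeyName"]] = entry["Value"]  # a later duplicate name wins
    return "; ".join(f"{name}={value}" for name, value in pairs.items())


if __name__ == "__main__":
    import tempfile

    sample = [
        {"Host": ".weibo.com", "KeyName": "SUB", "Value": "abc"},
        {"Host": ".example.com", "KeyName": "other", "Value": "x"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        cookie_file = Path(tmp) / "cookie.json"
        cookie_file.write_text(json.dumps(sample), encoding="utf-8")
        print(load_weibo_cookie(cookie_file))  # SUB=abc
```

The resulting string can be sent as the `Cookie` request header; the `domain_suffix` parameter lets the same helper filter for another Weibo domain if the crawler uses one.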

