sogou_wechat_spider

A WeChat official account crawler built on Sogou WeChat search.

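Introduction

This project is built on the ThinkPHP 5.0.9 core edition and uses the QueryList scraper.

It collects official account information based on keywords that you define.

If it helps you, a star is welcome; if you run into a problem, please open an issue.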

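Usage

Requirements

  • PHP 5.6+
  • MySQL 5.6+
  • Redis 3.2+

Basic configuration

  • Import /sql/wechat_data.sql into your database, then add the official account keywords you want to crawl to the wd_task_keywords table
  • Edit /application/database.php so that it matches your local database settings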
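
As a rough guide to the second step, a ThinkPHP 5.0 database configuration file returns an array of connection settings. The sketch below shows only the connection keys that usually need changing; every value is a placeholder, and the database name `wechat_data` is an assumption taken from the name of the SQL dump. Leave any other keys in the shipped file (table prefix, debug flags, and so on) as they are.

```php
<?php
// application/database.php (sketch, placeholder values only)
return [
    'type'     => 'mysql',          // database driver
    'hostname' => '127.0.0.1',      // database host
    'database' => 'wechat_data',    // assumed name; use the database you imported the dump into
    'username' => 'root',           // placeholder
    'password' => 'your_password',  // placeholder
    'hostport' => '3306',           // default MySQL port
    'charset'  => 'utf8',
];
```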

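Proxy setup

Sogou blocks IP addresses that send too many requests, so a proxy is required. I use Abuyun (阿布云). After purchasing a plan, fill in your own proxy details and copy the following options into the program's cURL options:

   CURLOPT_PROXYTYPE    => CURLPROXY_HTTP,
   CURLOPT_PROXY        => 'PROXY_URL',                  // proxy host and port
   CURLOPT_PROXYAUTH    => CURLAUTH_BASIC,
   CURLOPT_PROXYUSERPWD => 'PROXY_USER:PROXY_PASSWORD',  // cURL expects "user:password"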
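
To show where such options end up, here is a minimal sketch using plain PHP cURL. It is not the project's own code: how the options reach QueryList depends on the QueryList version in use, and the function names `buildProxyOptions` and `fetchViaProxy`, the 15-second timeout, and all argument values are hypothetical.

```php
<?php
/**
 * Build cURL options for one request routed through an HTTP proxy.
 * Hypothetical helper; all arguments are supplied by the caller.
 */
function buildProxyOptions($url, $proxy, $user, $pass)
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_TIMEOUT        => 15,     // assumed timeout in seconds
        CURLOPT_PROXYTYPE      => CURLPROXY_HTTP,
        CURLOPT_PROXY          => $proxy,
        CURLOPT_PROXYAUTH      => CURLAUTH_BASIC,
        CURLOPT_PROXYUSERPWD   => $user . ':' . $pass,  // "user:password"
    ];
}

/**
 * Run one request with the given options.
 * Returns [body, error]; error is null when the request succeeded.
 */
function fetchViaProxy(array $options)
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body  = curl_exec($ch);
    $error = ($body === false) ? curl_error($ch) : null;
    curl_close($ch);
    return [$body, $error];
}
```

A call would look like `list($html, $err) = fetchViaProxy(buildProxyOptions($url, 'PROXY_URL', 'PROXY_USER', 'PROXY_PASSWORD'));`, with the three placeholders replaced by the values from your proxy provider.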

Running

cd into the public directory and run ./sogou_wechat_spider.sh
If it fails with an error, check that the script has execute permission.

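Methods

Method                       Description
index/index/sg               Searches Sogou for official accounts matching the keywords
index/index/sg_art           Searches Sogou for articles matching the keywords
index/index/autoStart        Decides from the task volume whether to keep crawling or pause
index/index/setCookie        Manually sets cookie information; with a cookie set, more than 10 pages can be crawled
index/index/count            Counts the total number of items crawled today
index/index/keyword_count    Counts the number of items crawled today for each keyword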

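Results

With 1,000 keywords, no cookie set, and a proxy limited to 5 concurrent requests at a time, running nonstop collects about 100,000 official accounts per day.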
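
For readers wondering what "5 concurrent requests at a time" can look like in code, the following is a sketch under stated assumptions, not the project's implementation. It splits a URL list into batches of at most five and runs each batch with PHP's curl_multi functions. The helper names `toBatches` and `fetchBatch` and the 15-second timeout are hypothetical; proxy options such as those from the proxy section would be passed in through `$options`.

```php
<?php
/**
 * Split URLs into batches no larger than the proxy's concurrency limit.
 * Hypothetical helper; the default of 5 mirrors the figure quoted above.
 */
function toBatches(array $urls, $limit = 5)
{
    return array_chunk($urls, $limit);
}

/**
 * Fetch one batch of URLs concurrently.
 * Returns an array of response bodies keyed like $urls.
 */
function fetchBatch(array $urls, array $options = [])
{
    $mh      = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        // Caller-supplied options win over the defaults on the right.
        curl_setopt_array($ch, $options + [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => 15,   // assumed timeout in seconds
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0);    // wait for activity, up to 1 second
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

Looping over `toBatches($urls)` and calling `fetchBatch()` on each batch keeps the number of in-flight requests at or below the limit, at the cost of waiting for the slowest request in each batch before starting the next.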

Sponsor the author

If this project has helped you, you are welcome to buy me a coffee.
