

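QueryList Introduction


QueryList is a general-purpose list scraping class built on phpQuery. It is a simple, flexible, and powerful scraping tool: almost any page, however complex, can be scraped with essentially a single statement.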

Installing QueryList

Install via Composer:

composer require jaeger/querylist

More installation methods: QueryList installation options

Using QueryList

The following example scrapes Baidu search results with a single QueryList statement:

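// Assumption: installed via Composer, with the class in the QL namespace
require 'vendor/autoload.php';
use QL\QueryList;

// Create the scraper object
$hj = QueryList::Query('http://www.baidu.com/s?wd=QueryList', array(
        'title' => array('h3', 'text'),
        'link'  => array('h3>a', 'href')
    ));
// Print the result: a two-dimensional associative array
print_r($hj->data);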

The code above collects the title and link of every result on the Baidu search results page and outputs them as a two-dimensional associative array.

Result:

Array
(
    [0] => Array
        (
            [title] => QueryList|基于phpQuery的无比强大的PHP采集工具
            [link] => http://www.baidu.com/link?url=IIsMhpzI2PylnmW8vPALcwIfJgHhKFu2SWXEj7yQ-6o7KStbLfmuoWGmalpx1xYE
        )

    [1] => Array
        (
            [title] => 介绍- QueryList指导文档
            [link] => http://www.baidu.com/link?url=edktLqt6f9KwYJ6oip1EDXvwIXh-nHcFImVJeqRm56-VU3zIcqLRYeM83VyYQE_X
        )

  // remaining results omitted ...

)
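To make the mapping from rules to result rows concrete, here is a minimal sketch that reproduces the same output shape using only PHP's built-in DOM extension. It is not how QueryList is implemented: the helper `collectByRules`, the sample HTML, and the use of XPath expressions in place of jQuery selectors are all assumptions made for illustration.

```php
<?php
// Sketch only: mimics the "rule name => value" result shape with PHP's
// built-in DOM extension. This is NOT QueryList's implementation.

/**
 * Hypothetical helper: for each rule, read the text or an attribute of every
 * node matched by an XPath expression and group the values by row index.
 */
function collectByRules(string $html, array $rules): array
{
    $doc = new DOMDocument();
    // Suppress warnings caused by imperfect real-world HTML.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $rows = [];
    foreach ($rules as $name => [$expression, $attribute]) {
        $nodes = $xpath->query($expression);
        for ($i = 0; $i < $nodes->length; $i++) {
            $node = $nodes->item($i);
            // 'text' means the node's text; anything else is an attribute name.
            $rows[$i][$name] = $attribute === 'text'
                ? trim($node->textContent)
                : $node->getAttribute($attribute);
        }
    }
    return $rows;
}

// Hypothetical sample page standing in for a search results page.
$html = '<html><body>'
      . '<h3><a href="http://example.com/a">First result</a></h3>'
      . '<h3><a href="http://example.com/b">Second result</a></h3>'
      . '</body></html>';

$data = collectByRules($html, [
    'title' => ['//h3', 'text'],    // roughly the 'h3' selector
    'link'  => ['//h3/a', 'href'],  // roughly the 'h3>a' selector
]);

print_r($data);
```

Each rule contributes one key to every row, which is why the result is a list of associative arrays with `title` and `link` entries.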

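Query() static method

Return value: a QueryList object

Query is QueryList's only main method and is called statically.

Prototype:

QueryList::Query($page, array $rules, $range = '', $outputEncoding = null, $inputEncoding = null, $removeHead = false)

In plain terms:

QueryList::Query(target page, scraping rules[, range selector][, output encoding][, input encoding][, whether to remove the head])
// Scraping rules
$rules = array(
   'rule name' => array('jQuery selector', 'attribute to collect'[, "tag filter list"][, "callback function"]),
   'rule name 2' => array('jQuery selector', 'attribute to collect'[, "tag filter list"][, "callback function"]),
    ..........
    [, "callback" => "global callback function"]
);

// Note: parameters in square brackets are optional

Parameter details:

See the documentation: http://doc.querylist.cc/site/index/doc/11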
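As an illustration of the rule format, the sketch below builds a rule set with a per-field callback and a range selector. The selectors (`.item`, `.title`, `a`), the sample HTML, and the callback's single-argument signature are assumptions made for this example. The call into the library is guarded so the snippet also runs where QueryList is not installed, and it assumes the class is available as `QL\QueryList` in the version documented here. Consult the linked documentation for the exact behavior of each parameter.

```php
<?php
// Hypothetical HTML standing in for a fetched page.
$html = '<ul>'
      . '<li class="item"><span class="title"> Post A </span><a href="/a">read</a></li>'
      . '<li class="item"><span class="title"> Post B </span><a href="/b">read</a></li>'
      . '</ul>';

// Per-field callback. Assumption: it receives the extracted value and
// returns the cleaned value.
$trimTitle = function ($content) {
    return trim($content);
};

// Rule format:
// 'rule name' => array(selector, attribute[, tag filter list][, callback])
$rules = array(
    'title' => array('.title', 'text', '', $trimTitle), // empty tag filter list
    'link'  => array('a', 'href'),
);

// Range selector. Assumption: each matched element yields one result row,
// and the rule selectors are evaluated inside it.
$range = '.item';

// Only call the library when it is available (assumed class name).
if (class_exists('QL\\QueryList')) {
    $data = \QL\QueryList::Query($html, $rules, $range)->data;
    print_r($data);
}
```

Keeping the callback in a variable makes it easy to reuse the same cleanup step across several rules.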

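QueryList extensions

Request: network operations extension

Supports arbitrarily complex network requests, such as sending cookies or spoofing the referer, so the limits of QueryList's built-in fetching are no longer a concern.

Login: simulated login extension

Logs in to a site and then scrapes it.

Multi: multithreading plugin

Multithreaded (multi-process) scraping extension.

DImage: image download extension

Covers simple image download needs.

Extension installation and usage: see the QueryList extension documentation. For more extensions, follow the QueryList community and discussion group.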

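Other notes

1. QueryList only has a simple built-in method for fetching page source. For more complex cases, such as pages that require login or authentication, pair it with another PHP HTTP tool (Guzzle is recommended) and pass the HTML fetched by that tool to QueryList.

2. Run scraping programs in PHP command-line mode (PHP CLI).

3. QueryList depends on phpQuery. phpQuery project home page: phpQuery documentation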
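Note 1 can be sketched as follows: fetch the page with a separate HTTP layer, then hand the HTML to QueryList. The note recommends Guzzle; to stay dependency-free, this sketch swaps in PHP's built-in stream context instead. The helper `fetchHtml`, the header values, and the URL are hypothetical, and the commented usage assumes the class is available as `QL\QueryList`.

```php
<?php
/**
 * Hypothetical helper: fetch a page with custom request headers
 * (for example Cookie or Referer) and return the raw HTML.
 */
function fetchHtml(string $url, array $headers = []): string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => implode("\r\n", $headers),
            'timeout' => 10, // seconds
        ],
    ]);

    // Suppress the warning and report failure as an exception instead.
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException("Failed to fetch $url");
    }
    return $html;
}

// Usage sketch (URL, cookie value, and $rules are placeholders):
// $html = fetchHtml('http://example.com/members', [
//     'Cookie: session=PLACEHOLDER',
//     'Referer: http://example.com/',
// ]);
// $data = \QL\QueryList::Query($html, $rules)->data;
```

Separating fetching from parsing this way means the HTTP layer can later be replaced by Guzzle without touching the scraping rules.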

Need help?

Author

Jaeger

License

QueryList is licensed under the MIT License. See the LICENSE file for more details.
