spekulatius / spatie-crawler-toolkit-for-laravel

A toolkit for Spatie's Crawler and Laravel.

Home Page: https://releasecandidate.dev

License: MIT License

PHP 100.00%
spatie-crawler crawler php-scraper php-crawler laravel-crawler laravel

Spatie Crawler Toolkit for Laravel

Laravel 9 should work but has not been extensively tested. Please report any issues you find!

A set of classes for using Spatie's crawler with Laravel. The aim is to simplify building crawler applications or adding a crawler to an existing Laravel project. It can be conveniently integrated into PHP Scraper, for example. At the moment, the following helper classes are implemented:

Cache Crawl Queue

The CacheCrawlQueue lets you use Laravel's pre-configured cache to store the crawl queue. Every action performed on the queue is written to the cache immediately, so you never need to store the queue manually. You can add it directly to your crawler:

use Spatie\Crawler\Crawler;

Crawler::create()
    ->setCrawlQueue(new \Spekulatius\SpatieCrawlerToolkit\Queues\CacheCrawlQueue($url))
    ->startCrawling($url);

With this, you can stop the crawl and resume it at any time. This requires a cache driver to be configured in your .env file.
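To make the write-through idea concrete, here is a minimal, self-contained sketch. It is not the toolkit's implementation: ArrayCacheStore and WriteThroughQueue are hypothetical names, and an in-memory array stands in for Laravel's cache. It only illustrates why persisting every queue action allows a crawl to resume later.

```php
<?php

// Hypothetical stand-in for a cache store. In a Laravel application this
// role would be played by the configured cache driver.
final class ArrayCacheStore
{
    /** @var array<string, string> */
    private array $items = [];

    public function put(string $key, string $value): void
    {
        $this->items[$key] = $value;
    }

    public function get(string $key): ?string
    {
        return $this->items[$key] ?? null;
    }
}

// Hypothetical write-through queue: every mutation is persisted at once.
final class WriteThroughQueue
{
    /** @var array<string, bool> URL => already processed? */
    private array $urls = [];
    private ArrayCacheStore $cache;
    private string $key;

    public function __construct(ArrayCacheStore $cache, string $key)
    {
        $this->cache = $cache;
        $this->key = $key;

        // Restore earlier state, if any, so an interrupted crawl can resume.
        $stored = $cache->get($key);
        if ($stored !== null) {
            $this->urls = json_decode($stored, true);
        }
    }

    public function add(string $url): void
    {
        if (!isset($this->urls[$url])) {
            $this->urls[$url] = false;
            $this->persist();
        }
    }

    public function markAsProcessed(string $url): void
    {
        $this->urls[$url] = true;
        $this->persist();
    }

    /** @return string[] URLs that still need to be crawled */
    public function pending(): array
    {
        return array_keys(array_filter($this->urls, fn (bool $done) => !$done));
    }

    private function persist(): void
    {
        $this->cache->put($this->key, json_encode($this->urls));
    }
}

$cache = new ArrayCacheStore();

$first = new WriteThroughQueue($cache, 'crawl:https://example.com');
$first->add('https://example.com/');
$first->add('https://example.com/about');
$first->markAsProcessed('https://example.com/');

// A new instance with the same key picks up where the first one stopped.
$resumed = new WriteThroughQueue($cache, 'crawl:https://example.com');
print_r($resumed->pending());
```

Because the state is written on every change, nothing has to be saved explicitly before the process stops. The trade-off is one cache write per queue action.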

Crawl Logger

The Crawl Logger is an observer you can add to your crawler to enable logging of crawl events:

use Spatie\Crawler\Crawler;

Crawler::create()
    ->setCrawlObserver(new \Spekulatius\SpatieCrawlerToolkit\Observers\CrawlLogger())
    ->startCrawling($url);

You can publish the configuration file (see Installation below) to tweak which events are logged.

Crawl Events

The toolkit contains an observer that dispatches Laravel events, allowing you to react to crawl events in your own listeners.

By default, no events are emitted. To enable events, you will need to add the event observer to your crawler:

use Spatie\Crawler\Crawler;

$eventObserver = new \Spekulatius\SpatieCrawlerToolkit\Observers\CrawlEvents();

Crawler::create()
    ->setCrawlObserver($eventObserver)
    ->startCrawling($url);

An optional identifier can be passed to the crawl events to distinguish between different crawls:

$eventObserver = new \Spekulatius\SpatieCrawlerToolkit\Observers\CrawlEvents('my-crawl');
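The following self-contained sketch illustrates how such an identifier can be used on the listening side. All names in it (UrlCrawled, IdentifiedObserver, the crawlIdentifier property) are hypothetical and chosen for illustration. The toolkit's real event classes and properties may differ, so check the package source before relying on them.

```php
<?php

// Hypothetical event object carrying the URL and the crawl identifier.
final class UrlCrawled
{
    public string $url;
    public ?string $crawlIdentifier;

    public function __construct(string $url, ?string $crawlIdentifier)
    {
        $this->url = $url;
        $this->crawlIdentifier = $crawlIdentifier;
    }
}

// Hypothetical observer that tags every event with its identifier.
final class IdentifiedObserver
{
    /** @var callable */
    private $dispatch;
    private ?string $identifier;

    public function __construct(callable $dispatch, ?string $identifier = null)
    {
        $this->dispatch = $dispatch;
        $this->identifier = $identifier;
    }

    public function crawled(string $url): void
    {
        ($this->dispatch)(new UrlCrawled($url, $this->identifier));
    }
}

// One listener receives events from several crawls and filters by identifier.
$received = [];
$listener = function (UrlCrawled $event) use (&$received): void {
    if ($event->crawlIdentifier === 'my-crawl') {
        $received[] = $event->url;
    }
};

$mine = new IdentifiedObserver($listener, 'my-crawl');
$other = new IdentifiedObserver($listener, 'other-crawl');

$mine->crawled('https://example.com/');
$other->crawled('https://example.org/');

print_r($received);
```

In a Laravel application, the filtering would live in an event listener registered for the toolkit's event classes rather than in a closure.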

Planned functionality

  • Batched crawling using Laravel Queues.

For any suggestions on how to enhance this, please raise an issue.

Requirements & Install

Requirements

  • Laravel 6, 7, 8, or 9. Laravel 9 support is still being tested; please report any issues.
  • Cache and Log configured in Laravel.

Installation

composer require spekulatius/spatie-crawler-toolkit-for-laravel

Optionally, you can publish the configuration file:

php artisan vendor:publish --tag=crawler-toolkit-config

Contributing

Please raise a PR or issue.

License

Released under the MIT license. Please see License File for more information.


spatie-crawler-toolkit-for-laravel's Issues

Laravel 9 support

At the moment there is no Laravel 9 support.

Problem 1
- Root composer.json requires spekulatius/spatie-crawler-toolkit-for-laravel ^0.4.0 -> satisfiable by spekulatius/spatie-crawler-toolkit-for-laravel[0.4.0].
- spekulatius/spatie-crawler-toolkit-for-laravel 0.4.0 requires laravel/framework ^6.0|^7.0|^8.0 -> found laravel/framework[v6.0.0, ..., 6.x-dev, v7.0.0, ..., 7.x-dev, v8.0.0, ..., 8.x-dev] but it conflicts with your root composer.json require (^9.2).
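The conflict follows from how Composer's caret constraints work: for major versions of 1 or higher, ^8.0 accepts anything from 8.0.0 up to, but not including, 9.0.0. The sketch below is a deliberately simplified check to illustrate this. The function names are hypothetical, and it is not Composer's resolver. In real code the composer/semver package is the appropriate tool.

```php
<?php

// Simplified caret check for versions with a major version of 1 or higher:
// "^X.Y" accepts >= X.Y and < (X+1).0.0. Not a full Composer resolver.
function satisfiesCaret(string $version, string $constraint): bool
{
    $lower = ltrim($constraint, '^');
    $nextMajor = ((int) explode('.', $lower)[0] + 1) . '.0.0';

    return version_compare($version, $lower, '>=')
        && version_compare($version, $nextMajor, '<');
}

// A "|" between constraints means "any of these".
function satisfiesAny(string $version, string $constraints): bool
{
    foreach (explode('|', $constraints) as $constraint) {
        if (satisfiesCaret($version, trim($constraint))) {
            return true;
        }
    }

    return false;
}

// Version 0.4.0 of the toolkit requires laravel/framework ^6.0|^7.0|^8.0.
var_dump(satisfiesAny('9.2.0', '^6.0|^7.0|^8.0'));  // bool(false)
var_dump(satisfiesAny('8.83.0', '^6.0|^7.0|^8.0')); // bool(true)
```

No Laravel 9 release can satisfy the constraint in 0.4.0, so a project that requires ^9.2 cannot install that version of the toolkit.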
