sitetool's Introduction

SiteTool

A couple of very simple tools for checking sites and site migration.

Site crawler

Crawls a site to find all links, and then fetches them. Run with:

php cli.php site:crawl http://phpimagick.com/

By default, results are written to 'crawl_result.txt'.
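To illustrate the "find all links, then fetch them" loop, here is a minimal sketch of a breadth-first crawl. It is not the project's implementation: the function names findLinks and crawl, the $fetch callable, and the in-memory $pages array are all assumptions made for illustration, and a regex stands in for a real HTML parser to keep the sketch dependency-free.

```php
<?php
declare(strict_types=1);

/**
 * Extract href values from anchor tags.
 * A regex keeps this sketch free of extensions; a real crawler would
 * use a proper HTML parser.
 *
 * @return string[]
 */
function findLinks(string $html): array
{
    preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $matches);
    return array_values(array_unique($matches[1]));
}

/**
 * Breadth-first crawl starting at $startPath.
 * $fetch is a hypothetical callable that returns the HTML for a path,
 * or null when the path cannot be fetched.
 *
 * @return array<string, bool> path => whether the fetch succeeded
 */
function crawl(string $startPath, callable $fetch): array
{
    $results = [];
    $queue = [$startPath];

    while ($queue !== []) {
        $path = array_shift($queue);
        if (array_key_exists($path, $results)) {
            continue; // already visited
        }
        $html = $fetch($path);
        $results[$path] = ($html !== null);
        if ($html === null) {
            continue; // nothing to parse for a failed fetch
        }
        foreach (findLinks($html) as $link) {
            if (!array_key_exists($link, $results)) {
                $queue[] = $link;
            }
        }
    }
    return $results;
}

// A tiny in-memory "site" stands in for real HTTP requests.
$pages = [
    '/' => '<a href="/about">About</a> <a href="/docs">Docs</a>',
    '/about' => '<a href="/">Home</a>',
    '/docs' => '<a href="/missing">Broken</a>',
];
$fetch = fn(string $path): ?string => $pages[$path] ?? null;

$results = crawl('/', $fetch);
foreach ($results as $path => $ok) {
    echo $path, ' ', $ok ? 'ok' : 'error', PHP_EOL;
}
```

The visited check on dequeue keeps a page that is linked from several places from being fetched twice.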

Migration checker

Once a site has been crawled, you can check whether the same paths are available on a different domain:

php -d memory_limit=1280M src/cli.php site:migratecheck phpimagick.com www.phpimagick.com 

This allows you to check that migrating to a new platform hasn't lost any paths.
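The idea behind the check can be sketched as follows: take each crawled URL, swap in the new host, and record any URL that does not respond with 200. This is only an illustration under stated assumptions: rewriteHost, findMissingPaths, and the $statusOf callable are hypothetical names, the example paths are made up, and the status lookup is stubbed so that the sketch runs without network access.

```php
<?php
declare(strict_types=1);

/**
 * Rewrite a crawled URL so that it points at $newHost,
 * keeping the scheme, path and query string.
 */
function rewriteHost(string $url, string $newHost): string
{
    $parts = parse_url($url);
    if ($parts === false) {
        throw new InvalidArgumentException("Cannot parse URL: $url");
    }
    $scheme = $parts['scheme'] ?? 'http';
    $path = $parts['path'] ?? '/';
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';

    return $scheme . '://' . $newHost . $path . $query;
}

/**
 * @param string[] $crawledUrls URLs found on the original domain
 * @param callable $statusOf    hypothetical: returns the HTTP status code for a URL
 * @return string[] URLs on the new host that did not return 200
 */
function findMissingPaths(array $crawledUrls, string $newHost, callable $statusOf): array
{
    $missing = [];
    foreach ($crawledUrls as $url) {
        $target = rewriteHost($url, $newHost);
        if ($statusOf($target) !== 200) {
            $missing[] = $target;
        }
    }
    return $missing;
}

// Made-up crawl results and stubbed responses, for illustration only.
$crawled = [
    'http://phpimagick.com/',
    'http://phpimagick.com/about',
    'http://phpimagick.com/docs?page=2',
];
$stubStatuses = [
    'http://www.phpimagick.com/' => 200,
    'http://www.phpimagick.com/docs?page=2' => 200,
];
$statusOf = fn(string $url): int => $stubStatuses[$url] ?? 404;

$missing = findMissingPaths($crawled, 'www.phpimagick.com', $statusOf);
foreach ($missing as $url) {
    echo 'Missing: ', $url, PHP_EOL;
}
```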

Visualizing events

As the whole application is tied together using events, it can be difficult to comprehend how the different parts of the app fit together.

Appending --graph to any command makes the application generate a graph of how the events and processors are tied together for that command, instead of running the command.

The graph generation depends on Graphviz being available. The project includes a Docker Compose file so that graphs can be generated inside a container, which can be invoked with something like:

docker-compose up --build

docker exec sitetool_php_1 php cli.php site:crawl http://phpimagick.com/ --graph

If the project is not checked out to a directory named 'sitetool', you may need to run docker ps to find the exact container name.
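As a rough idea of what such a graph contains, the sketch below turns a map of processors and the events they listen for and emit into Graphviz DOT text. The buildDot function, the array shape, and the particular wiring shown are assumptions for illustration; the project's real graph output and wiring may differ.

```php
<?php
declare(strict_types=1);

/**
 * Build Graphviz DOT text from a processor wiring map.
 * Events point at the processors that listen for them, and processors
 * point at the events they emit.
 *
 * @param array<string, array{listens: string, emits: string[]}> $processors
 */
function buildDot(array $processors): string
{
    $lines = ['digraph events {'];
    foreach ($processors as $name => $wiring) {
        // Draw processors as boxes so they stand out from events.
        $lines[] = sprintf('    "%s" [shape=box];', $name);
        $lines[] = sprintf('    "%s" -> "%s";', $wiring['listens'], $name);
        foreach ($wiring['emits'] as $event) {
            $lines[] = sprintf('    "%s" -> "%s";', $name, $event);
        }
    }
    $lines[] = '}';

    return implode("\n", $lines) . "\n";
}

// Hypothetical wiring, using names from this README.
$processors = [
    'FetchUrl' => [
        'listens' => 'FoundUrlToFetch',
        'emits' => ['ResponseWasReceived'],
    ],
    'CheckResponseIsOk' => [
        'listens' => 'ResponseWasReceived',
        'emits' => ['ResponseWasOk', 'ResponseWasError'],
    ],
    'LogResponseWasError' => [
        'listens' => 'ResponseWasError',
        'emits' => [],
    ],
];

$dot = buildDot($processors);
echo $dot;
```

The resulting text can be rendered by piping it to Graphviz, for example with dot -Tsvg.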

Naming things

Event names

Event names should be a past-tense phrase that describes what has happened. Examples:

FoundUrl FoundUrlToFetch FoundUrlToSkip
ReceivedHtml ResponseWasOk ResponseWasError ResponseWasReceived

Processor names

Processor names should be of the form 'verb' + 'object' or 'verb' + 'object' + 'condition'. If possible use the event name as the object.

CheckResponseContentTypeIsHtml CheckResponseIsOk FetchUrl LogResponseWasOk LogResponseWasError LogFoundUrlToSkip ParseReceivedHtmlToFindUrls DecideFoundUrlShouldBeFollowed

Where it makes sense, use the name of the event being listened for in the processor name.
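A small sketch of how the two conventions fit together: the event FoundUrlToSkip is a past-tense phrase, and the processor LogFoundUrlToSkip is 'verb' + the event name. The class bodies, the $reason property, and the dispatch function are hypothetical and only serve to show the naming; they are not taken from the project's source.

```php
<?php
declare(strict_types=1);

// Event: a past-tense phrase describing what has happened.
final class FoundUrlToSkip
{
    public string $url;
    public string $reason;

    public function __construct(string $url, string $reason)
    {
        $this->url = $url;
        $this->reason = $reason;
    }
}

// Processor: 'verb' + 'object', using the event name as the object.
final class LogFoundUrlToSkip
{
    /** @var string[] */
    public array $lines = [];

    public function __invoke(FoundUrlToSkip $event): void
    {
        $this->lines[] = sprintf('Skipped %s (%s)', $event->url, $event->reason);
    }
}

/**
 * Hypothetical dispatcher: passes the event to every processor
 * registered for the event's class.
 *
 * @param array<string, callable[]> $listeners
 */
function dispatch(array $listeners, object $event): void
{
    foreach ($listeners[get_class($event)] ?? [] as $listener) {
        $listener($event);
    }
}

$logger = new LogFoundUrlToSkip();
$listeners = [FoundUrlToSkip::class => [$logger]];

dispatch($listeners, new FoundUrlToSkip('http://phpimagick.com/logo.png', 'not HTML'));
echo implode(PHP_EOL, $logger->lines), PHP_EOL;
```

Because the processor name contains the event name, the wiring in $listeners can be checked by eye.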

Static analysis can be run with PHPStan:

php phpstan.phar analyze -c ./phpstan.neon -l 7 src

sitetool's Issues

Make configuration be simple

Currently the configuration is done in code, for example in this file: https://github.com/Danack/SiteTool/blob/master/src/SiteTool/Command/Crawler.php

However, that configuration would be much better off in a simple PHP file. The configuration needed to start with is:

  • A list of relays, the set of things that process events, which for the crawler command are:
    FetchUrl::class,
    LogResponseIsOk::class,
    CheckResponseIsOk::class,
    CheckContentTypeIsHtml::class,
    ParseHtmlToFindLinks::class,
    ShouldUrlFoundBeFollowed::class,
    LogSkippedLink::class

  • (Optional) The initial domain.

  • (Nice to have) The initial event to be triggered, which in the crawler's case is FoundUrlToFollow.
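One possible shape for such a file is sketched below. The key names relays, initialDomain and initialEvent, and the normaliseCrawlerConfig helper, are assumptions rather than anything the project defines. The relay names are the ones listed above; in the real project they would be imported from their namespaces.

```php
<?php
declare(strict_types=1);

// Hypothetical crawler configuration. In a separate config file this
// array would be returned with `return [...]` and loaded with `require`.
// Note: ::class only builds the class-name string, so these classes do
// not need to be loaded for this sketch to run.
$config = [
    'relays' => [
        FetchUrl::class,
        LogResponseIsOk::class,
        CheckResponseIsOk::class,
        CheckContentTypeIsHtml::class,
        ParseHtmlToFindLinks::class,
        ShouldUrlFoundBeFollowed::class,
        LogSkippedLink::class,
    ],
    'initialDomain' => 'phpimagick.com', // optional
    // 'initialEvent' is omitted here to show the default being applied.
];

/**
 * Validate the configuration and fill in defaults for optional keys.
 *
 * @return array{relays: string[], initialDomain: ?string, initialEvent: string}
 */
function normaliseCrawlerConfig(array $config): array
{
    if (empty($config['relays']) || !is_array($config['relays'])) {
        throw new InvalidArgumentException('Config must list at least one relay.');
    }

    return [
        'relays' => array_values($config['relays']),
        'initialDomain' => $config['initialDomain'] ?? null,
        'initialEvent' => $config['initialEvent'] ?? 'FoundUrlToFollow',
    ];
}

$crawlerConfig = normaliseCrawlerConfig($config);
echo count($crawlerConfig['relays']), ' relays, initial event ',
    $crawlerConfig['initialEvent'], PHP_EOL;
```

Keeping the file as a plain array means a command only needs to iterate over the relay list, rather than wire each processor by hand.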
