Some feature questions about php-spider (closed)

koolma commented on August 19, 2024
Some feature questions

from php-spider.

Comments (3)

danvuquoc commented on August 19, 2024

@koolma, I'm not a contributor or maintainer of this project, but I have used this spider.

  1. It doesn't support JavaScript; this isn't a headless browser. It simply uses Guzzle HTTP 6 to fetch pages. Typically you then use a CSS/XPath discovery class to find new URIs, which gives you more pages to crawl.
  2. It doesn't follow robots.txt files, but you could easily write a filter that implements the PreFetchFilterInterface and uses tomverran/robots to match the path of the URI the spider is about to fetch against robots.txt. I've done the same for robots headers and robots meta tags by implementing a PostFetchFilterInterface.
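
As a rough illustration of point 2, here is a minimal sketch of the pre-fetch idea, written in Python rather than PHP. It uses the standard library's `urllib.robotparser` in place of tomverran/robots. The `RobotsTxtPreFetchFilter` class, its `match` method, and the convention that `match` returning `True` means "skip this URI" are assumptions made for this sketch, not php-spider's actual interface.

```python
from urllib.robotparser import RobotFileParser


class RobotsTxtPreFetchFilter:
    """Hypothetical pre-fetch filter: decides, before any request is made,
    whether a discovered URI should be dropped from the crawl queue."""

    def __init__(self, robots_txt: str, user_agent: str = "*"):
        # Parse the robots.txt content once; in a real crawler it would be
        # downloaded once per host and cached.
        self._parser = RobotFileParser()
        self._parser.parse(robots_txt.splitlines())
        self._user_agent = user_agent

    def match(self, uri: str) -> bool:
        # Assumed convention: True means "filter this URI out".
        return not self._parser.can_fetch(self._user_agent, uri)


# Example robots.txt content (made up for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots_filter = RobotsTxtPreFetchFilter(ROBOTS_TXT)

# URIs that a discovery step might have found on a page.
queue = [
    "https://example.com/",
    "https://example.com/private/report.html",
]

# Keep only the URIs that robots.txt permits.
allowed = [uri for uri in queue if not robots_filter.match(uri)]
print(allowed)
```

The same check for robots headers or meta tags has to run after the response arrives, which is why that variant belongs in a post-fetch filter instead.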

I'm not sure about 3 and 4, but I hope this helps.

solverat commented on August 19, 2024

  1. No, not without additional work.
  2. Yes, just implement some middleware.

mvdbos commented on August 19, 2024

Good answers by @danvuquoc and @solverat. Closing this ancient issue. :-)
