imclab / arachnod

This project forked from risyasin/arachnod

High performance crawler for Nodejs

Home Page: http://risyasin.github.io/arachnod/

License: GNU General Public License v2.0

JavaScript 100.00%

Arachnod

A powerful, easy-to-use web crawler for Node.js. Arachnod is designed for heavy, long-running tasks, with a focus on performance and efficient resource usage. To meet these goals, Arachnod uses Redis as its backend. Redis covers the heavy, time-consuming work: tracking URLs and their tasks, and storing and distributing information among Arachnod's child tasks (Spiderlings). To keep resource usage under control, Arachnod also avoids server-side DOM techniques such as jQuery with JSDOM. JSDOM was tested for a long time with no luck; it always led to memory leaks and high memory usage. Libxml-based XPath solutions were not a realistic option either. Arachnod uses Cheerio to access DOM elements and SuperAgent as its HTTP client.
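The idea of a shared backend that tracks URLs and hands them out to child tasks can be sketched in a few lines. This is only an illustration, not Arachnod's actual implementation: a `Set` and an array stand in for Redis so the sketch runs without a server, and the `Frontier` class and its method names are hypothetical.

```javascript
// In-memory stand-in for a Redis-backed URL frontier (illustrative only).
// `seen` plays the role of a Redis set, `queue` the role of a Redis list.
class Frontier {
    constructor() {
        this.seen = new Set();
        this.queue = [];
    }

    // Enqueue a URL only if it has not been seen before.
    // Returns true when the URL was added, false when it was a duplicate.
    add(url) {
        const key = new URL(url).href; // normalize, e.g. a bare host gains a trailing "/"
        if (this.seen.has(key)) {
            return false;
        }
        this.seen.add(key);
        this.queue.push(key);
        return true;
    }

    // Hand the next pending URL to a worker, or undefined when the queue is empty.
    next() {
        return this.queue.shift();
    }
}

const frontier = new Frontier();
frontier.add('https://example.com');
frontier.add('https://example.com/');      // duplicate after normalization, skipped
frontier.add('https://example.com/about');
console.log(frontier.queue.length);
```

Keeping the "seen" set and the pending queue outside the worker processes is what would let several workers share one crawl without fetching the same URL twice.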

How to install
  • via NPM: npm install arachnod

  • via Git: git clone https://github.com/risyasin/arachnod.git

How to use
    var bot = require('arachnod');
    
    bot.crawl({
        'redis': 'localhost',
        'parallel': 4,
        'start': 'https://www.npmjs.com/package/arachnod',
        'verbose': 1,
        'ignorePaths': ['/list-of-paths/should-be-ignored'],
        'resume': false
    });
    
    bot.on('hit', function (doc, $) {
        /* $ is a jQuery-like Cheerio object that provides access to the DOM. */
        /* doc contains the task information & headers of the hit. */
        console.log($('#readme p').text());
        console.log($('#readme a').text());
    });
    
    bot.on('error', function (err, task) {
        /* Log the error itself along with the task that failed. */
        console.log(['arachnod error:', err, task]);
    });
    
    bot.on('end', function (err, status) {
        console.log(['arachnod tasks ended:', err, status]);
    });
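How Arachnod matches the `ignorePaths` option internally is not described here. As a rough sketch, assuming simple prefix matching against the URL path, the check could look like the following. The `isIgnored` helper is hypothetical and not part of Arachnod's API.

```javascript
// Hypothetical helper (assumption: ignorePaths entries are path prefixes).
// Returns true when the URL's path starts with any of the ignored prefixes.
function isIgnored(url, ignorePaths) {
    const path = new URL(url).pathname;
    return ignorePaths.some(function (prefix) {
        return path.startsWith(prefix);
    });
}

const ignorePaths = ['/list-of-paths/should-be-ignored'];

// Path begins with the ignored prefix, so this URL would be skipped.
console.log(isIgnored('https://www.npmjs.com/list-of-paths/should-be-ignored/page', ignorePaths));

// Path does not match any prefix, so this URL would be crawled.
console.log(isIgnored('https://www.npmjs.com/package/arachnod', ignorePaths));
```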
