
img-crawler's Introduction

img-crawler

A Node module for downloading images to disk from a given URL.

Installation

    npm install img-crawler

Running the tests

From the module directory run:

    npm test

Without npm:

    make test

Usage

Download images from 'pearljam.com' and write them to the 'pj-imgs' directory. The directory is resolved to an absolute path and created if it does not exist.

    var crawler = require('img-crawler');

    var opts = {
        url: 'http://pearljam.com',
        dist: 'pj-imgs'
    };

    crawler.crawl(opts, function(err, data) {
        // err is set when the web page itself could not be loaded
        if (err) {
            return console.error(err);
        }
        console.log('Downloaded %d from %s', data.imgs.length, opts.url);
    });

The callback

In keeping with Node convention, the callback accepts an error object first, followed by data representing the downloaded images. The err object is provided only if loading the web page fails. Failures to download individual images are reported in the per-image entries of the response.

Here's an example of a response:

    {
        imgs: [
            {
                src: 'img/a-img.png', 
                statusCode: 200,
                success: true,
                path: '/Users/radvieira/my-imgs/img/a-img.png'
            },
            {
                src: 'img/another-img.png', 
                statusCode: 404,
                success: false
            }            
        ]
    }
    

In this case the first image was downloaded and written to disk while the other failed. Notice how there is no path attribute for the failed download.
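
Because per-image failures appear in data.imgs rather than in err, a caller typically has to separate the two cases itself. The following is a minimal sketch, assuming the response shape shown above; `splitResults` is a hypothetical helper and not part of img-crawler.

```javascript
// Hypothetical helper (not part of img-crawler): separates downloaded
// images from failed ones using the `success` flag on each entry.
function splitResults(data) {
    var imgs = (data && data.imgs) || [];
    return {
        downloaded: imgs.filter(function(img) { return img.success; }),
        failed: imgs.filter(function(img) { return !img.success; })
    };
}

// Sample response copied from the example above.
var sample = {
    imgs: [
        {
            src: 'img/a-img.png',
            statusCode: 200,
            success: true,
            path: '/Users/radvieira/my-imgs/img/a-img.png'
        },
        {
            src: 'img/another-img.png',
            statusCode: 404,
            success: false
        }
    ]
};

var result = splitResults(sample);

// Report each failed download with its status code.
result.failed.forEach(function(img) {
    console.log('Failed %s (status %d)', img.src, img.statusCode);
});
```

Inside a real callback, the same helper could be called with the data argument after checking err.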

img-crawler's People

Contributors

radvieira

img-crawler's Issues

StatusCode: 404

Hey,

When I use img-crawler, every image comes back with statusCode: 404, so I imagine I'm doing something wrong.
All tests pass (with mocha), and my FTP folder is chmod 777.
So maybe it's a problem with relative versus remote links.

I'm on a Mac.

    var crawler = require('img-crawler');

    var opts = {
        url: 'http://vincent-bonnefille.fr/img/perso/',
        dist: '/public/down/'
    };

    crawler.crawl(opts, function(err, data) {
        console.log('Downloaded %d from %s', data.imgs.length, opts.url, data.imgs);
    });

My log (partial):

    Downloaded 22 from http://my-web-site.fr/img/perso/ [ { src: '/img/perso/Opaque.jpg', statusCode: 404, success: false },
      { src: '/img/perso/Garden in progress 2015.gif',
        statusCode: 404,
        success: false }
    ...
    ...

Did Node.js crawl the index.php file? [Edit: Yes]
It's a page that automatically lists the folder's contents. [Edit: So I added absolute src values to my images.]
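
For anyone debugging a similar 404, one quick check is to see which absolute URL a given src value corresponds to when resolved against the page URL. This is only a diagnostic sketch using Node's built-in URL class; it does not show how img-crawler resolves src values internally, and `toAbsolute` is a hypothetical helper.

```javascript
// Diagnostic sketch only: uses the WHATWG URL class, which is a global
// in current Node.js releases. It does not reflect img-crawler internals.
var pageUrl = 'http://vincent-bonnefille.fr/img/perso/';

// Hypothetical helper: resolve an img src against the page it came from.
function toAbsolute(src, base) {
    return new URL(src, base).href;
}

// Root-relative src values are resolved against the host.
var rootRelative = toAbsolute('/img/perso/Opaque.jpg', pageUrl);

// Spaces in the path are percent-encoded during resolution.
var withSpaces = toAbsolute('/img/perso/Garden in progress 2015.gif', pageUrl);

// A bare file name is resolved against the page's directory.
var relative = toAbsolute('Opaque.jpg', pageUrl);

console.log(rootRelative);
console.log(withSpaces);
console.log(relative);
```

Requesting the printed URLs directly (for example in a browser) shows whether the 404 comes from the server or from how the src was resolved.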

Thank you for your work and time :)

Does not properly build directory structure on Windows

      return binding.mkdir(pathModule._makeLong(path),
                     ^

    Error: ENOENT: no such file or directory, mkdir 'C:\Program Files\nodejsC:\static\images'
        at Error (native)
        at Object.fs.mkdirSync (fs.js:916:18)
        at module.exports (C:\Users\Kevin\Projects\brdg-twitter\node_modules\morefs\lib\mkdir-p\index.js:19:7)
        at Object.module.exports.createWriteStream (C:\Users\Kevin\Projects\brdg-twitter\node_modules\morefs\main.js:7:3)
        at createWriteStream (C:\Users\Kevin\Projects\brdg-twitter\node_modules\img-crawler\lib\img\index.js:8:18)
        at Request.<anonymous> (C:\Users\Kevin\Projects\brdg-twitter\node_modules\img-crawler\lib\img\index.js:86:17)
        at emitOne (events.js:96:13)
        at Request.emit (events.js:188:7)
        at ClientRequest.<anonymous> (C:\Users\Kevin\Projects\brdg-twitter\node_modules\img-crawler\node_modules\request\main.js:627:12)
        at ClientRequest.g (events.js:286:16)
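
The path in the error message looks like a working directory and an absolute destination joined by plain string concatenation. That reading is an assumption based on the message alone, not something confirmed in the img-crawler or morefs source. The sketch below contrasts concatenation with Node's `path.win32.resolve`, using the two path fragments from the error; the variable names are illustrative.

```javascript
var path = require('path');

// Values taken from the error message above (assumed to be the working
// directory and the requested destination directory).
var cwd = 'C:\\Program Files\\nodejs';
var dist = 'C:\\static\\images';

// Plain concatenation reproduces the malformed path from the error.
var concatenated = cwd + dist;

// path.win32.resolve returns an absolute second argument unchanged...
var resolvedAbsolute = path.win32.resolve(cwd, dist);

// ...and joins a relative one onto the base directory.
var resolvedRelative = path.win32.resolve(cwd, 'static\\images');

console.log(concatenated);
console.log(resolvedAbsolute);
console.log(resolvedRelative);
```

`path.win32` is used here so the sketch behaves the same on any platform; on Windows itself, `path.resolve` applies the same rules.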

Error: Cannot find module 'morefs'. Wrong remote location

When testing this lib, I get this error: Error: Cannot find module 'morefs'

I'm using node v5.0, npm 3.3.

After checking, I see that the remote for more-fs is not linked correctly. It should be https://github.com/mike-melo/morefs.git. I tested it with that remote and it worked.

Also, why is the package named 'public' here while it is still required as 'morefs' here?
