radvieira / img-crawler


A Node.js module for downloading images from a website

JavaScript 100.00%

img-crawler's Introduction

img-crawler

A Node module for downloading images to disk from a given URL.

Installation

    npm install img-crawler

Running the tests

From the module directory run:

    npm test

Without npm:

    make test

Usage

Download images from 'pearljam.com' and write them to the 'pj-imgs' directory. The directory is created if it does not exist, and its path is resolved to an absolute path.


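    var crawler = require('img-crawler');

    var opts = {
        url: 'http://pearljam.com',
        dist: 'pj-imgs'
    };

    crawler.crawl(opts, function(err, data) {
        // err is set when the web page itself could not be loaded
        if (err) {
            console.error('Could not load %s:', opts.url, err);
            return;
        }
        // data.imgs holds one entry per image, including failed downloads
        console.log('Downloaded %d from %s', data.imgs.length, opts.url);
    });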


The callback

In keeping with Node convention, the callback accepts an error object first,
followed by data describing the downloaded images. The err object is provided
only if loading the web page fails. Failures to download individual images are
reported in the img responses.

Here's an example of a response:

    {
        imgs: [
            {
                src: 'img/a-img.png', 
                statusCode: 200,
                success: true,
                path: '/Users/radvieira/my-imgs/img/a-img.png'
            },
            {
                src: 'img/another-img.png', 
                statusCode: 404,
                success: false
            }            
        ]
    }
    

In this case the first image was downloaded and written to disk while the other failed.
Notice how there is no path attribute for the failed download.
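
As a rough sketch of how a callback might consume this response, the snippet below splits the `imgs` array into successes and failures using the `success` flag. The `summarize` helper is hypothetical (it is not part of img-crawler), and the sample object simply mirrors the response shown above.

```javascript
// Hypothetical helper, not part of img-crawler: groups image results
// by the `success` flag shown in the example response.
function summarize(data) {
    var saved = data.imgs.filter(function (img) { return img.success; });
    var failed = data.imgs.filter(function (img) { return !img.success; });
    return { saved: saved, failed: failed };
}

// Sample data copied from the example response above.
var response = {
    imgs: [
        { src: 'img/a-img.png', statusCode: 200, success: true,
          path: '/Users/radvieira/my-imgs/img/a-img.png' },
        { src: 'img/another-img.png', statusCode: 404, success: false }
    ]
};

var summary = summarize(response);

// Only successful entries carry a `path`, so read it from `saved` alone.
summary.saved.forEach(function (img) {
    console.log('saved %s to %s', img.src, img.path);
});
summary.failed.forEach(function (img) {
    console.log('failed %s (status %d)', img.src, img.statusCode);
});
```

Inside a real `crawler.crawl` callback, `data` would take the place of `response` once `err` has been checked.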





    
    
  


img-crawler's People

Forkers

jbuiss0n
alextong
santosoide
kclough
wesavetheworld
venky97


img-crawler's Issues

StatusCode: 404

Hey,

When I use img-crawler I get a statusCode: 404 error, so I imagine I am doing something wrong.

All the tests pass (with mocha), and my FTP folder is chmod 777, so maybe it's a problem with relative links on the remote site. I'm on a Mac.

    var crawler = require('img-crawler');

    var opts = {
        url: 'http://vincent-bonnefille.fr/img/perso/',
        dist: '/public/down/'
    };

    crawler.crawl(opts, function(err, data) {
        console.log('Downloaded %d from %s', data.imgs.length, opts.url, data.imgs);
    });

My log (part of):

    Downloaded 22 from http://my-web-site.fr/img/perso/ [ { src: '/img/perso/Opaque.jpg', statusCode: 404, success: false },
      { src: '/img/perso/Garden in progress 2015.gif',
        statusCode: 404,
        success: false }
    ...
    ...

Did Node.js crawl the index.php file? [Edit: Yes]

It's a page that automatically lists the folder's contents. [Edit: So I added absolute src values to my images.]

Thank you for your work and time :)
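
For reference, this is how a relative `src` resolves against a page URL under standard URL rules. It is only a sketch using Node's built-in WHATWG `URL` class (a global in current Node versions), and it is not necessarily how img-crawler builds its download URLs. The page URL and `src` value are taken from the issue above.

```javascript
// Resolve image src values against the page URL with Node's built-in URL class.
// This illustrates standard URL resolution only; img-crawler's own logic may differ.
var pageUrl = 'http://vincent-bonnefille.fr/img/perso/';

// A root-relative src replaces the whole path of the page URL.
var rootRelative = new URL('/img/perso/Opaque.jpg', pageUrl).href;

// A src without a leading slash is appended to the page's directory.
var dirRelative = new URL('Opaque.jpg', pageUrl).href;

// An absolute src ignores the page URL entirely.
var absolute = new URL('http://vincent-bonnefille.fr/img/perso/Opaque.jpg', pageUrl).href;

console.log(rootRelative);
console.log(dirRelative);
console.log(absolute);
```

All three forms point at the same file here, which is why switching to absolute `src` values is a safe workaround when a crawler resolves relative ones differently.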
    
  
    

     
      
Does not properly build directory structure on Windows

            return binding.mkdir(pathModule._makeLong(path),
                     ^

    Error: ENOENT: no such file or directory, mkdir 'C:\Program Files\nodejsC:\static\images'
        at Error (native)
        at Object.fs.mkdirSync (fs.js:916:18)
        at module.exports (C:\Users\Kevin\Projects\brdg-twitter\node_modules\morefs\lib\mkdir-p\index.js:19:7)
        at Object.module.exports.createWriteStream (C:\Users\Kevin\Projects\brdg-twitter\node_modules\morefs\main.js:7:3)
        at createWriteStream (C:\Users\Kevin\Projects\brdg-twitter\node_modules\img-crawler\lib\img\index.js:8:18)
        at Request.<anonymous> (C:\Users\Kevin\Projects\brdg-twitter\node_modules\img-crawler\lib\img\index.js:86:17)
        at emitOne (events.js:96:13)
        at Request.emit (events.js:188:7)
        at ClientRequest.<anonymous> (C:\Users\Kevin\Projects\brdg-twitter\node_modules\img-crawler\node_modules\request\main.js:627:12)
        at ClientRequest.g (events.js:286:16)
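
The path in the error, `C:\Program Files\nodejsC:\static\images`, looks like a working directory joined by plain string concatenation to a `dist` value that was already absolute. That reading is an assumption based on the message alone, not a confirmed diagnosis. The sketch below uses Node's `path.win32` API (which behaves the same on any platform) to contrast concatenation with `path.resolve`; the variable names are illustrative.

```javascript
var path = require('path');

// Values inferred from the error message above (assumed, not confirmed).
var cwd = 'C:\\Program Files\\nodejs';
var dist = 'C:\\static\\images';

// Plain concatenation reproduces the malformed path from the error.
var concatenated = cwd + dist;

// path.win32.resolve keeps an absolute segment as-is and only
// prepends the base directory when the segment is relative.
var resolvedAbsolute = path.win32.resolve(cwd, dist);
var resolvedRelative = path.win32.resolve(cwd, 'static\\images');

console.log(concatenated);
console.log(resolvedAbsolute);
console.log(resolvedRelative);
```

Resolving through the `path` module rather than joining strings handles both absolute and relative `dist` values on Windows.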

    
  
    

     
      
Error: Cannot find module 'morefs'. Wrong remote location

While testing this lib, I get this error: Error: Cannot find module 'morefs'. I'm using node v5.0, npm 3.3.

After checking, I see that the remote for more-fs is not correctly linked. It should be https://github.com/mike-melo/morefs.git. I tested it and it worked that way.

Also, why is the package named public in one place while it is still required as morefs in another?
