node-car-examples

Included are a few simple examples of using the node-car-scraper wrapper.

Installation

Install Node

Follow the instructions on the Node website.

Clone this repo

git clone https://www.github.com/JTarasovic/node-car-examples MyFolder

Install Dependencies

cd MyFolder && npm install

Running

Simple version

node examples/simple.js

This version passes a single site (reddit.com) to node-car-scraper, which fetches it and hands a cheerio object to the first callback.

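var processSite = require('node-car-scraper');

processSite('http://www.reddit.com', oneb, two, three, four);

// Ignore the fetched page and hand back two other sites to visit.
function oneb (err, $, cb) {
	var arr = ['https://developer.mozilla.org', 'http://nodejs.org'];
	return cb(arr);
}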

The returned object is ignored and an array of two different sites is passed back to the callback. node-car-scraper fetches both of those pages and calls the link callback, which collects all of the absolute links on each page and sends them back to node-car-scraper, which then requests each of those links.

function two (err, $, cb) {
	var arr = [];
	$('a').each(function (i, elem) {
		// attr() returns undefined for anchors that have no href
		var temp = $(elem).attr('href');
		if (temp && temp.startsWith('http') && !temp.endsWith('tar.gz')) {
			arr.push(temp);
		}
	});
	cb(arr);
}
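
The filtering step in the link callback can also be pulled out into a plain function, which makes it easy to check without fetching anything. This is only a sketch: filterLinks is a hypothetical name, not part of node-car-scraper, and the sample array is made up.

```javascript
// Hypothetical helper: keep only absolute http(s) links and skip tarballs.
// Non-string entries (such as the undefined that cheerio's attr() returns
// for an anchor without an href) are dropped as well.
function filterLinks(hrefs) {
	return hrefs.filter(function (href) {
		return typeof href === 'string' &&
			href.startsWith('http') &&
			!href.endsWith('tar.gz');
	});
}

var sample = [
	'https://nodejs.org/en/',
	'/relative/path',
	undefined,
	'http://www.example.com/pkg.tar.gz'
];
console.log(filterLinks(sample));
```

Inside the callback, the hrefs would be collected from $('a') first and then passed through a function like this before calling cb.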

The results of those requests are, again, ignored for simplicity; however, the callback is called.

function three (err,$,cb) {
	return(cb());
}

After all of the links are processed, the final callback is called.

function four () {
	console.log('HOLY SHIT!!');
}

Slightly better version

node index.js

This version uses a class that contains the necessary methods and properties to query a car dealership, get all of the search results and store them in MongoDB.

See index.js and bmw.js for more details.
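
As a rough idea of what such a class could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the Dealer name, the results array standing in for MongoDB, the title selector, and the fake $ used to run it offline. The method names follow the ones used in the processSite call under "Adapting these examples"; bmw.js may be organized quite differently.

```javascript
// Hypothetical dealership class; bmw.js in this repo may be organized differently.
class Dealer {
	constructor(url) {
		this.url = url;
		this.results = []; // in-memory stand-in for a MongoDB collection
		// Bind each method so `this` survives being passed as a bare callback.
		['siteCallback', 'pageCallback', 'linkCallback', 'finalCallback']
			.forEach(function (name) { this[name] = this[name].bind(this); }, this);
	}

	// First URL fetched: return the result pages to visit (placeholder logic).
	siteCallback(err, $, cb) {
		cb(err ? [] : [this.url]);
	}

	// One result page: return absolute links to individual listings.
	pageCallback(err, $, cb) {
		var links = [];
		if (!err) {
			$('a').each(function (i, elem) {
				var href = $(elem).attr('href');
				if (href && href.startsWith('http')) { links.push(href); }
			});
		}
		cb(links);
	}

	// One listing: record what matters, then call back with no arguments.
	linkCallback(err, $, cb) {
		if (!err) { this.results.push({ title: $('title').text() }); }
		cb();
	}

	finalCallback() {
		console.log('Stored ' + this.results.length + ' result(s)');
	}
}

// Fake `$` so the sketch runs without cheerio or a network connection.
function fake$(selector) {
	return { text: function () { return 'Example listing'; } };
}

var dealer = new Dealer('http://www.example.com');
var detached = dealer.linkCallback; // what processSite would receive
detached(null, fake$, function () {});
dealer.finalCallback();
```

The binding in the constructor matters because the methods are handed to processSite as bare function references; without it, this.results would not point at the instance when the callback runs.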

Viewing the results webpage

node server.js

This is very much a quick and dirty web server that returns the results in a nice table if you navigate to localhost:3000. I'd like to migrate this to Handlebars or something similar instead of fiddling with the HTML directly but it works for the short term.
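
For reference, the general shape of such a server can be sketched with nothing but Node's built-in http module. This is not the code in server.js: renderTable, escapeHtml and sampleRows are hypothetical, and the real server presumably reads the stored results rather than a hard-coded array.

```javascript
var http = require('http');

// Escape text so scraped values cannot inject markup into the page.
function escapeHtml(value) {
	return String(value)
		.replace(/&/g, '&amp;')
		.replace(/</g, '&lt;')
		.replace(/>/g, '&gt;')
		.replace(/"/g, '&quot;');
}

// Build an HTML table from an array of flat objects that share the same keys.
function renderTable(rows) {
	if (rows.length === 0) { return '<p>No results.</p>'; }
	var columns = Object.keys(rows[0]);
	var head = '<tr>' + columns.map(function (c) {
		return '<th>' + escapeHtml(c) + '</th>';
	}).join('') + '</tr>';
	var body = rows.map(function (row) {
		return '<tr>' + columns.map(function (c) {
			return '<td>' + escapeHtml(row[c]) + '</td>';
		}).join('') + '</tr>';
	}).join('');
	return '<table>' + head + body + '</table>';
}

// Made-up placeholder rows for illustration only.
var sampleRows = [{ model: 'Example model', notes: '<none>' }];

var server = http.createServer(function (req, res) {
	res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
	res.end(renderTable(sampleRows));
});

// server.listen(3000); // uncomment to serve on http://localhost:3000
console.log(renderTable(sampleRows));
```

Keeping the table rendering in its own function is one way to make a later move to a template engine less painful, since only renderTable would need to change.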

Adapting these examples

There are three options:

Export a class that contains the necessary callback functions.

See bmw.js for more details on this method.

This is probably the preferred method as you can create a separate class for each dealership/website that you would like to query.

var processSite = require('node-car-scraper');
var Car = require('MyCarClass');
var car = new Car();

processSite(car.url, car.siteCallback, car.pageCallback, car.linkCallback, car.finalCallback);

Use anonymous functions.

var processSite = require('node-car-scraper');

processSite('http://www.example.com',
    function(err, $, callback){
        // ... send back an array of pages to visit based on the first URL visited
        callback(arr);
    },
    function(err, $, callback){
        // ... send back an array of links for individual pages to query
        callback(arr);
    },
    function(err, $, callback){
        // ... given one page with the necessary details, parse out what is relevant to you.
        // ... this is where you'd add an entry into a DB if desired
        // ... call the callback with no args
        callback();
    },
    function(){
        // ... final callback. Only called once when all of the links have been processed.
        console.log('WOOT! Finished!');
    });

Pass named functions as callbacks.

var processSite = require('node-car-scraper');

processSite(url, getPagesFromSite, getLinksFromPage, getDetailsFromIndividualPage, finished);

function getPagesFromSite (err, $, callback){
    // ... send back an array of pages to visit based on the first URL visited
    callback(arr);
}

function getLinksFromPage (err, $, callback){
    // ... send back an array of links for individual pages to query
    callback(arr);
}

function getDetailsFromIndividualPage (err, $, callback){
    // ... given one page with the necessary details, parse out what is relevant to you.
    // ... this is where you'd add an entry into a DB if desired
    // ... call the callback with no args
    callback();
}

function finished (){
    // ... final callback. Only called once when all of the links have been processed.
    console.log('WOOT! Finished!');
}
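
Whichever style you pick, the callbacks are called in the same sequence: site, then pages, then individual links, then the final callback once. To try out your own callbacks without touching the network, a toy stand-in can help. The sketch below is not node-car-scraper's implementation. fakeProcessSite is a hypothetical name, it makes no requests, it passes null instead of a cheerio object, and the assumption that all pages are handled before any link is requested is mine.

```javascript
// Toy stand-in for processSite: NOT node-car-scraper's implementation.
// It makes no HTTP requests and passes null where the real wrapper
// would pass a cheerio object, so only the order of calls is shown.
function fakeProcessSite(url, siteCb, pageCb, detailCb, finalCb) {
	siteCb(null, null, function (pages) {
		var links = [];
		var pagesLeft = pages.length;
		if (pagesLeft === 0) { return finalCb(); }
		pages.forEach(function () {
			pageCb(null, null, function (found) {
				links = links.concat(found);
				if (--pagesLeft === 0) { visitLinks(links); }
			});
		});
	});

	// Call the detail callback once per link, then the final callback once.
	function visitLinks(links) {
		var linksLeft = links.length;
		if (linksLeft === 0) { return finalCb(); }
		links.forEach(function () {
			detailCb(null, null, function () {
				if (--linksLeft === 0) { finalCb(); }
			});
		});
	}
}

var calls = [];
fakeProcessSite('http://www.example.com',
	function (err, $, cb) { calls.push('site'); cb(['page1', 'page2']); },
	function (err, $, cb) { calls.push('page'); cb(['linkA', 'linkB']); },
	function (err, $, cb) { calls.push('detail'); cb(); },
	function () { calls.push('final'); console.log(calls.join(' > ')); });
```

With two pages that each return two links, the detail callback runs four times before the final callback fires.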

Debugging

NODE_ENV=debug node fileToRun.js

Setting NODE_ENV to debug causes node-car-scraper to write quite a lot of information about what it is doing to stdout. As such, it may be helpful to redirect stdout (fd 1) to a file.

NODE_ENV=debug node fileToRun.js > debug.log
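
If you want your own callbacks to follow the same switch, the general pattern is a logger that checks NODE_ENV once. This is a sketch of that pattern only. makeDebugLogger is a hypothetical helper, and node-car-scraper's own logging code is not shown in this README, so it may work differently.

```javascript
// Hypothetical helper: returns a logger that only prints when
// NODE_ENV is set to "debug" in the environment object it is given.
function makeDebugLogger(env) {
	var enabled = env.NODE_ENV === 'debug';
	return function debug() {
		if (!enabled) { return false; }
		console.log.apply(console, arguments);
		return true;
	};
}

var debug = makeDebugLogger(process.env);
debug('fetching', 'http://www.example.com');
```

The logger returns true or false so a caller can tell whether anything was printed; passing the environment in as an argument keeps the helper easy to exercise with a plain object.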
