Horseman

Horseman lets you run PhantomJS from Node.

Horseman has:

  • a simple, chainable API, like jQuery,
  • an easy-to-use control flow (see the examples),
  • support for multiple tabs open at the same time.

Additionally, Horseman loads jQuery onto each page by default, which means you can use it inside your evaluate and manipulate functions automatically.

Installation

  1. Install Node, if you haven't already:

http://nodejs.org/

  2. Install PhantomJS:

http://phantomjs.org/download.html

Either the 1.x or the 2.x line is fine, but be aware that PhantomJS has a bug in the 2.x line that prevents file uploads.

  3. Install Horseman with npm:

npm install node-horseman

Example

Search on Google:

var Horseman = require('node-horseman');
var horseman = new Horseman();

var numLinks = horseman
  .open('http://www.google.com')
  .type('input[name="q"]', 'github')
  .click("button:contains('Google Search')")
  .waitForNextPage()
  .count("li.g");

console.log("Number of links: " + numLinks);

horseman.close();

For longer examples, check out the Examples folder.

API

new Horseman(options)

Create a new instance that can navigate around the web.

The available options are:

  • clientScripts: an array of local javascript files to load onto each page.
  • timeout: how long to wait for page loads or wait periods, default 5000 ms.
  • interval: how frequently to poll for page load state, default 50 ms.
  • port: port to mount the phantomjs instance to, default 12401.
  • weak: set the dnode weak option to false to fix C++ compilation for Windows users, default true.
  • loadImages: load all inlined images, default true.
  • ignoreSSLErrors: ignores SSL errors, such as expired or self-signed certificate errors, default true.
  • sslProtocol: sets the SSL protocol for secure connections [sslv3|sslv2|tlsv1|any], default any.
  • webSecurity: enables web security and forbids cross-domain XHR, default true.
  • injectJquery: whether or not jQuery is automatically loaded into each page. Default is true. If jQuery is already present on the page, it is not injected.
  • proxy: specify the proxy server to use address:port, default not set.
  • proxyType: specify the proxy server type [http|socks5|none], default not set.
  • proxyAuth: specify the auth information for the proxy user:pass, default not set.
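
As a minimal sketch of how the options fit together, the snippet below applies a few overrides on top of the defaults listed above. The `DEFAULTS` table and the `buildOptions` helper are illustrative and not part of Horseman; only the option names and default values come from the list, and the proxy address is a made-up example.

```javascript
// Defaults copied from the option list above (only options with a stated value).
var DEFAULTS = {
  timeout: 5000,
  interval: 50,
  port: 12401,
  weak: true,
  loadImages: true,
  ignoreSSLErrors: true,
  sslProtocol: 'any',
  webSecurity: true,
  injectJquery: true
};

// Hypothetical helper: returns a new options object with overrides applied.
function buildOptions(overrides) {
  var options = {};
  Object.keys(DEFAULTS).forEach(function (key) {
    options[key] = DEFAULTS[key];
  });
  Object.keys(overrides || {}).forEach(function (key) {
    options[key] = overrides[key];
  });
  return options;
}

var options = buildOptions({
  timeout: 10000,          // wait up to 10 seconds for page loads
  loadImages: false,       // skip images
  proxy: 'localhost:8080'  // address:port, example value only
});

// With Horseman installed: var horseman = new Horseman(options);
```

In practice you only need to pass the options you want to change; the helper just makes the resulting configuration explicit.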

Cleanup

Be sure to .close() each Horseman instance when you're done with it!

.close()

Closes the Horseman instance by shutting down PhantomJS.
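
One way to make sure .close() always runs, even when something throws, is a try/finally wrapper. This is a sketch only: `withHorseman` is a hypothetical helper, not part of Horseman, and it assumes the synchronous call style used in the examples here.

```javascript
// Hypothetical helper: runs `work` against a fresh instance and always closes it.
// `createInstance` is any function that returns an object with a .close() method,
// for example: function () { return new Horseman(); }
function withHorseman(createInstance, work) {
  var horseman = createInstance();
  try {
    return work(horseman);
  } finally {
    horseman.close(); // runs whether `work` returned or threw
  }
}

// Usage, with Horseman installed:
// var title = withHorseman(
//   function () { return new Horseman(); },
//   function (horseman) { return horseman.open('http://www.google.com').title(); }
// );
```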

Navigation

.open(url)

Load the page at url.

.back()

Go back to the previous page.

.forward()

Go forward to the next page.

.reload()

Refresh the current page.

.cookies([object|array of objects])

Without any options, this function will return all the cookies inside the browser.

var cookies = horseman
  .open('http://httpbin.org/cookies')
  .cookies();

console.log( cookies ); // []

You can pass in a cookie object to add to the cookie jar.

var cookies = horseman
  .cookies({
    name : "test",
    value : "cookie",
    domain: 'httpbin.org'
  })
  .open('http://httpbin.org/cookies')
  .cookies();

console.log( cookies ); 
/*
[ { domain: '.httpbin.org',
    httponly: false,
    name: 'test',
    path: '/',
    secure: false,
    value: 'cookie' } ]
*/

You can pass in an array of cookie objects to reset all the cookies in the cookie jar (or pass an empty array to remove all cookies).

var cookies = horseman
  .cookies([
  {
    name : "test2",
    value : "cookie2",
    domain: 'httpbin.org'
  },
  {
    name : "test3",
    value : "cookie3",
    domain: 'httpbin.org'
  }])
  .open('http://httpbin.org/cookies')
  .cookies();

console.log( cookies.length ); // 2
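
If your cookies start out as simple name/value pairs, a small helper can build the array form. `toCookies` below is a hypothetical helper, not part of Horseman; it only produces objects in the name/value/domain shape used in the examples above.

```javascript
// Hypothetical helper: turns { name: value } pairs into cookie objects
// with the same fields as the examples above (name, value, domain).
function toCookies(pairs, domain) {
  return Object.keys(pairs).map(function (name) {
    return { name: name, value: String(pairs[name]), domain: domain };
  });
}

var jar = toCookies({ test2: 'cookie2', test3: 'cookie3' }, 'httpbin.org');

// With a Horseman instance:
// horseman.cookies(jar).open('http://httpbin.org/cookies');
```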

.userAgent(userAgent)

Set the userAgent used by PhantomJS. You have to set the userAgent before calling .open().

.headers(headers)

Set the headers used when requesting a page. The headers are a javascript object. You have to set the headers before calling .open().
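
Because both the userAgent and the headers have to be set before .open(), it can help to keep the ordering in one place. The sketch below uses a hypothetical `openAs` helper and assumes that .userAgent() and .headers() are chainable in the same way as .authentication() and .viewport() in the other examples.

```javascript
// Hypothetical helper: applies request settings first, then opens the page.
// `horseman` is expected to be a Horseman instance. Whatever .open() returns
// is passed back so the caller can continue the chain.
function openAs(horseman, url, userAgent, headers) {
  return horseman
    .userAgent(userAgent)
    .headers(headers)
    .open(url);
}

// Usage, with Horseman installed (values are examples only):
// openAs(new Horseman(), 'http://httpbin.org/headers',
//   'MyCrawler/1.0', { 'Accept-Language': 'en-US' });
```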

.authentication(user, password)

Set the user and password for accessing a web page using basic authentication. Be sure to set it before calling .open(url).

new Horseman()
  .authentication('myUserName','myPassword')
  .open('http://www.mysecuresite.com');

.viewport(width, height)

Set the width and height of the viewport, useful for screenshotting. You have to set the viewport before calling .open().

.scrollTo(top, left)

Scroll to a position on the page, relative to the top left corner of the document.

.zoom(zoomFactor)

Set the amount of zoom on a page. The default zoomFactor is 1. To zoom to 200%, use a zoomFactor of 2. Combine this with viewport to produce high DPI screenshots.

horseman
  .viewport(3200,1800)
  .zoom(2)
  .open('http://www.horsemanjs.org')
  .screenshot('big.png')
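
One way to read the example above is as a 1600x900 layout rendered at 2x. Under that interpretation, the viewport to request is the layout size multiplied by the zoom factor. `hiDpiViewport` is a hypothetical helper that does this arithmetic; it is not part of Horseman.

```javascript
// Hypothetical helper: given the intended layout size and a zoom factor,
// returns the viewport dimensions to request along with the zoom.
function hiDpiViewport(width, height, zoomFactor) {
  return {
    width: Math.round(width * zoomFactor),
    height: Math.round(height * zoomFactor),
    zoom: zoomFactor
  };
}

var v = hiDpiViewport(1600, 900, 2); // { width: 3200, height: 1800, zoom: 2 }

// With a Horseman instance:
// horseman.viewport(v.width, v.height).zoom(v.zoom)
//   .open('http://www.horsemanjs.org').screenshot('big.png');
```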

Tabs

Horseman lets you open multiple tabs, just like you probably do in a real browser. Also, any anchor elements with a target attribute will open in a new tab.

Whenever a new tab is opened, either programmatically or because of an action on the page (like window.open), a tabCreated event will fire.

.tabCount()

Get the number of open tabs.

.switchToTab(tabNumber)

Switch to the desired tabNumber. The first tab is number 0.

.openTab([url])

Opens a new tab. Optionally, pass in a url and the new tab will automatically go to that url.
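
The three tab functions can be combined roughly as follows. This is a sketch: `openInNewTab` is a hypothetical helper, and it assumes that a newly opened tab gets the highest tab number, which is not stated above.

```javascript
// Hypothetical helper: opens `url` in a new tab, switches to it, and
// returns its tab number.
// Assumption: the new tab is added last, so its number is tabCount() - 1
// (tab numbers start at 0).
function openInNewTab(horseman, url) {
  horseman.openTab(url);
  var tabNumber = horseman.tabCount() - 1;
  horseman.switchToTab(tabNumber);
  return tabNumber;
}

// Usage, with a Horseman instance:
// var tab = openInNewTab(horseman, 'http://www.google.com');
// ... work in the new tab ...
// horseman.switchToTab(0); // back to the first tab
```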

Evaluation

Evaluation functions return information from the page and end the Horseman API chain.

.title()

Get the title of the current page.

.url()

Get the url of the current page.

.visible(selector)

Determines if a selector is visible, or not, on the page. Returns a boolean.

.exists(selector)

Determines if the selector exists, or not, on the page. Returns a boolean.

.count(selector)

Counts the number of elements matching selector on the page. Returns a number.
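
Since each of these functions returns a value rather than the chain, they are called one at a time. A minimal sketch, using a hypothetical `describeSelector` helper that is not part of Horseman:

```javascript
// Hypothetical helper: summarizes what a selector matches on the current page.
// Each evaluation call returns a plain value, so they are made separately.
function describeSelector(horseman, selector) {
  var exists = horseman.exists(selector);
  return {
    exists: exists,
    // Skip the extra page calls when nothing matches.
    visible: exists ? horseman.visible(selector) : false,
    count: exists ? horseman.count(selector) : 0
  };
}

// Usage, with a Horseman instance that has already opened a page:
// console.log(describeSelector(horseman, 'li.g'));
```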

.html([selector])

Gets the html inside of an element. If no selector is provided, it returns the html of the entire page.

.text(selector)

Gets the text inside of an element.

.value(selector, [val])

Get, or set, the value of an element.

.attribute(selector, attribute)

Gets an attribute of an element.

.cssProperty(selector, property)

Gets a CSS property of an element.

.width(selector)

Gets the width of an element.

.height(selector)

Gets the height of an element.

.screenshot(path)

Saves a screenshot of the current page to the specified path. Useful for debugging.

.screenshotBase64(type)

Returns a base64 encoded string representing the screenshot. Type must be one of 'PNG', 'GIF', or 'JPEG'.
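
A base64 screenshot is handy when you want to embed the image rather than write it to disk. The sketch below wraps the string in a data URI; `toDataUri` is a hypothetical helper, not part of Horseman.

```javascript
// Hypothetical helper: wraps a base64 screenshot string in a data URI.
// `type` is one of the values accepted by .screenshotBase64().
function toDataUri(base64, type) {
  var mimeTypes = { PNG: 'image/png', GIF: 'image/gif', JPEG: 'image/jpeg' };
  if (!Object.prototype.hasOwnProperty.call(mimeTypes, type)) {
    throw new Error('Unsupported type: ' + type);
  }
  return 'data:' + mimeTypes[type] + ';base64,' + base64;
}

// Usage, with a Horseman instance that has already opened a page:
// var uri = toDataUri(horseman.screenshotBase64('PNG'), 'PNG');
// var html = '<img src="' + uri + '">';
```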

.pdf(path, [paperSize])

Renders the page as a PDF. The default paperSize is US Letter.

The paperSize object should be in either this format:

{
  width: '200px',
  height: '300px',
  margin: '0px'
}

or this format

{
  format: 'A4',
  orientation: 'portrait',
  margin: '1cm'
}

Supported formats are: A3, A4, A5, Legal, Letter, Tabloid.

Orientation (portrait, landscape) is optional and defaults to 'portrait'.

Supported dimension units are: 'mm', 'cm', 'in', 'px'. No unit means 'px'.
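
The rules above can be checked before calling .pdf(). The following is a sketch of such a check, not Horseman's own validation: `isValidPaperSize` is a hypothetical helper, and treating margin as optional is an assumption.

```javascript
var FORMATS = ['A3', 'A4', 'A5', 'Legal', 'Letter', 'Tabloid'];

// A number with an optional unit; no unit means 'px'.
var DIMENSION = /^\d+(\.\d+)?(mm|cm|in|px)?$/;

// Hypothetical helper: returns true if paperSize matches one of the
// two shapes described above.
function isValidPaperSize(paperSize) {
  // Assumption: margin may be left out.
  var marginOk = paperSize.margin === undefined ||
    DIMENSION.test(String(paperSize.margin));

  if (paperSize.format !== undefined) {
    var orientation = paperSize.orientation || 'portrait';
    return marginOk &&
      FORMATS.indexOf(paperSize.format) !== -1 &&
      (orientation === 'portrait' || orientation === 'landscape');
  }

  return marginOk &&
    DIMENSION.test(String(paperSize.width)) &&
    DIMENSION.test(String(paperSize.height));
}

// Usage, with a Horseman instance that has already opened a page:
// var size = { format: 'A4', orientation: 'landscape', margin: '1cm' };
// if (isValidPaperSize(size)) { horseman.pdf('page.pdf', size); }
```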

.crop(selector | boundingRectangle, path)

Takes a screenshot of a portion of the page. You can pass in either a CSS selector or a boundingRectangle { top : 50, left: 200, width: 90, height: 200 }.

var horseman = new Horseman();

horseman  
  .open("http://www.yahoo.com")
  .crop(".logo-container", "yahoologo.png");

horseman.close();

.evaluate(fn, [arg1, arg2,...])

Invokes fn on the page with args. On completion it returns a value. Useful for extracting information from the page.

var size = horseman
  .open("http://en.wikipedia.org/wiki/Headless_Horseman")
  .evaluate( function(selector){
    // This code is executed inside the browser.
    // It's sandboxed from Node, and has no access to anything
    // in Node scope, unless you pass it in, like we did with "selector".
    //
    // You do have access to jQuery, via $, automatically.
    return {
      height : $( selector ).height(),
      width : $( selector ).width()
    }
  }, ".thumbimage");

console.log( size );
horseman.close();

Manipulation

These functions change the page, and can be chained consecutively.

.manipulate(fn, [arg1, arg2,...])

Works the same as .evaluate(), but doesn't return a value, so it can be used without interrupting the Horseman API chain.

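var selector = ".thumbimage"; // example selector; any CSS selector works

var count = horseman
  .open("http://en.wikipedia.org/wiki/Headless_Horseman")
  .count( selector ); //.count() ends the API chain

console.log( count ); // -> number of matching elements, e.g. 2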

count = horseman
  .manipulate( function( selector){
    $(selector).each( function(){
      $(this).remove();
    })
  }, selector)
  .count( selector );

console.log( count );// -> 0

.click(selector)

Clicks the selector element once.

.select(selector, value)

Sets the value of a select element to value.

.clear(selector)

Sets the value of an element to "".

.type(selector, text [,options])

Enters the text provided into the selector element. Options is an object containing eventType (keypress, keyup, or keydown; default is keypress) and modifiers, which is a string in the form ctrl+shift+alt.
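
As a sketch of what the options object might look like, the helper below builds one from an event type and a list of modifier keys. `typeOptions` is hypothetical and not part of Horseman; only the eventType values and the `+`-joined modifiers string come from the description above.

```javascript
// Hypothetical helper: builds the options object for .type().
// `keys` lists the modifier keys to hold, e.g. ['ctrl', 'shift'].
function typeOptions(eventType, keys) {
  var allowed = ['keypress', 'keyup', 'keydown'];
  if (allowed.indexOf(eventType) === -1) {
    throw new Error('Unknown eventType: ' + eventType);
  }
  var options = { eventType: eventType };
  if (keys && keys.length > 0) {
    options.modifiers = keys.join('+');
  }
  return options;
}

var opts = typeOptions('keydown', ['ctrl', 'shift']);
// opts is { eventType: 'keydown', modifiers: 'ctrl+shift' }

// With a Horseman instance:
// horseman.type('input[name="q"]', 'a', opts);
```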

.upload(selector, path)

Specify the path to upload into a file input selector element.

.injectJs(file)

Inject a javascript file onto the page.

Waiting

These functions make the browser wait for an event to occur. If the event does not occur before the timeout period (configurable via the options), a timeout event will fire.

.wait(ms)

Wait for ms milliseconds, e.g. .wait(5000).

.waitForNextPage()

Wait until a page finishes loading, typically after a .click().

.waitForSelector(selector)

Wait until the element selector is present, e.g. .waitForSelector('#pay-button').

.waitFor(fn, value)

Wait until the fn evaluated on the page returns value.
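
For example, to wait until search results have rendered, you might pass a function that checks the page and wait for it to return true. This is a sketch: `waitForResults` is a hypothetical helper, the `li.g` selector is borrowed from the Google example, and the sketch assumes .waitFor() is chainable like .waitForNextPage().

```javascript
// Hypothetical helper: waits until at least one 'li.g' element is on the page.
function waitForResults(horseman) {
  return horseman.waitFor(function () {
    // This function runs inside the page, not in Node.
    // It relies on the jQuery ($) that Horseman injects by default.
    return $('li.g').length > 0;
  }, true); // keep polling until the function returns true
}

// Usage, with a Horseman instance:
// horseman.open('http://www.google.com');
// var numLinks = waitForResults(horseman).count('li.g');
```

Since the signature above takes only fn and value, the selector is written into the function body rather than passed in as an argument.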

Events

.on(event, callback)

Respond to page events with the callback. Be sure to set these before calling .open().

Supported events are:

  • initialized - callback()
  • loadStarted - callback()
  • loadFinished - callback(status)
  • tabCreated - callback()
  • urlChanged - callback(targetUrl)
  • navigationRequested - callback(url, type, willNavigate, main)
  • resourceRequested - callback(requestData, networkRequest)
  • resourceReceived - callback(response)
  • consoleMessage - callback(msg, lineNumber, sourceId)
  • alert - callback(msg)
  • confirm - callback(msg)
  • prompt - callback(msg, defaultValue)
  • error - callback(msg, trace)
  • timeout - callback(msg) - Fired when a wait timeout period elapses.

For a more in-depth description, see the full callbacks list for PhantomJS.

horseman
  .on('consoleMessage', function( msg ){
    console.log(msg);
  })
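
If you would rather collect page output than print it, handlers can push into arrays. A minimal sketch using the consoleMessage and error events from the list above; `recordPageOutput` is a hypothetical helper, not part of Horseman.

```javascript
// Hypothetical helper: records console messages and page errors.
// `horseman` is anything with an .on(event, callback) method.
function recordPageOutput(horseman) {
  var log = { messages: [], errors: [] };

  horseman.on('consoleMessage', function (msg) {
    log.messages.push(msg);
  });

  horseman.on('error', function (msg, trace) {
    log.errors.push({ msg: msg, trace: trace });
  });

  return log;
}

// Usage, with a Horseman instance (register handlers before .open()):
// var log = recordPageOutput(horseman);
// horseman.open('http://www.google.com');
// console.log(log.messages.length + ' console messages');
```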

Debug

To run the same file with debugging output, run it like this: DEBUG=horseman node myfile.js

This will print out some additional information about what's going on:

horseman .setup() creating phantom instance on port 12406 +0ms
horseman load finished, injecting jquery and client scripts +401ms
horseman injected jQuery +0ms
horseman .open: http://www.google.com +66ms
horseman .type() horseman into input[name='q'] +51ms

Tests

Automated tests for Horseman itself are run using Mocha and Should, both of which will be installed via npm install. To run Horseman's tests, just do npm test.

When the tests are done, you'll see something like this:

npm test
  85 passing (37s)
  2 pending

License (MIT)

WWWWWW||WWWWWW
 W W W||W W W
      ||
    ( OO )__________
     /  |           \
    /o o|    MIT     \
    \___/||_||__||_|| *
         || ||  || ||
        _||_|| _||_||
       (__|__|(__|__|

Copyright (c) John Titus

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
