
node-phantom's Introduction

Node-phantom

This is a bridge between PhantomJS and Node.js.

It is very similar to the other available bridge, PhantomJS-Node, but differs in a few ways:

  • Far fewer dependencies/layers.
  • The API passes the idiomatic error indicator as the first parameter to callbacks.
  • Uses plain JavaScript instead of CoffeeScript.

Requirements

You will need to install PhantomJS first. The bridge assumes that the "phantomjs" binary is available in the PATH.

The only other dependency for using it is socket.io.

For running the tests you will need Mocha. The tests require PhantomJS 1.6 or newer to pass.

Installing

npm install node-phantom

Usage

You can use it much like PhantomJS-Node. For example, this is an adaptation of a web scraping example:

var phantom=require('node-phantom');
phantom.create(function(err,ph) {
  return ph.createPage(function(err,page) {
    return page.open("http://tilomitra.com/repository/screenscrape/ajax.html", function(err,status) {
      console.log("opened site? ", status);
      page.includeJs('http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js', function(err) {
        //jQuery Loaded.
        //Wait for a bit for AJAX content to load on the page. Here, we are waiting 5 seconds.
        setTimeout(function() {
          return page.evaluate(function() {
            //Get what you want from the page using jQuery. A good way is to populate an object with all the jQuery commands that you need and then return the object.
            var h2Arr = [],
            pArr = [];
            $('h2').each(function() {
              h2Arr.push($(this).html());
            });
            $('p').each(function() {
              pArr.push($(this).html());
            });

            return {
              h2: h2Arr,
              p: pArr
            };
          }, function(err,result) {
            console.log(result);
            ph.exit();
          });
        }, 5000);
      });
    });
  });
});

phantom.create(callback,options)

options is an optional object with options for how to start PhantomJS. options.parameters is an object whose keys and values are passed to PhantomJS as command-line parameters. For example:

phantom.create(callback,{parameters:{'ignore-ssl-errors':'yes'}})

will start phantom as:

phantomjs --ignore-ssl-errors=yes
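
The mapping from the parameters object to command-line flags can be sketched roughly as follows. This is an illustration based on the behaviour described above, not the library's source; the buildArgs name is hypothetical.

```javascript
// Hypothetical helper (not part of node-phantom): turn {name: value}
// pairs into "--name=value" flags for the phantomjs command line.
function buildArgs(parameters) {
  var args = [];
  for (var name in parameters) {
    if (Object.prototype.hasOwnProperty.call(parameters, name)) {
      args.push('--' + name + '=' + parameters[name]);
    }
  }
  return args;
}

console.log(buildArgs({ 'ignore-ssl-errors': 'yes' }));
```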

You may also pass in a custom path if you need to select a specific PhantomJS binary or if it is not available in the PATH. For example, this can be used together with the PhantomJS package like so:

phantom.create(callback,{phantomPath:require('phantomjs').path})

Working with the API

Once you have the phantom instance, you can use it much as you would the real PhantomJS; node-phantom tries to mimic the API.

One exception: because this is a wrapper that controls PhantomJS over a network connection, all methods are asynchronous and take a callback, even when the PhantomJS version is synchronous.

Another notable exception is the page.evaluate method (and the page.evaluateAsync method), which since PhantomJS 1.6 accepts extra arguments to be passed into the evaluated function. In node-phantom these arguments are placed after the callback, so the order is: evaluatee, callback, optional arguments. In code it looks like this:

page.evaluate(function(s){
	return document.querySelector(s).innerText;
},function(err,title){
	console.log(title);
},'title');
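
If you prefer to keep the extra arguments next to the function, a small adapter can reorder them before calling page.evaluate. This is only a sketch: evaluateWithArgs and fakePage are hypothetical names, and fakePage is a stand-in that runs the function locally so the example works without PhantomJS.

```javascript
// Hypothetical adapter: accepts (page, fn, args, callback) and forwards the
// call in node-phantom's order: evaluatee, callback, then extra arguments.
function evaluateWithArgs(page, fn, args, callback) {
  return page.evaluate.apply(page, [fn, callback].concat(args));
}

// Stand-in page object (an assumption for this demo only). It mimics the
// argument order described above and runs fn in the current process.
var fakePage = {
  evaluate: function (fn, callback) {
    var extra = Array.prototype.slice.call(arguments, 2);
    callback(null, fn.apply(null, extra));
  }
};

evaluateWithArgs(fakePage, function (a, b) { return a + b; }, [2, 3], function (err, sum) {
  console.log(sum);
});
```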

You can also have a look at the test folder to see some examples of using the API.

Other

Made by Alex Scheel Meyer. Released to the public domain.

node-phantom's People

Contributors

alexscheelmeyer, andrewraycode, davidmfoley, felixhoer, hh10k, mrorz, reid, sakhtar, vmeurisse

node-phantom's Issues

handshaken Error on Socket.io

I'm getting the following each time:
"warn - client not handshaken client should reconnect"

It seems like an error with socket.io.

callback hint

I've seen that some functions use a callback instead of returning a value.
Because this differs from PhantomJS, I suggest pointing this out in the overview.

Thanks in advance.

Events don't seem to be working

page.set('onResourceRequested',function(r) {
  console.log(r);
});

Setting events like this doesn't seem to work. Also, page has no on method, so I can't set events that way either.

Any thoughts?

Detect Phantom crash

I keep getting the following crash on Linux:

09:11:03.887 phantom stderr: PhantomJS has crashed. Please read the crash reporting guide at https://github.com/ariya/phantomjs/wiki/Crash-Reporting and file a bug report at https://github.com/ariya/phantomjs/issues/new with the crash dump file attached: /tmp/106608aa-64f2-37ce-6b453bd7-6876263f.dmp

09:11:04.011 Request() error evaluating open() call: Error: socket hang up
09:11:04.012 Poll Request error: Error: socket hang up
  1. Is there a way to detect it?
  2. Is there a way to restart Phantom?
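
One possible approach to both questions, sketched under assumptions: a Node.js ChildProcess emits an 'exit' event with the exit code, so a small supervisor can watch for it and start a fresh process. The supervise function and its parameters are hypothetical and are not part of node-phantom; start is assumed to return the spawned process (or any object that emits 'exit').

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical supervisor: `start` must return an object that emits 'exit'
// with an exit code, as a Node.js ChildProcess does. Any exit code other
// than 0 (including null, which Node reports for signal kills) is treated
// as a crash and triggers a restart, up to maxRestarts times.
function supervise(start, maxRestarts, onGiveUp) {
  var restarts = 0;
  function launch() {
    var child = start();
    child.once('exit', function (code) {
      if (code === 0) return;                          // clean shutdown
      if (restarts >= maxRestarts) return onGiveUp(code);
      restarts += 1;
      launch();
    });
  }
  launch();
}

// Demo with a plain EventEmitter standing in for the PhantomJS process.
var demoChildren = [];
supervise(function () {
  var child = new EventEmitter();
  demoChildren.push(child);
  return child;
}, 1, function (code) {
  console.log('giving up after exit code', code);
});
demoChildren[0].emit('exit', 1); // simulated crash, triggers one restart
```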

handling console.log in the browser

Is there a way to listen to the browser console with node-phantom? It looks like phantom.js may output that information to the console, maybe node-phantom has an event for that?

specifying quality factor for jpg images

I came across the ability to specify the JPEG quality factor when reading about PhantomJS and was wondering whether node-phantom can handle this capability. So I looked at the call to 'render' and noticed that the only parameter to pageRender is the filename. Not knowing how much of this works, specifically which function 'pageRender' really maps to, I put in a test to see if I could add an additional argument for the quality value. Unfortunately I didn't see any variation in the resulting JPEG file. For what it's worth, I'm using jsgui-node-render-svg, which has the node-phantom dependency.

Is there any way of getting at the quality factor?

thanks


As I was cleaning up things from my test, I realized that bridge.js has the actual call to page.render() so I did another test to check for an additional argument and if present pass it along to page.render() and I got what I expected.

So I guess all this means is that it is currently not supported in this release... I can make my modifications available here if anyone so desires.

Access to page.settings or other way to change user agent?

I'm writing a script to pull my simple.com bank account using phantom to login and scrape the data, but need to set a custom user agent.

In native phantomjs, I'm doing this via:

page.settings.userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.64 Safari/537.4'

Is there an analogous command in node-phantom?

phantom crash: code -1073741819 on setting view port

I am getting "phantom crash: code -1073741819" when setting the viewport width to a value greater than 600.

Here is the code I use:

var phantom = require('node-phantom');
phantom.create(function (err, ph) {
  ph.createPage(function (err, page) {
    page.get('viewportSize', function (err, value) {
      console.log(value);
      page.set('viewportSize', {width: 800, height: 600}, function (err) {
        page.set('content', data.html, function (err, status) {
          setTimeout(function () {
            page.render('capture.png', function (err) {
              console.log("created image");
              //    ph.exit();
            });
          }, 10000);
        });
      });
    });
  });
});

If I set the width to 600 here, it works fine:
//page.set('viewportSize',{width:600,height:600},function(err){

Can anybody please help me with this issue?
Thanks

Problems with event listener

I'm trying to grab the html when AJAX operations are complete with some client javascript like this:

var evt = document.createEvent('Event');
evt.initEvent('__htmlReady__', true, true);
document.dispatchEvent(evt);

I'm listening for events like this:

page.onCallback = function() {
  page.get('content', function(err, html) {
    ph.exit();
    console.log(html);
  });
};

page.onInitialized = function() {
  page.evaluate(function() {
    document.addEventListener('__htmlReady__', function() {
      window.callPhantom();
    }, false);
    setTimeout(function() {
      window.callPhantom();
    }, 10000);
  });
};

But the event never triggers, only the timeout triggers.
This issue doesn't happen when I run phantomjs from the command line.

Any ideas?

Transfer ownership of this project

This project doesn't seem to be maintained, and in its current state it is a mess of bugs and missing features. We need to choose a new active maintainer. Any volunteers?

Pass parameters to render?

Looking at the source, it looks like pageCreated -> render only accepts filename and callback params. Is there any way to pass further options to the native PhantomJS render method?

e.g. {format:png}

page.content support

I tried to access page.content but it is undefined. Of course there are other methods to get the HTML of the page in PhantomJS, but this sounded like the most polished.

call callback on phantom exit

phantom.on('exit', function(code) {
  exitCode = code;
  callback(code);
});

Sometimes PhantomJS dies. How can I handle this event in Node.js?

--- DO NOT USE THIS LIBRARY ---

This library is unmaintained, full of bugs, and missing features. The codebase is currently unmanageable due to many conflicting pull requests.

Please join me in deleting your forks of this repository

You should be using phantom-proxy, which is actively maintained and full featured.

There are several conflicting phantomjs libraries out there right now, and this one seems like it's ready to be put to bed.

node-phantom.js options.parameters

Passing params into phantom.create throws an error. It's a slight oversight, but here it is. Sorry if it's already been reported.

line 24 in node-phantom.js

for(var parm in options.parameters) {
  args.push('--' + option + '=' + options.parameters[parm]); // <-- throws: option is not defined
}

It works when I change it to 'parm':

for(var parm in options.parameters) {
  args.push('--' + parm + '=' + options.parameters[parm]);
}

Cheers

Crashes PhantomJS when opening https://www.google.com/offers

Not sure what the issue is with this specific site, but using node-phantom I cannot open https://www.google.com/offers. The same code works fine with PhantomJS-Node. I am not sure why this particular site fails; other https sites work fine, including https://www.google.com.

The first part will print "1: opened site? success".

The second, using node-phantom, never throws any errors, but by putting some console.log's in the source I can see that the phantom "exit" event is being raised with an error code of -1073741819 (basically it seems to be crashing). Both are using the same PhantomJS executable (using the phantomjs package) on Windows 8.

var phantom = require('phantom');

phantom.create(function(ph) {
    ph.createPage(function(page) {
        page.open("https://www.google.com/offers", function(status) {
            console.log("1: opened site? ", status);
            ph.exit();
        });
    });
});

var phantom2 = require('node-phantom');

phantom2.create(function(err, ph) {
    if (err) throw err;
    ph.createPage(function(err, page) {
        if (err) throw err;
        page.open("https://www.google.com/offers", function(err, status) {
            if (err) throw err;
            console.log("2: opened site? ", status);
            ph.exit();
        });
    });
}, { phantomPath: require('phantomjs').path });

paperSize

When trying to create a node-phantom version of the rasterize.js from phantom.js, there is no paperSize attribute available in node-phantom. Is there a way to set the page size for the PDF?

Dan

Is this abandoned?

There are some good pull requests that have been sitting here for many months now.

I'm actually using this in a production project and need the pull requests. I've resorted to making my own repository to apply the patches, but I would much prefer another official release with the pull requests included.

@alexscheelmeyer can you find the time to merge some of these and push a new version? Specifically issue #68 could really do with being merged as it causes random failures that are hard to debug.

If you don't have the time perhaps you could ask a couple of the people who've filed requests to be collaborators?

createPage callback not fired in node v0.10

I've noticed that with node 0.10 this code doesn't work:

(function(){
    var phantom = require('node-phantom');

    phantom.create(function(e, ph){
        ph.createPage(function() {
            console.log("in");
        });
    });
})();

Not sure what the issue is, but it works just fine with node 0.11 and <0.10.

Is it possible to monitor dom changes with node-phantom?

I would like to open a page and watch a datatable on that page as it is populated. The target site dynamically adds content to the table over a period of minutes / hours (it is live sports data).

Is this possible with node-phantom / phantomJS?

Tests not running successfully?

Phantomjs 1.9.0 installed, node v0.10.6.

Tests don't succeed in version 6949800.

npm test fails because it hasn't been updated to run mocha.

mocha results in...

$ mocha

  ․․․․․․․․․․․․․․․․phantom stderr: execvp(): No such file or directory

․

  ✖ 2 of 17 tests failed:

  1) Phantom Create should change behavior on different phantomPath:
     Error: spawn ENOENT
      at errnoException (child_process.js:980:11)
      at Process.ChildProcess._handle.onexit (child_process.js:771:34)

  2) Phantom should be able to inject js:
     Error: timeout of 2000ms exceeded
      at null.<anonymous> (/usr/lib/node_modules/mocha/lib/runnable.js:167:14)
      at Timer.listOnTimeout [as ontimeout] (timers.js:110:15)

For what it's worth, I can't get phantomjs-node working correctly today either, but that looks like a dnode problem.

Calling ph.exit() doesn't terminate spawned process

Hi, has anyone else encountered this issue? I spawn one or more child processes (ie. PhantomJS instances). After opening up whatever web pages I need to open up, I naturally want to call exit(). However, this doesn't kill/terminate the child processes, and so I'm left with a bunch of orphans.

I've found what I hope is a fix. Inside node-phantom.js, there is a proxy variable at the bottom, which is an object with an exit() method. Right now, all it does is call request(). If we add a call to phantom.exit('SIGTERM'), that seems to do the trick.

I can send a pull request if you'd like.

Thanks!
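
The idea in this report can be sketched as follows, under assumptions. Node's ChildProcess exposes kill(signal) for sending a signal to the spawned process; the makeExit wrapper, the request(name, callback) signature, and the variable names are hypothetical and do not reflect node-phantom's actual internals.

```javascript
// Hypothetical wrapper: `request` sends the exit command over the bridge and
// `child` is the spawned PhantomJS process (a Node.js ChildProcess).
function makeExit(request, child) {
  return function exit(callback) {
    request('exit', function (err) {
      // Whether or not the bridge answered cleanly, signal the process so
      // that it does not linger as an orphan.
      child.kill('SIGTERM');
      if (callback) callback(err);
    });
  };
}

// Demo with stand-ins for the bridge request and the child process.
var exitDemoLog = [];
var exitDemo = makeExit(
  function (name, cb) { exitDemoLog.push('request ' + name); cb(null); },
  { kill: function (signal) { exitDemoLog.push('kill ' + signal); } }
);
exitDemo(function () { console.log(exitDemoLog.join(', ')); });
```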

Please specify node-phantom's licence

Hi alexscheelmeyer,
Would you mind specifying your licence of choice for node-phantom? Although I did not study your code I prefer your approach to the older sgentle / phantomjs-node, but I would also like to be sure of what your intentions are. At the moment you just write that it is "released to the public domain".

Thanks!

PhantomJS bin script cannot be spawned in Windows using phantomPath option

I tried to find out why a script does not work on my system (Windows 7) and found that the problem is related to node-phantom and the phantomjs bin script: node-phantom cannot spawn PhantomJS when the phantomPath option points to the phantomjs bin script.
This can be fixed (quick and dirty) by changing the line to the following:

var phantom=child.spawn("node", [options.phantomPath].concat(args));

Is it possible to make some changes (maybe via checks and/or additional option) so that bin script can be used in Windows to run PhantomJS?
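
A check along those lines might look like the sketch below. It assumes that on Windows any phantomPath not ending in .exe is a Node script that should be run through the node executable; that heuristic, the resolveSpawn name, and the returned shape are all hypothetical.

```javascript
// Hypothetical helper: decide what to hand to child_process.spawn.
// On Windows, a path that does not end in ".exe" is assumed to be a Node
// script and is run through the current node binary (process.execPath).
function resolveSpawn(phantomPath, args, platform) {
  var isExecutable = /\.exe$/i.test(phantomPath);
  if (platform === 'win32' && !isExecutable) {
    return { command: process.execPath, args: [phantomPath].concat(args) };
  }
  return { command: phantomPath, args: args };
}

console.log(resolveSpawn('phantomjs', ['bridge.js'], process.platform));
```

The result could then be passed on as child.spawn(result.command, result.args).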

second call to page.injectJs fail

Hi,

I have this code:

phantom.create(function (err, pha) {
  ph = pha;
  ph.createPage(function (err, p) {
    page = p;
    page.onConsoleMessage = function (msg) { console.log(msg); };
    page.open(signInUrl, function (err, status) {
      console.log(status);
      page.includeJs(jquery, function (err) {
        page.evaluate(function (params) {
          $('input[type=email]').val(params.login);
          $('input[type=password]').val(params.password);
          $('input[type=submit]').click();
        }, function (err, out) {
          setTimeout(verifCode, 5000);
        }, {login: login, password: password});
      });
    });
  });
});

var verifCode = function() {
    page.includeJs(jquery, function (err) {
...

The first page contains a form, and when I submit it PhantomJS navigates to a new page, so I need to inject jQuery a second time.
But node-phantom crashes at node-phantom.js:178 during the call to page.includeJs:

case 'pageEvaluatedAsync':
    cmds[cmdId].cb(null);
    delete cmds[cmdId];

I patched the code and it's working, but I'm not sure I really understand what happened or whether my patch will have side effects.

case 'pageEvaluatedAsync':
    if (cmds[cmdId] != null) {
        cmds[cmdId].cb(null);
        delete cmds[cmdId];
    }
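
The guard in the patch amounts to ignoring a response whose command id is no longer pending. A simplified model of that idea is sketched below; createCommandTable is a hypothetical name and this is not node-phantom's actual code.

```javascript
// Simplified, hypothetical model of a pending-command table. Each command
// gets an id; a response for an unknown or already-answered id is ignored
// instead of throwing on an undefined entry.
function createCommandTable() {
  var cmds = {};
  var nextId = 0;
  return {
    register: function (cb) {
      var id = nextId++;
      cmds[id] = { cb: cb };
      return id;
    },
    resolve: function (id, err, result) {
      var cmd = cmds[id];
      if (!cmd) return false;   // duplicate or stray response
      delete cmds[id];
      cmd.cb(err, result);
      return true;
    }
  };
}

var demoTable = createCommandTable();
var demoId = demoTable.register(function () { console.log('callback fired'); });
demoTable.resolve(demoId, null); // fires the callback
demoTable.resolve(demoId, null); // ignored: id is no longer pending
```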

15 second delay after code using node-phantom exits

I've written a quick script to illustrate this problem. I'm not sure what's happening, as I've been able to confirm that phantomjs exits when phantom.exit() callback hits (see below). All the tests in the distribution, run via expresso, also suffer from this 15 second delay.

I've tried node v0.6.21, v0.7.12 and v0.8.14 (what I use). I've tried phantomjs v1.6.1 and v1.7.0 (what I use). This is on a CentOS 6 x86_64 box.

date && node slow_test.js && date && node -v && phantomjs --version
Fri Nov 16 15:45:20 EST 2012
16 Nov 15:45:22 - Phantom created
16 Nov 15:45:22 - ewaters  28965 20.0  0.1 1415240 54104 pts/32  Sl+  15:45   0:00 phantomjs /home/ewaters/code/Absinthe/node_modules/node-phantom/bridge.js 43012
ewaters  28973  0.0  0.0 106088  1156 pts/32   S+   15:45   0:00 /bin/sh -c ps aux | grep phantomjs
ewaters  28975  0.0  0.0 103240   872 pts/32   S+   15:45   0:00 grep phantomjs

16 Nov 15:45:22 - Page created
16 Nov 15:45:22 - Page opened with status success
16 Nov 15:45:22 - Phantom exited
16 Nov 15:45:22 - ewaters  28979  0.0  0.0 106088  1156 pts/32   S+   15:45   0:00 /bin/sh -c ps aux | grep phantomjs
ewaters  28981  0.0  0.0 103240   872 pts/32   S+   15:45   0:00 grep phantomjs

16 Nov 15:45:22 - Waterfall complete
Fri Nov 16 15:45:37 EST 2012
v0.8.14
1.7.0

slow_test.js:

var phantom = require('node-phantom'),
    async   = require('async'),
    util    = require('util'),
    child_process = require('child_process');

var page, ph;

async.waterfall([
  function (cb) {
    phantom.create(cb);
  },
  function (result, cb) {
    ph = result;
    util.log("Phantom created");
    child_process.exec("ps aux | grep phantomjs", cb);
  },
  function (stdout, stderr, cb) {
    util.log(stdout);
    ph.createPage(cb);
  },
  function (result, cb) {
    page = result;
    util.log("Page created");
    page.open("http://www.google.com", cb);
  },
  function (status, cb) {
    util.log("Page opened with status " + status);
    ph.exit(cb);
  },
  function (cb) {
    util.log("Phantom exited");
    child_process.exec("ps aux | grep phantomjs", cb);
  },
  function (stdout, stderr, cb) {
    util.log(stdout);
    cb();
  }
], function (err) {
  if (err) util.log(err);
  util.log("Waterfall complete");
});

Crashes when specifying phantomjs parameters

phantom.create(callback,{parameters:{'ignore-ssl-errors':'yes'}})

This will produce this error:

....\node_modules\node-phantom\node-phantom.js:24
                                args.push('--' + option + '=' + options.parameters[parm]);
                                                 ^
ReferenceError: option is not defined
    at spawnPhantom (....\node_modules\node-phantom\node-phantom.js:24:22)
    at Object.module.exports.create (....\node_modules\node-phantom\node-phantom.js:60:3)

To fix it, replace "option" with "parm" on line 24:

for(var parm in options.parameters) {
  args.push('--' + parm + '=' + options.parameters[parm]);
}

Netsniff.js Example

With your module on nodejs, I'm trying to use the HAR Exporting example on the main phantomjs repo: https://github.com/ariya/phantomjs/blob/master/examples/netsniff.js

I'm encountering issues regarding Date objects. I've combined performing the render as well as the HAR export in one action:

phantom.create(function (error, ph) {
  return ph.createPage(function (err, page) {
    page.address = final.url;
    page.resources = [];
    page.onLoadStarted = function () {
      page.startTime = new Date();
      console.log(page.startTime);
    };
    page.onResourceRequested = function (req) {
      page.resources[req.id] = {
        request: req,
        startReply: null,
        endReply: null
      };
      //console.log(page.resources);
    };
    page.onResourceReceived = function (res) {
      if (res.stage === 'start') {
        page.resources[res.id].startReply = res;
        //      console.log(res);
      }
      if (res.stage === 'end') {
        page.resources[res.id].endReply = res;
        //      console.log(res);
      }
    };
    page.set('viewportSize', { width: 1024, height: 768 });
    return page.open(final.url, function (error, stat) {
      var har;
      page.title = page.evaluate(function () {
        return document.title;
      });
      final.har = { 'onInputData': null };
      har = createHAR(page.address, page.title, page.startTime, page.resources);
      final.har.onInputData = JSON.stringify(har, undefined, 4);
      page.endTime = new Date();
      final.path = '/home/myprop/public_html/assets/i/' + final.truedomain;
      var now = Date.now();
      final.fullpath = final.path + '/' + now + '.png';
      final.thumbpath = final.path + '/' + now + 'thumb.png';
      fs.mkdir(final.path, 0755, function (err) {
        page.render(final.fullpath, function (err) {
          gm(final.fullpath).thumb(150, 150, final.thumbpath, 50, function (err) {
            if (err) return console.dir(arguments);
            callback(final);
            ph.exit();
          });
        });
      });
    });
  });
});

When I execute the request, it's erroring with:

/home/api/Andrew.js:278
            startedDateTime: request.time.toISOstring(),
                                          ^
TypeError: Object 2012-11-09T05:36:17.385Z has no method 'toISOstring'
    at /home/api/Andrew.js:278:43
    at Array.forEach (native)
    at createHAR (/home/api/Andrew.js:268:15)
    at Object.exports.scan [as cb] (/home/api/Andrew.js:240:13)
    at Socket.module.exports.create.io.sockets.on.socket.on.id (/home/api/node_modules/node-phantom/node-phantom.js:114:19)
    at Socket.EventEmitter.emit [as $emit] (events.js:93:17)
    at SocketNamespace.handlePacket (/home/api/node_modules/node-phantom/node_modules/socket.io/lib/namespace.js:335:22)
    at Manager.onClientMessage (/home/api/node_modules/node-phantom/node_modules/socket.io/lib/manager.js:487:38)
    at WebSocket.Transport.onMessage (/home/api/node_modules/node-phantom/node_modules/socket.io/lib/transport.js:387:20)
    at Parser.<anonymous> (/home/api/node_modules/node-phantom/node_modules/socket.io/lib/transports/websocket/default.js:36:10)

That function is the one provided with the PhantomJS example, and it does work correctly. I believe it's relevant to ask here because I don't think the objects I'm setting up are getting passed or executed:

page.onLoadStarted = function () {
  page.startTime = new Date();
  console.log(page.startTime);
};
page.onResourceRequested = function (req) {
  page.resources[req.id] = {
    request: req,
    startReply: null,
    endReply: null
  };
  //console.log(page.resources);
};
page.onResourceReceived = function (res) {
  if (res.stage === 'start') {
    page.resources[res.id].startReply = res;
    //      console.log(res);
  }
  if (res.stage === 'end') {
    page.resources[res.id].endReply = res;
    //      console.log(res);
  }
};

Any help would be much appreciated.
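
Two things may be worth checking against the trace above. The Date method is spelled toISOString (capital S), and the value in the error message prints like an ISO string, which suggests the time field may arrive as a string rather than a Date after crossing the bridge. The sketch below assumes the latter; toIsoString is a hypothetical helper that accepts either form.

```javascript
// Hypothetical helper: accept either a Date or an ISO-8601 string and
// return an ISO-8601 string, so callers need not care which one arrived.
function toIsoString(value) {
  var date = value instanceof Date ? value : new Date(value);
  if (isNaN(date.getTime())) {
    throw new TypeError('Not a valid date: ' + value);
  }
  return date.toISOString();
}

console.log(toIsoString('2012-11-09T05:36:17.385Z'));
console.log(toIsoString(new Date(0)));
```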

Null result on scraping new page after click

Hello, it's possible I am missing something very basic here, but I have been googling and overflowing for hours, so I figured I'd ask here too.

I am successfully scraping data from the first page, but then I have to click on a link to the next page and repeat. Right now I can only get back a null result, but with no errors. The first result (availabilities) is working perfectly. I can't seem to find any good code examples of scraping-clicking-scraping with this module so I might just be making a silly mistake. Thanks in advance for any help!

page.includeJs('http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js', function(err) {
  page.evaluate(function() {
    var availabilities = [];
    $('#calendar tbody tr').each(function() {
      $(this).children().filter('.a').each(function() {
        availabilities.push($(this).children().attr('href'));
      });
    });
    return {
      availabilities: availabilities
    };
  }, function(err, result) {
    console.log(result.availabilities.length);
  });
  page.evaluate(function() {
    var body = "";
    $('#nextWeek')[0].click(); //move to the next two weeks
    page.onLoadFinished = function(status) {
      body += $('body').html();
    };
    return {
      body: body
    };
  }, function(err, result) {
    console.log(result);
  });
  ph.exit();
}); // page.includeJs

page.includeJs callback does not fire

//initialize phantom and page
page.includeJs('http://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js', function (err, res) {
  console.log("included");
})

I use the provided code snippet to debug the includeJs output.
It works when launched by phantomjs itself, but it does not work when used via node-phantom.

It seems that in bridge.js, respond is not called in the pageIncludeJs case.
Maybe the respond function is not visible in the controlpage context?

Nodejs: 0.10.4
node-phantom: 0.2.3
phantomjs: 1.9.0

Footer contents don't seem to work

I'm trying to create custom footers as in the PhantomJS examples: https://github.com/ariya/phantomjs/blob/master/examples/printheaderfooter.js

Here is my code:

var phantom = require('node-phantom');

phantom.create(function (err, ph) {
    ph.createPage(function (err, page) {
         page.set('paperSize', {
              format: 'A4',
              orientation: 'portrait',
              footer: {
                contents: ph.callback(function (pageNum, numPages) {
                  if (pageNum == 1) {
                    return "";
                  }
                  return "<h1>Header <span style='float:right'>" + pageNum + " / " + numPages + "</span></h1>";
                })
              }
         }, function () {
             page.open('http://www.google.com', function () {
              })
         })
    })
});

But unfortunately I get the following error:

TypeError: Object #<Object> has no method 'callback';

Is it a bug that ph does not expose a callback method?

After setting userAgent, page doesn't navigate

In order to set the userAgent property to the 'settings' object of the page, I have to set all the properties, not just userAgent, otherwise it won't get any content.

          page.set 'settings',
            userAgent: 'test'
            XSSAuditingEnabled: false,
            javascriptCanCloseWindows: true,
            javascriptCanOpenWindows: true,
            javascriptEnabled: true,
            loadImages: false,
            localToRemoteUrlAccessEnabled: false,
            webSecurityEnabled: true

In order to test it, if you get the "settings" after doing a set, you will get only those properties that were set, losing the rest of them.

Thanks!
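
A possible workaround, sketched under assumptions: read the current settings first, overlay only the properties you want to change, and write the merged object back. It assumes page.get('settings', callback) returns the current settings object, as the report implies; updateSettings is a hypothetical helper and demoPage is a stand-in so the example runs without PhantomJS.

```javascript
// Hypothetical helper: read the current settings, overlay the overrides,
// and write the merged object back so unrelated settings are preserved.
function updateSettings(page, overrides, callback) {
  page.get('settings', function (err, current) {
    if (err) return callback(err);
    var merged = {};
    var key;
    for (key in current) merged[key] = current[key];
    for (key in overrides) merged[key] = overrides[key];
    page.set('settings', merged, function (err) {
      callback(err, merged);
    });
  });
}

// Stand-in page object (an assumption for this demo only).
var demoSettings = { userAgent: 'default', loadImages: true, javascriptEnabled: true };
var demoPage = {
  get: function (name, cb) { cb(null, demoSettings); },
  set: function (name, value, cb) { demoSettings = value; cb(null); }
};

updateSettings(demoPage, { userAgent: 'test' }, function (err, merged) {
  console.log(merged);
});
```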

Infinite loop when scraping page

I am using the sample code for scraping this url: http://www.anthropologie.com/anthro/category/office+d%e9cor+%26+stationery/home-office.jsp

The code goes into an infinite loop of creating a phantom instance, calling createPage(), and then calling page.open(). The function I call to create the phantom instance does not get recalled.

The only two npm packages I'm using are the phantomjs and node-phantom packages.

Here's the code:

var phantom = require('node-phantom');

var getPage = function(url, callback) {
  phantom.create(function(err, ph) {
    console.log('phantom created.');
    if (err) {
      ph.exit();
      callback(err);
    } else {
      ph.createPage(function(err, page) {
        console.log('page created.');

        if (err) {
          ph.exit();
          callback(err);
        } else {
          page.open(url, function(err) {
            console.log('page opened: ' + url);

            ph.exit();
          });
        }
      });
    }
  }, {
    phantomPath: require('phantomjs').path,
    parameters: {'ignore-ssl-errors': 'yes'}
  });
};

getPage('http://www.anthropologie.com/anthro/category/office+d%e9cor+%26+stationery/home-office.jsp', function(err) {
  console.log('done');
});

phantomJs stdout/stderr are visible, is that correct?

Hi alexscheelmeyer,
Using node-phantom always displays the underlying PhantomJS console output when executing. I just wanted to know whether this is intended or a side effect of the code still being in a 'debugging' stage. Below are a few sample lines of what I get on Ubuntu:

phantom stdout: 2012-08-16T08:07:44 [DEBUG] Got bus address:  "unix:abstract=/tmp/dbus-RyR7SBz8te,guid=18c8415be3d1b8357dd72d8300000019" 
2012-08-16T08:07:44 [DEBUG] Connected to accessibility bus at:  "unix:abstract=/tmp/dbus-RyR7SBz8te,guid=18c8415be3d1b8357dd72d8300000019" 
2012-08-16T08:07:44 [DEBUG] Registered DEC:  true 
2012-08-16T08:07:45 [DEBUG] Registered event listener change listener:  true 
phantom stderr: 2012-08-16T08:07:46 [WARNING] QFont::setPixelSize: Pixel size <= 0 (0)

Thanks.

G.

WebPage.viewportSize parameter ignored?

Hi alexscheelmeyer,
I believe node-phantom is ignoring the viewportSize parameter of PhantomJS's WebPage object. When I set it and then open and render the page, I get an image whose width is 400 pixels, which is PhantomJS's default rather than the width I specified.

I have also tried sgentle/phantomjs-node's syntax style doing something like:

page.set('viewportSize', { width: 200, height: 400 });

but the set method does not exist.

Unless you plan to address this soon, I will likely have to drop node-phantom for sgentle/phantomjs-node, as simulating a given web browser width is absolutely necessary to me.

Thanks.

G.

Many phantom child processes never close

Great module, thanks for sharing it!

I'm using it in an app that spawns many phantom tasks, i.e., multiple calls to phantom.create(), but I'm seeing that many/most/perhaps-all of the underlying phantomjs processes are not ending, so of course the machine eventually runs out of memory and crashes.

Each phantom task does a page.open and page.evaluate, and I call phantom.exit() in the callback to the evaluate().

Any idea what might cause the processes to hang around? Any suggestions for dealing with this besides occasionally counting the number of running phantomjs processes and killing the oldest ones?

Here's my coffeescript code:

    callPhantom = () ->
        phClient = (err,ph) ->
          if err
            ph.exit()
            sender(err, 500, 'Error creating Phantom Client.')
            return
          return ph.createPage (err,page) ->
            if err
                ph.exit()
                sender(err, 500, 'Error creating Phantom Page.')
                return
            page.open "http://" + host + ":"+ port + url, (err, status) ->
              if err
                ph.exit()
                sender(err, status, "Error opening Phantom Page: #{url}\n[Got HTTP #{status}]")
                return

              getHTML = () ->
                return document.documentElement.innerHTML

              respond = () ->
                  result = page.evaluate getHTML, (err, data) ->
                    ph.exit()
                    my.setHeaders(res, 'Phantom')
                    code = if err then 500 else 200
                    sender(err, code, data)

              setTimeout respond, 1000 # @todo - Reconsider if we can detect completion, so we dont have to use the Timeout.

        phantom.create phClient, {
            parameters: {'load-images':'no'},
        }
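While the root cause is unclear, one defensive pattern is to guarantee that `exit()` is called exactly once per instance, either from the normal callback path or from a watchdog timer if a callback never fires. The sketch below is in plain JavaScript. `createExitGuard` and `fakePh` are hypothetical names used for illustration, and the 30-second timeout is an arbitrary assumption. This may not address why the processes linger.

```javascript
// Hypothetical helper (not part of node-phantom): wraps ph.exit() so that it
// runs at most once, and schedules a watchdog that calls it after timeoutMs
// if nothing else has done so.
function createExitGuard(ph, timeoutMs) {
  var exited = false;
  var timer = null;

  function exitOnce() {
    if (exited) {
      return false; // already exited, nothing to do
    }
    exited = true;
    if (timer !== null) {
      clearTimeout(timer); // cancel the watchdog so it does not keep Node alive
    }
    ph.exit();
    return true;
  }

  timer = setTimeout(exitOnce, timeoutMs);
  return exitOnce;
}

// Stand-in for a real phantom instance, used only to show the behaviour.
var fakePh = {
  exitCalls: 0,
  exit: function () { this.exitCalls += 1; }
};

var exit = createExitGuard(fakePh, 30000);
exit(); // normal path
exit(); // a second call is ignored
console.log(fakePh.exitCalls);
```

In the code above, each `ph.exit()` call would be replaced by a call to the guard created right after `phantom.create` succeeds, so a page that never calls back still gets cleaned up by the timer.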

page.evaluate(func, callback, param). param can not be a Function.

In PhantomJS, a param can be a Function, but node-phantom currently does not support this.

function getBodyHTML(){
    return document.body.innerHTML
}

var phantom = require('node-phantom')
phantom.create(function(_, ph){
    ph.createPage(function(_, page){
        page.open('http://example.com', function(_, status){
            console.log(status);
            page.evaluate(function(aFunc) {
                return aFunc()     // FAILED HERE !!!!!!!
            }, function(_, ret){
                console.log(ret)
            }, getBodyHTML)
        })
    })
})

I currently work around the problem by calling func.toString() and then using eval on the other side, but I would like node-phantom to support function serialization.
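A minimal sketch of that toString-then-eval workaround, shown outside of PhantomJS so it can be run on its own. `serializeFunction`, `reviveFunction`, and `getGreeting` are hypothetical names used for illustration. Note that a function revived this way loses any variables it closed over, so it only works for self-contained functions such as `getBodyHTML` above.

```javascript
// Hypothetical helpers (not part of node-phantom).

// Turn a function into its source text so it can be passed as a plain string.
function serializeFunction(fn) {
  return fn.toString();
}

// Rebuild a function from its source text. Wrapping the source in parentheses
// makes eval parse it as a function expression rather than a declaration.
function reviveFunction(source) {
  return eval('(' + source + ')');
}

function getGreeting() {
  return 'hello';
}

var source = serializeFunction(getGreeting);
var revived = reviveFunction(source);
console.log(revived());

// With the page.evaluate(func, callback, param) signature from this issue,
// the string would be passed as `param` and revived inside the page, e.g.:
//   page.evaluate(function (src) { return eval('(' + src + ')')(); },
//                 callback, getBodyHTML.toString());
```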
