
Prerender

Prerender is a Node server that uses Headless Chrome to render HTML, screenshots, PDFs, and HAR files from any web page. The Prerender server listens for an HTTP request, takes the URL, loads it in Headless Chrome, waits for the page to finish loading (by waiting for the network to be idle), and then returns your content.

The quickest way to run your own prerender server:
$ npm install prerender
server.js
const prerender = require('prerender');
const server = prerender();
server.start();
test it:
curl http://localhost:3000/render?url=https://www.example.com/

Use Cases

The Prerender server can be used in conjunction with our Prerender.io middleware in order to serve the prerendered HTML of your javascript website to search engines (Google, Bing, etc) and social networks (Facebook, Twitter, etc) for SEO. We run the Prerender server at scale for SEO needs at https://prerender.io/.

The Prerender server can be used on its own to crawl any web page and pull down the content for your own parsing needs. We host the Prerender server for your own crawling needs at https://prerender.com/.

Prerender differs from Google Puppeteer in that Prerender is a web server that takes in URLs and loads them in parallel in a new tab in Headless Chrome. Puppeteer is an API for interacting with Chrome, but you still have to write that interaction yourself. With Prerender, you don't have to write any code to launch Chrome, load pages, wait for the page to load, or pull the content off of the page. The Prerender server handles all of that for you so you can focus on more important things!

Below you will find documentation for our Prerender.io service (website SEO) and our Prerender.com service (web crawling).

Click here to jump to Prerender.io documentation

Click here to jump to Prerender.com documentation

Prerender.io

For serving your prerendered HTML to crawlers for SEO

Prerender solves SEO by serving prerendered HTML to Google and other search engines. It's easy:

  • Just install the appropriate middleware for your app (or check out the source code and build your own)
  • Make sure search engines have a way of discovering your pages (e.g. sitemap.xml and links from other parts of your site or from around the web)
  • That's it! Perfect SEO on javascript pages.

Middleware

This is a list of middleware available to use with the prerender service:

Official middleware

Javascript
Ruby
Apache
Nginx

Community middleware

PHP
Java
Go
Grails
Nginx
Apache

Request more middleware for a different framework in this issue.

How it works

This is a simple service that only takes a url and returns the rendered HTML (with all script tags removed).

Note: you should proxy the request through your server (using middleware) so that any relative links to CSS/images/etc still work.

GET https://service.prerender.io/https://www.google.com

GET https://service.prerender.io/https://www.google.com/search?q=angular
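
For example, with the official Node/Express middleware (prerender-node), the proxy step might look like the sketch below. This is a minimal, hedged example: the Express app, port, and local service URL are assumptions for illustration, not part of the Prerender server itself.

// app.js: minimal sketch of proxying crawler requests through prerender-node
// npm install express prerender-node
var express = require('express');
var app = express();

// Send crawler requests through a Prerender server; this URL assumes a
// locally running instance (see "Running locally" below).
app.use(require('prerender-node').set('prerenderServiceUrl', 'http://localhost:3000/'));

app.get('*', function (req, res) {
    // normally you'd serve your JavaScript app's index.html here
    res.send('<html><body>Hello from a JavaScript-rendered app</body></html>');
});

app.listen(8000);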

Running locally

If you are trying to test Prerender with your website on localhost, you'll have to run the Prerender server locally so that Prerender can access your local dev website.

If you are running the Prerender service locally, make sure your middleware points to your local Prerender server with:

export PRERENDER_SERVICE_URL=http://localhost:3000

$ git clone https://github.com/prerender/prerender.git
$ cd prerender
$ npm install
$ node server.js

Prerender will now be running on http://localhost:3000. If you have a web app running on, say, http://localhost:8000, you can visit http://localhost:3000/http://localhost:8000 to see how your app renders in Prerender.

To test how your website will render through Prerender using the middleware, you'll want to visit the URL http://localhost:8000?_escaped_fragment_=

That should send a request to the Prerender server and display the prerendered page through your website. If you View Source of that page, you should see the HTML with all of the <script> tags removed.

Keep in mind you will see 504s for relative URLs when accessing http://localhost:3000/http://localhost:8000, because the actual domain on that request is your Prerender server. This isn't really an issue: once you proxy that request through the middleware, the domain will be your website and those requests won't be sent to the Prerender server. For instance, if you want to see your relative URLs working, visit http://localhost:8000?_escaped_fragment_=

Customization

You can clone this repo and run server.js OR include prerender in your project with npm install prerender --save to create an express-like server with custom plugins.
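
For the npm route, a custom server is just the quickstart server plus whichever plugins you enable with server.use(). A minimal, hedged sketch follows; the particular plugins and options shown are illustrative choices (see the Options and Plugins sections below):

// server.js: sketch of a customized Prerender server built from the npm package
const prerender = require('prerender');

const server = prerender({
    pageLoadTimeout: 20 * 1000
});

// enable bundled plugins as needed (see "Available plugins" below)
server.use(prerender.removeScriptTags());
server.use(prerender.httpHeaders());
server.use(prerender.blacklist());

server.start();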

Options

chromeLocation

var prerender = require('./lib');

var server = prerender({
    chromeLocation: '/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary'
});

server.start();

Uses a Chrome install at the specified location. Prerender does not download Chrome, so you will want to make sure Chrome is already installed on your server. The Prerender server checks a few known locations for Chrome, but this option lets you override that.

Default: null

logRequests

var prerender = require('./lib');

var server = prerender({
    logRequests: true
});

server.start();

Causes the Prerender server to print out every request made (represented by a +) and every response received (represented by a -). Lets you analyze page load times.

Default: false

captureConsoleLog

var prerender = require('./lib');

var server = prerender({
    captureConsoleLog: true
});

server.start();

The Prerender server will store all console logs in pageLoadInfo.logEntries for further analysis.

Default: false

pageDoneCheckInterval

var prerender = require('./lib');

var server = prerender({
    pageDoneCheckInterval: 1000
});

server.start();

Number of milliseconds between checks on whether the page is done loading. You can also set the PAGE_DONE_CHECK_INTERVAL environment variable instead of passing in the pageDoneCheckInterval parameter.

Default: 500

pageLoadTimeout

var prerender = require('./lib');

var server = prerender({
    pageLoadTimeout: 20 * 1000
});

server.start();

Maximum number of milliseconds to wait while loading the page, including waiting for all pending requests/AJAX calls to complete, before timing out and continuing on. A timeout does not cause an error; Prerender just returns the HTML on the page at that moment. You can also set the PAGE_LOAD_TIMEOUT environment variable instead of passing in the pageLoadTimeout parameter.

Default: 20000

waitAfterLastRequest

var prerender = require('./lib');

var server = prerender({
    waitAfterLastRequest: 500
});

server.start();

Number of milliseconds to wait after the number of requests/AJAX calls in flight reaches zero. The HTML is pulled off the page at that point. You can also set the WAIT_AFTER_LAST_REQUEST environment variable instead of passing in the waitAfterLastRequest parameter.

Default: 500

followRedirects

var prerender = require('./lib');

var server = prerender({
    followRedirects: false
});

server.start();

Whether Chrome follows a redirect on the first request if one is encountered. Normally, for SEO purposes, you do not want to follow redirects; instead, you want the Prerender server to return the redirect to the crawlers so they can update their index. Don't set this to true unless you know what you are doing. You can also set the FOLLOW_REDIRECTS environment variable instead of passing in the followRedirects parameter.

Default: false
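
For reference, each of the options above that mentions an environment variable can be configured without touching code. A sketch of the shell equivalents (values shown are just the defaults from this section):

export PAGE_DONE_CHECK_INTERVAL=500
export PAGE_LOAD_TIMEOUT=20000
export WAIT_AFTER_LAST_REQUEST=500
export FOLLOW_REDIRECTS=false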

Plugins

We use a plugin system in the same way that Connect and Express use middleware. Our plugins are a little different and we don't want to confuse the prerender plugins with the prerender middleware, so we opted to call them "plugins".

Plugins are in the lib/plugins directory, and add functionality to the prerender service.

Each plugin can implement any of the plugin methods:

init()

requestReceived(req, res, next)

tabCreated(req, res, next)

pageLoaded(req, res, next)

beforeSend(req, res, next)
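
A custom plugin is a plain module that exports any of those methods and calls next() when it's done. Here is a minimal, hypothetical sketch; it assumes the rendered HTML is available on req.prerender.content, which is how the bundled plugins (such as removeScriptTags) access it:

// lib/plugins/addRenderedAtComment.js: hypothetical example plugin
module.exports = {
    init: function() {
        console.log('addRenderedAtComment plugin initialized');
    },

    // called after Chrome has finished loading the page
    pageLoaded: function(req, res, next) {
        if (req.prerender.content) {
            // append a comment recording when the page was rendered
            req.prerender.content += '<!-- rendered at ' + new Date().toISOString() + ' -->';
        }
        next();
    }
};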

Available plugins

You can use any of these plugins by modifying the server.js file.

basicAuth

If you want to only allow access to your Prerender server from authorized parties, enable the basic auth plugin.

You will need to add the BASIC_AUTH_USERNAME and BASIC_AUTH_PASSWORD environment variables.

export BASIC_AUTH_USERNAME=prerender
export BASIC_AUTH_PASSWORD=test

Then make sure to pass the basic authentication header (username:password, base64 encoded).

curl -u prerender:wrong http://localhost:3000/http://example.com -> 401
curl -u prerender:test http://localhost:3000/http://example.com -> 200

removeScriptTags

We remove script tags because we don't want any framework-specific routing/rendering to happen on the rendered HTML once it's executed by the crawler. The crawlers may not execute javascript, but we'd rather be safe than have something get screwed up.

For example, if you rendered the HTML of an angular page but left the angular scripts in there, your browser would try to execute the angular routing and possibly end up clearing out the HTML of the page.

This plugin implements the pageLoaded function, so make sure any caching plugins run after this plugin is run to ensure you are caching pages with javascript removed.

httpHeaders

If your JavaScript routing has a catch-all for things like 404s, you can tell the prerender service to serve a 404 to Google instead of a 200. This way, Google won't index your 404s.

Add these tags to the <head> of your page if you want to serve soft HTTP headers. Note: Prerender will still send the HTML of the page; this just modifies the status code and headers being sent.

Example: telling prerender to serve this page as a 404

<meta name="prerender-status-code" content="404">

Example: telling prerender to serve this page as a 302 redirect

<meta name="prerender-status-code" content="302">
<meta name="prerender-header" content="Location: https://www.google.com">

whitelist

If you only want to allow requests to a certain domain, use this plugin to cause a 404 for any other domains.

You can add the whitelisted domains to the plugin itself, or use the ALLOWED_DOMAINS environment variable.

export ALLOWED_DOMAINS=www.prerender.io,prerender.io

blacklist

If you want to disallow requests to certain domains, use this plugin to cause a 404 for those domains.

You can add the blacklisted domains to the plugin itself, or use the BLACKLISTED_DOMAINS environment variable.

export BLACKLISTED_DOMAINS=yahoo.com,www.google.com

in-memory-cache

Caches pages in memory. Available at prerender-memory-cache
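
A hedged sketch of wiring it in, assuming the package's documented server.use() style:

// server.js: sketch enabling in-memory caching
const prerender = require('prerender');
const server = prerender();

server.use(prerender.removeScriptTags());
// cache after script tags are stripped, per the removeScriptTags note above
server.use(require('prerender-memory-cache'));

server.start();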

s3-html-cache

Caches pages in S3. Coming soon.


Prerender.com

For doing your own web crawling

When running your Prerender server in a web crawling context, the server has a separate, more flexible API that lets you do different things like:

  • get HTML from a web page
  • get screenshots (viewport or full screen) from a web page
  • get PDFs from a web page
  • get HAR files from a web page
  • execute your own javascript and return JSON along with the HTML

If you make an HTTP request to the /render endpoint, you can pass any of the following options, either as query parameters on a GET request or as JSON properties on a POST request. We recommend using a POST request, but we show GET requests here for brevity. See the Get vs Post section below for how to send the POST request.

These examples assume you have the server running locally on port 3000 but you can also use our hosted service at https://prerender.com/.

url

The URL you want to load. Returns HTML by default.

http://localhost:3000/render?url=https://www.example.com/

renderType

The type of content you want to pull off the page.

http://localhost:3000/render?renderType=html&url=https://www.example.com/

Options are html, jpeg, png, pdf, har.

userAgent

Send your own custom user agent when Chrome loads the page.

http://localhost:3000/render?userAgent=ExampleCrawlerUserAgent&url=https://www.example.com/

fullpage

Whether you want your screenshot to be the entire height of the document or just the viewport.

http://localhost:3000/render?fullpage=true&renderType=png&url=https://www.example.com/

If you don't include fullpage, we'll screenshot just the normal browser viewport. Include fullpage=true for a full-page screenshot.

width

Screen width. Lets you emulate different screen sizes.

http://localhost:3000/render?width=990&url=https://www.example.com/

Default is 1440.

height

Screen height. Lets you emulate different screen sizes.

http://localhost:3000/render?height=100&url=https://www.example.com/

Default is 718.

followRedirects

By default, we don't follow 301 redirects on the initial request so you can be alerted of any changes in URLs to update your crawling data. If you want us to follow redirects instead, you can pass this parameter.

http://localhost:3000/render?followRedirects=true&url=https://www.example.com/

Default is false.

javascript

Execute javascript to modify the page before we snapshot your content. If you set window.prerenderData to an object, we will pull the object off the page and return it to you. Great for parsing extra data from a page in javascript.

http://localhost:3000/render?javascript=window.prerenderData=window.angular.version&url=https://www.example.com/

When using this parameter and window.prerenderData, the response from Prerender will look like:

{
	prerenderData: { example: 'data' },
	content: '<html><body></body></html>'
}

If you don't set window.prerenderData, the response won't be JSON. The response will just be the normal HTML.

Get vs Post

You can send all options as query parameters on a GET request or as JSON properties on a POST request. We recommend using a POST request when possible to avoid any issues with URL encoding of GET query strings. Here are a few pseudo-examples:

POST http://localhost:3000/render
{
	renderType: 'html',
	javascript: 'window.prerenderData = window.angular.version',
	url: 'https://www.example.com/'
}
POST http://localhost:3000/render
{
	renderType: 'jpeg',
	fullpage: 'true',
	url: 'https://www.example.com/'
}
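
For instance, the first pseudo-example above could be sent with curl like this (a sketch against a locally running server):

curl -X POST http://localhost:3000/render \
     -H 'Content-Type: application/json' \
     -d '{"renderType": "html", "javascript": "window.prerenderData = window.angular.version", "url": "https://www.example.com/"}'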

Check out our full documentation

License

The MIT License (MIT)

Copyright (c) 2013 Todd Hooper <[email protected]>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

prerender's People

Contributors

ahmedmozaly, avelinesg, bakura10, d-simon, davebrown-dev, golyo88, hollypony, jupl, lluczo, maxlath, michieldemey, mikermcneil, paparent, patrickjs, patroeper, pierrickmartos, portersupport, pwmckenna, rmyock, rubensayshi, sharshenov, stanback, stumpyfr, taylorkearns, thoop, uatec, varrocs, ym, ysimonson, zbettenbuk


prerender's Issues

Escape Fragment works, but user agent doesn't?

I installed the server and client. I tested escape fragment and it works. However, when I set my user agent to:

'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

Both in the Chrome dev tool and node.js request.

The server doesn't catch it.

Any help? Thanks!

Warn users of dangers of becoming an unmonitored open proxy

Speaking from the land of personal experience

With the ease of setting up prerender, it is likely common that people set it up to accept all inbound requests from 0.0.0.0. There are bots that crawl the web, test port 80/3000 to be this kind of relay, and then take advantage of short-sighted developers cough me cough

If prerender, by default (without either a code change, ENV or argv setting) only listened to 127.0.0.1, or forced a special x- http header on all requests (and user had to generate one to run the script), it'd be a nice step in cutting back on the open proxy problem.

[RFC] prerender cluster

Hey guys,

would you be interested in a cluster version of prerender? The implementation might be fairly easy, the only tricky thing would probably be how to assign ports to phantom.

WDYT?

how to modify the page before sending to browser?

for instance, I'd like to change the title of the page with javascript before sending it to the browser.

module.exports = {
    afterPhantomRequest: function(phantom, context, next) {
        phantom.evaluate(function() {
            document.title = "Test";
        });
        next();
    }
};

the rendered page title is not "Test" as I expected

I guess I have to do something like removeScriptTag.js plugin? modify the string of the HTML?

Implement middleware in more frameworks/languages

I'll keep this list updated as more middleware is added.

Feel free to implement one of the frameworks that people are asking for!

Add a comment if you'd like to see more middleware in another language/framework, and I'll add it to the list!

inMemoryHtmlCache seems to be broken

I've been trying to get the inMemoryHtmlCache to work. It doesn't seem to work at all. After a little bit of debugging, it does not seem to share a single cache between all workers. And even worse, it appears to create a separate cache for the beforePhantomRequest and afterPhantomRequest functions. Therefore the page being cached in afterPhantomRequest can never be read from the beforePhantomRequest function.

GZip content before saving it to S3

It may be useful to add an option to Prerender so that before saving to S3, Prerender would gzip the HTML file (I'd not enable this by default as it would add some overhead to your Heroku server, but it would be useful for people who host it themselves).

Of course, you must not forget to add the right headers ('Content-Encoding': 'gzip').

page rendered doesn't look accurate

When I try the following URL, the rendered HTML looks absolutely awful. What is causing this behavior, and how can I fix it?

http://service.prerender.io/http://www.bestbuy.ca

redirects to passed url every time

It seems to happen more frequently now, whereas a few weeks ago it wasn't happening.

Basically, when I pass

serverip:3000/http://somesite.com

it redirects to somesite.com.

I clearly remember prerender being able to load that site before.

Memory Quota Exceeded on Heroku

Seeing this error on Heroku now that the server doesn't crash/restart anymore.

heroku[web.1]: Process running mem=512M(100.2%)
heroku[web.1]: Error R14 (Memory quota exceeded)

Need to make sure the page is always being cleaned up, and that there are no memory leaks in closures. Phantomjs might be doing some caching or holding onto cookies? Otherwise it might be in the phantom-node bridge.

Reminder to myself: Check and make sure the HTML returned from the page isn't leaked. Memory leak seems semi-random and adds ~1MB for a request to the stack sometimes.

prerender sometimes works only after xth try

Basically sometimes it works just fine, sometimes it returns something like this

<html><head></head><body></body></html>

And sometimes it works just fine. Console shows no errors in either case.

It's a website using Angular; has anyone encountered a similar issue? I've tried it on both prerender.herokuapp.com and my Heroku instance. Is there any way to debug this? It seems to be the same problem as was discussed (and fixed) in #17.

Cannot run locally

I tried to run this locally using

npm install
node index.js

But it fails like

events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: spawn ENOENT
    at errnoException (child_process.js:945:11)
    at Process.ChildProcess._handle.onexit (child_process.js:736:34)

Any insights?

Use npm to distribute the prerender server.

It would be nice if we could simply import prerender-server using npm. That would make it easier to update and configure.

$ npm install prerender-server --save
$ cat index.js
var express = require('express');
var app = express();

app.configure(function(){
    app.use(require('prerender-server'));
});

app.listen(3000);

Not waiting for full page rendering

If you have too much JS running on the page it is possible that PhantomJS will finish loading all resources before all JS and Ajax calls finish.
Because of this, the rendered html returned from Prerender is incomplete.

Possible solutions:
You could add a wait timer, configurable through the URL (and then through the middleware).
You could have a wait function, but that would be more difficult. Something like this:
https://github.com/ariya/phantomjs/blob/master/examples/waitfor.js

GWT page is not rendered

I tried this

http://service.prerender.io/http://gwtp-carstore.appspot.com

but the page is not rendered. It seems that prerender does not render GWT pages.

phantom stdout: TypeError: 'undefined' is not a function (evaluating 'a')

I get the following issue when I try the following:

http://localhost:3000/http://www.arcbees.com/#!/product
getting http://www.arcbees.com/
phantom stdout: TypeError: 'undefined' is not a function (evaluating 'a')


phantom stdout:   http://www.arcbees.com/bootstrap/js/bootstrap.min.js:6
  http://www.arcbees.com/bootstrap/js/bootstrap.min.js:6

got 200 in 2949ms for http://www.arcbees.com/
getting http://www.arcbees.com/bootstrap/css/bootstrap.min.css
getting http://www.arcbees.com/rcarousel/widget/css/rcarousel.css
got 200 in 165ms for http://www.arcbees.com/rcarousel/widget/css/rcarousel.css
got 200 in 217ms for http://www.arcbees.com/bootstrap/css/bootstrap.min.css
getting favicon.ico
got 504 in 62ms for favicon.ico

Remove Script Tag's failing

I just recently put together the prerender on its own EC2 instance. Runs and works very well with prerendering pages. It saves the rendered HTML to S3 as well.

The problem I'm encountering comes when Prerender.io is attempting to match and remove the script tags and serve up the cached HTML content. I've added a snippet of the error I'm seeing when this happens. I can add a gist with an entire HTML page for reference if need be.

Any help/guidance here would be appreciated!

...</body></html> has no method 'match'
    at Object.module.exports.beforeSend (/home/prerender/lib/plugins/remove-script-tags.js:7:50)
    at next (/home/prerender/lib/prerender.js:157:19)
    at next (/home/prerender/lib/prerender.js:159:13)
    at Object.prerender.pluginsBeforeSend (/home/prerender/lib/prerender.js:162:5)
    at Object.prerender.send (/home/prerender/lib/prerender.js:202:10)
    at Response.<anonymous> (/home/prerender/lib/plugins/s3-html-cache.js:19:21)
    at Request.<anonymous> (/home/prerender/node_modules/aws-sdk/lib/request.js:263:18)
    at Request.callListeners (/home/prerender/node_modules/aws-sdk/lib/sequential_executor.js:132:20)
    at Request.emit (/home/prerender/node_modules/aws-sdk/lib/sequential_executor.js:100:10)
    at Request.emitEvent (/home/prerender/node_modules/aws-sdk/lib/request.js:535:10)
[2013-11-15T01:16:22.780Z] (sys) Starting prerender
Server running on port 8443
starting phantom
started phantom

you can't start prerender on multiple ports

I tried running prerender on multiple ports but failed. I am seeing the below error (Address already in use):

events.js:72
    throw er; // Unhandled 'error' event
          ^
Error: listen EADDRINUSE
  at errnoException (net.js:901:11)
  at Server._listen2 (net.js:1039:14)
  at listen (net.js:1061:10)
  at Server.listen (net.js:1127:5)
  at Object.module.exports.create    (/home/paritsoh/nodejs_projects/prerender/node_modules/phantom/phantom.js:90:18)
  at Object.prerender.createPhantom       (/home/paritsoh/nodejs_projects/prerender/lib/prerender.js:22:13)
  at Object.prerender.createServer (/home/paritsoh/nodejs_projects/prerender/lib/prerender.js:11:10)
  at Object.<anonymous> (/home/paritsoh/nodejs_projects/prerender/index.js:6:11)
  at Module._compile (module.js:456:26)
  at Object.Module._extensions..js (module.js:474:10

Multi process not working on debian 6

Prerender on a Debian 6 server does not work with multiple processes.
For some reason, after a few successful executions, threads start hanging and do not return or respond any more.
Looking at the process table, there are multiple node processes which are no longer responding.

Disabling the forking of additional threads in index.js line 21 fixes the problem.

On Mac OS X and localhost the problem does not occur.

Wait for all AJAX calls to end before taking the HTML snapshot

Currently we wait 50ms to render the page. In most cases, this works fine after the phantomjs.open callback gets called...but we want to make sure everything is there.

A more robust way would be to count resources being requested using onResourceRequested and onResourceReceived and then take the snapshot after they're done.
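
A rough sketch of that resource-counting approach in a standalone PhantomJS script (hypothetical; the variable names and the 500 ms settle delay are illustrative, not prerender's actual implementation):

var page = require('webpage').create();
var pendingRequests = 0;
var renderTimer = null;

page.onResourceRequested = function(requestData) {
    pendingRequests++;
    if (renderTimer) { clearTimeout(renderTimer); renderTimer = null; }
};

page.onResourceReceived = function(response) {
    // each resource fires 'start' and 'end' stages; only count completions
    if (response.stage !== 'end') return;
    pendingRequests--;
    if (pendingRequests <= 0) {
        // wait a little after the last response before taking the snapshot
        renderTimer = setTimeout(function() {
            console.log(page.evaluate(function() {
                return document.documentElement.outerHTML;
            }));
            phantom.exit(0);
        }, 500);
    }
};

page.open('http://localhost:8000/');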

Apache mod_rewrite?

Hi,

First of all, great work!

Sorry this is not really an issue, but you mention:

"It is also meant to be proxied through your server so that any relative links to things like CSS will work."

What kind of mod_rewrite rules would be necessary to accomplish this, if an AngularJS app is being served via Apache?

Any suggestions or comments will be highly appreciated.

Sails.js support.

Hi,
do you have any idea how to make it work with Sails.js (http://sailsjs.org/)?
How do you require the prerender.io module and set the prerenderServiceUrl or blacklisted options, for example?

thx

how to render relative images?

Often, relative images on a webpage show up as broken. Is it possible to cache the images in prerender and display them, or simply turn the relative image paths into absolute URLs? I guess I could do the latter with javascript.

Empty directory and file names in S3 cache

Not sure if I've got things set up incorrectly, but when I'm using the S3 cache my bucket contains empty directory names and empty (invisible) filenames.

e.g. if caching the website 'https://example.com' my S3 bucket hierarchy looks like this:

"https:" / "" / "example.com" / [invisible file here]

The caching still seems to work, but it appears that you are creating an empty directory for the "https://" portion of the URL. And the root document is stored as a filename with no name, which S3 then won't show? (This is a bit of a guess, I don't know that, but I know that it is working)

Seems like this folder structure should be:

"https:" / "example.com/" [dir]
         / "example.com" [file]

Like I said, the caching seems to work, but it makes it difficult to see what pages are cached.

window.prerenderReady doesn't seem to work

Some of my Angular.js pages requires extra ajax requests, so I added

window.prerenderReady = false;

to my index.html and set

window.prerenderReady = true;

in my AngularJS controllers when the AJAX calls have completed. However, the resulting page still isn't cached properly, even though the caching takes much longer than when not setting window.prerenderReady.

If i access my pages directly, I can clearly see

window.prerenderReady

being set to false in the beginning. Before it's set to true, the bindings I intended have been bound to the html.

Could anyone help out?

code cleanup

Some methods are getting messy and doing too many things. Clean up the code after the latest major version change.

Fix node and npm versions in package.json

In package.json:

"engines": {
    "node": "0.10.21",
    "npm": "1.3.11"
  }

This makes npm yell at you whenever you install prerender:
image

You probably want a range like:

"engines": {
  "node": ">=0.10.0",
  "npm": ">=1.3.0"
}

And unless you really need it, you probably shouldn't specify point releases. Also, you don't seem to be doing anything fancy with your package, so I don't think you need to specify an npm version.

Not running all JS on the page

Hi,
I am trying to render a Backbone-developed page through prerender. However, I don't think the JS is being executed properly by PrerenderClient. The page gets rendered partially. I am not sure where it's failing, as it is not providing any error in the log.

I understand Prerender works on the logic of counting the number of JS requests and counting them back down. I tried increasing the JS timeout values to the 5-second level. Still it fails to load.


When I try to render the same page using plain phantomjs it works fine. I have pasted the code below.
I am trying to recognize page load completion by identifying a div "botend". I write out this div onto the page when all rendering is complete for the page.
Essentially the phantomjs process needs to wait for this div to appear on the page. It checks for the div's availability every 1500 ms.

var page = require('webpage').create();
var system = require('system');
var pagePath = "This is the input url";
var htmlToRender;

// Open site and wait for page complete signal
page.open(pagePath, function (status) {
    // Check for page load success
    if (status !== "success") {
        console.log("Unable to access network");
    } else {
        // Wait for page complete signal to be visible
        waitFor(function() {
            // Check in the page if a specific element is now visible
            // Change in backbone framework to identify page load completion
            return page.evaluate(function() {
                return $("#botend").is(":visible");
            });
        }, function() {
            htmlToRender = page.evaluate(function () {
                return document.getElementsByTagName('html')[0].innerHTML;
            });
            phantom.exit(0);
        });
    }
});

function waitFor(testFx, onReady, timeOutMillis) {
    var maxtimeOutMillis = timeOutMillis ? timeOutMillis : 10000, //< Default Max Timeout is 10s
        start = new Date().getTime(),
        condition = false,
        interval = setInterval(function() {
            if ((new Date().getTime() - start < maxtimeOutMillis) && !condition) {
                // If not timed out yet and condition not yet fulfilled
                condition = (typeof(testFx) === "string" ? eval(testFx) : testFx());
            } else {
                if (!condition) {
                    // If condition still not fulfilled (timeout but condition is 'false')
                    console.log("Wait timeout");
                    phantom.exit(1);
                } else {
                    // Condition fulfilled (timeout and/or condition is 'true')
                    // Do what it's supposed to do once the condition is fulfilled
                    typeof(onReady) === "string" ? eval(onReady) : onReady();
                    clearInterval(interval); //< Stop this interval
                }
            }
        }, 1500); //< repeat check every 1.5s
}

html encoding on redirect headers

So we are using the "prerender-header" meta tag to pass redirects to prerender. We are having an encoding issue where the "&" is being interpreted as "&amp;" since it goes into HTML and eventually shows up in the redirect URL.
A possible workaround is to encode the url before setting the meta and decoding it in the http-headers.js.
Is it reasonable enough? what do you think?
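
A hypothetical sketch of that workaround: encode the value before it goes into the meta tag on the page, and decode it again where the plugin reads it (the decode side would live in the http-headers plugin, shown here only as a comment):

// on the page, when adding the redirect meta tag dynamically (illustrative helper)
var meta = document.createElement('meta');
meta.name = 'prerender-header';
meta.content = 'Location: ' + encodeURIComponent('https://example.com/search?q=shoes&page=2');
document.head.appendChild(meta);

// ...and in http-headers.js the value would be run through decodeURIComponent()
// before being set as the Location header (sketch of the proposed change, not actual code).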

Thanks,

Error with S3 cache

Hello,

First, thanks for your amazing work.

I have an error with the S3 cache: when I enable it, it doesn't work and I get this error:

{ [PermanentRedirect: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.]
message: 'The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.',
code: 'PermanentRedirect',
name: 'PermanentRedirect',
statusCode: 301,
retryable: false }

It seems to be a bug with the region of the bucket; I'm using a European bucket (eu-west-1). Is there a way to change the default region?

Thanks

Search Engines Penalties.

This is a good approach that I almost implemented in a project, but it got shot down due to concerns about search engines giving penalties for serving them something different from what users get served.

Have you tried this on a production site, and how did it go?

Server logs that it's listening on port before listen completes

server.listen is async, so PrerenderServer.start writes to stdout before the server is necessarily listening on the port:

// Starts the server
PrerenderServer.prototype.start = function(port) {
    port = port || process.env.PORT || 3000;
    this.phantom.start();
    http.createServer(_.bind(this.onRequest, this)).listen(port);
    console.log('Server running on port ' + port);
};

Not only is this technically disingenuous, but it means processes can't listen for stdout's data event to know that the server has started (a useful convention often used in the node world).
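
A sketch of the usual fix: pass a callback to listen() so the log line (and any stdout listeners) only fires once the server is actually accepting connections.

// sketch: same start() with the log moved into listen's callback
PrerenderServer.prototype.start = function(port) {
    port = port || process.env.PORT || 3000;
    this.phantom.start();
    http.createServer(_.bind(this.onRequest, this)).listen(port, function() {
        console.log('Server running on port ' + port);
    });
};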

Support for facebook scraper

My website uses the HTML5 history api to handle routing. So there is no hashbang in the URL. It seems that in this case, the facebook scraper will not append ?_escaped_fragment_= even if I have <meta name="fragment" content="!">.

I noticed in the middleware there is some user agent checking for certain crawlers. Would it make sense to also include facebookexternalhit in the middleware?

For example, with nginx:

if ($http_user_agent ~* "baiduspider|twitterbot|facebookexternalhit") {
  set $prerender 1;
}

phantomjs web security

Hey guys,

In our project, we are calling an API that sends back some custom headers. The thing is, phantomjs is filtering these headers out.
The workaround is to create the phantom process with "--web-security=false".
Is it possible to add it in your next version? Or are there any implications, given that prerender should be run as a protected service?

Thanks,

Crash when using caching

Hi,
First, thanks for your great job, you're awesome! :)

When I get a version newer than Sun Mar 2 17:42:41 2014 (1af5a07), prerender crashes when using any caching systems (I tried memory and S3 but it looks like they are all built the same way). If I disable all the caching systems, it works perfectly.
Here is the error:

stream.js:94
      throw er; // Unhandled stream error in pipe.
            ^
TypeError: Cannot read property 'url' of undefined
    at Object.module.exports.afterPhantomRequest (/opt/prerender2/lib/plugins/s3HtmlCache.js:28:39)
    at next (/opt/prerender2/lib/server.js:50:20)
    at next (/opt/prerender2/lib/server.js:52:13)
    at next (/opt/prerender2/lib/server.js:52:13)
    at next (/opt/prerender2/lib/server.js:52:13)
    at Object.server._pluginEvent (/opt/prerender2/lib/server.js:57:5)
    at Object.server.onPageEvaluate (/opt/prerender2/lib/server.js:331:14)
    at /opt/prerender2/lib/server.js:282:19
    at Proto.apply (/opt/prerender2/node_modules/phantom/node_modules/dnode/node_modules/dnode-protocol/index.js:123:13)
    at Proto.handle (/opt/prerender2/node_modules/phantom/node_modules/dnode/node_modules/dnode-protocol/index.js:99:19)
worker 1 died.

I tried to fix the error by passing values from phantom instead (phantom.prerender.url, phantom.prerender.documentHTML) but it looks like it's not working even if there is no more error... The pages are just not cached.

Can you help please? :)

EDIT: I found another ticket about the same issue with memory cache and I actually fixed it the right way. I just had another issue with my AWS keys.
I'll just leave you the ticket to remind you that you need to update the s3 module :)

httpHeaders plugin requires hack with angularJs

I'm using the httpHeaders plugin with an AngularJS application, with the goal of returning HTTP 404 when a route doesn't match. My original thought was to do something like this:

<meta name="prerender-status-code" content="404" ng-if="is404">

This uses a boolean variable called is404. If is404==true, then AngularJS keeps this meta tag in the DOM. If is404==false, then AngularJS removes it from the DOM.

The issue is that the regex to check for prerender-status-code fails because of the ng-if="is404" in the tag. To get around this I've changed my code to this:

<style ng-if="is404">
  /*this is for prerenderIO*/
  /*<meta name="prerender-status-code" content="404">*/
</style>

This works but is a little hacky. I'm guessing this is specifically an angular issue though.

Leveraging prerender logic for other projects

The prerender client handles a lot of gnarly edge cases that would be useful for other projects. For example, I'm building a project right now that checks for broken links and redirects across an entire site. It's using phantomjs so that it can find javascript-generated links as well.

In order to leverage all the prerender logic, I'm creating a class that inherits from PrerenderClient and replaces javascriptToExecuteOnPage with my own logic to capture in-page links. This is a hack though. Prerender as it stands now hasn't been built to extend logic in completely different directions like this.

I don't think it's unreasonable to tell people who are building unrelated things like this to just copy-paste the prerender client logic and modify from there, so maybe nothing should be done. OTOH, prerender has a lot of followers now, so presumably the code will evolve pretty fast to handle new edge cases, and any copy-pasted code won't benefit. But then again, if modifications were made to allow prerender to be easily extensible, then we risk making the project too complex.

I'm not sure where to go with this. This is more of a scoping of whether this is possible / how it could be achieved.
