bosondata / chrome-prerender

Render JavaScript-rendered page as HTML/PDF/mhtml/png/jpeg using Headless Chrome

License: MIT License

Python 98.12% Dockerfile 1.88%
prerender server-rendering prerender-daemon

chrome-prerender's People

Contributors

laosb, messense, nyoroon

chrome-prerender's Issues

Invalid syntax error

I installed prerender via pip3 install -U prerender and ran the prerender command in the CLI. I get the following error:

root@Prerender-3:~# prerender
Traceback (most recent call last):
  File "/usr/local/bin/prerender", line 7, in <module>
    from prerender.cli import main
  File "/usr/local/lib/python3.5/dist-packages/prerender/cli.py", line 8
    DEBUG: bool = os.environ.get('DEBUG', 'false').lower() in ('true', 'yes', '1')
         ^
SyntaxError: invalid syntax
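
The caret points at a variable annotation (`DEBUG: bool = ...`). That syntax was introduced by PEP 526 in Python 3.6, while the traceback path shows a Python 3.5 installation. Below is a minimal sketch of a version guard that would report this more clearly. The function name `supports_variable_annotations` is hypothetical and not part of prerender.

```python
import sys


def supports_variable_annotations(version_info=sys.version_info):
    """Return True if the interpreter can parse `name: type = value` (PEP 526, Python 3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    if not supports_variable_annotations():
        sys.exit("This Python is older than 3.6 and cannot parse variable annotations")
    print("Python %d.%d can parse variable annotations" % sys.version_info[:2])
```

Such a guard only helps if it lives in a file that itself avoids the newer syntax, since a SyntaxError is raised before any code in the offending file runs.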

Using the layerPainted event to update _last_active_time may lead to timeouts

Currently, the LayerTree.layerPainted event is used to update the _last_active_time variable.

However, something as common as a blinking cursor in a focused text box triggers continuous repaints.

As a result, _wait_responses_ready never succeeds because the one-second idle threshold is never reached, so a page that does not set window.prerenderReady waits until the timeout.
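
One possible direction, sketched here under assumptions, is to keep the idle timer but stop paint events from resetting it. `ActivityTracker` and its methods are hypothetical names for illustration and do not reflect how prerender is actually structured. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time


class ActivityTracker:
    """Tracks the last time a page showed activity that matters for readiness."""

    # Assumption: paint events are ignored because a blinking cursor emits them forever.
    IGNORED_EVENTS = frozenset({"LayerTree.layerPainted"})

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_active_time = clock()

    def on_event(self, method):
        """Record activity unless the event is a known source of endless noise."""
        if method not in self.IGNORED_EVENTS:
            self._last_active_time = self._clock()

    def is_idle(self, threshold=1.0):
        """True once no relevant event has arrived for `threshold` seconds."""
        return self._clock() - self._last_active_time >= threshold
```

With this shape, a network event such as `Network.responseReceived` still postpones readiness, while repeated `LayerTree.layerPainted` events do not.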

Prerender stops rendering after some time under high load

Prerender stops rendering after about 1-2 minutes under high load (for example, siege -c 10).
Everything works when the number of requests per second is low.
Environment: Chrome 59 stable, prerender master.

app_1     | 2017-07-24 13:58:01,689 INFO websockets.protocol.fail_connection:618   Failing the WebSocket connection: 1006
app_1     | 2017-07-24 13:58:01,691 INFO prerender.chromerdp.close_page:53    Closing page 336a3e11-3e20-4c33-90fe-1663f413e48e: Target is closing
app_1     | 2017-07-24 13:58:01,695 INFO prerender.chromerdp.new_page:47    Created new page 41bc3e78-b7af-4178-bd9c-d81fcd3f6bf8
app_1     | 2017-07-24 13:58:01,783 ERROR prerender.prerender.render:72    Attach to Chrome page d7ab31b3-16e4-410e-97a9-6531ad1ad18c timed out, page is likely closed
app_1     | 2017-07-24 13:58:01,785 INFO websockets.protocol.fail_connection:618   Failing the WebSocket connection: 1006
app_1     | 2017-07-24 13:58:01,790 INFO prerender.chromerdp.close_page:53    Closing page d7ab31b3-16e4-410e-97a9-6531ad1ad18c: Target is closing
app_1     | 2017-07-24 13:58:01,795 INFO prerender.chromerdp.new_page:47    Created new page 9b9754b4-a420-4d0c-b911-9ad257199e25
app_1     | 2017-07-24 13:58:01,797 INFO prerender.prerender._manage_page:114   Page 41bc3e78-b7af-4178-bd9c-d81fcd3f6bf8 added to idle pages queue
app_1     | 2017-07-24 13:58:01,797 WARNING prerender.app.handle_request:201   Got 504 for http://*** in 26196ms
app_1     | 2017-07-24 13:58:01,898 INFO prerender.prerender._manage_page:114   Page 9b9754b4-a420-4d0c-b911-9ad257199e25 added to idle pages queue
app_1     | 2017-07-24 13:58:01,898 WARNING prerender.app.handle_request:201   Got 504 for http://*** in 26188ms
app_1     | 2017-07-24 13:58:01,934 ERROR prerender.prerender.render:72    Attach to Chrome page 9c27c0be-8209-4994-82cf-4bbe3046eff6 timed out, page is likely closed
app_1     | 2017-07-24 13:58:01,935 INFO websockets.protocol.fail_connection:618   Failing the WebSocket connection: 1006
app_1     | 2017-07-24 13:58:01,938 INFO prerender.chromerdp.close_page:53    Closing page 9c27c0be-8209-4994-82cf-4bbe3046eff6: Target is closing
app_1     | 2017-07-24 13:58:01,944 INFO prerender.chromerdp.new_page:47    Created new page c7e3f025-07ff-43f6-8429-822be2e54b6f
app_1     | 2017-07-24 13:58:01,988 ERROR prerender.prerender.render:72    Attach to Chrome page d24bed6a-3dd1-4aea-9898-d313d37976c7 timed out, page is likely closed
app_1     | 2017-07-24 13:58:01,990 INFO websockets.protocol.fail_connection:618   Failing the WebSocket connection: 1006
app_1     | 2017-07-24 13:58:01,993 INFO prerender.chromerdp.close_page:53    Closing page d24bed6a-3dd1-4aea-9898-d313d37976c7: Target is closing
app_1     | 2017-07-24 13:58:02,010 INFO prerender.chromerdp.new_page:47    Created new page a9c6464e-3e42-4607-a49f-efc7e2e377a5
app_1     | 2017-07-24 13:58:02,046 INFO prerender.prerender._manage_page:114   Page c7e3f025-07ff-43f6-8429-822be2e54b6f added to idle pages queue

Is it possible to use chrome-prerender as a squid parent in a proxy sandwich setup?

I would like to use chrome-prerender in a proxy sandwich configuration (to cache as much as possible), but Squid, acting as the client, sends its GET requests in a different form. Does anyone have ideas about what to configure, and where?

Curling works fine:
[2017-12-28 17:33:27 +0100] - (sanic.access)[INFO][1:2]: GET http://127.0.0.1:3000/http://www.nytimes.com/ 200 446977
2017-12-28 17:33:27,944 INFO sanic.access.log_response:325

Squid fails:
[2017-12-28 17:34:06 +0100] - (sanic.access)[INFO][1:2]: GET http://www.nytimes.com/ 400 11
2017-12-28 17:34:06,510 INFO sanic.access.log_response:325
[2017-12-28 17:34:11 +0100] [23436] [INFO] KeepAlive Timeout. Closing connection.
2017-12-28 17:34:11,510 INFO root.keep_alive_timeout_callback:193 KeepAlive Timeout. Closing connection.
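
The two access logs differ in the request target. The working curl request carries the page URL in the path (`/http://www.nytimes.com/`), while Squid sends the absolute form that HTTP clients use when talking to a proxy (`http://www.nytimes.com/`). The sketch below shows the rewrite that would map one form to the other. `to_prerender_path` is a hypothetical helper, and where such a rewrite would live (in Squid or in front of prerender) is left open.

```python
from urllib.parse import urlsplit


def to_prerender_path(request_target):
    """Map an HTTP request target to the `/<full URL>` form seen in the working curl request."""
    if request_target.startswith("/"):
        # Origin-form target, e.g. "/http://www.nytimes.com/": already in the expected shape.
        return request_target
    parts = urlsplit(request_target)
    if parts.scheme in ("http", "https") and parts.netloc:
        # Absolute-form target, as sent by a client that treats the server as a proxy.
        return "/" + request_target
    raise ValueError("unsupported request target: %r" % request_target)
```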

Subdirs in S3 storage are always the same

The subdirectories in S3 storage are always the same: every URL starts with "http", so the first four hex characters are always 6874.

        hex_name = codecs.encode(url.encode('utf-8'), 'hex').decode('utf-8')
        sub_dir = os.path.join(hex_name[:2], hex_name[2:4])

Maybe use a hash of the URL, or use the URI instead of the URL?
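
A small sketch comparing the scheme from the snippet above with the hash-based idea. `hex_sub_dir` reproduces the two original lines. `hashed_sub_dir` is a hypothetical alternative, and the choice of SHA-256 is an assumption; any stable digest would spread the keys similarly.

```python
import codecs
import hashlib
import os


def hex_sub_dir(url):
    """The scheme from the snippet above: directories come from the URL's first two bytes."""
    hex_name = codecs.encode(url.encode('utf-8'), 'hex').decode('utf-8')
    return os.path.join(hex_name[:2], hex_name[2:4])


def hashed_sub_dir(url):
    """Hypothetical alternative: directories come from a SHA-256 digest of the whole URL."""
    digest = hashlib.sha256(url.encode('utf-8')).hexdigest()
    return os.path.join(digest[:2], digest[2:4])


if __name__ == "__main__":
    for url in ("http://example.com/a", "https://example.org/b"):
        print(url, hex_sub_dir(url), hashed_sub_dir(url))
```

Because "h" is 0x68 and "t" is 0x74, `hex_sub_dir` puts every http or https URL under 68/74, whereas the digest depends on the whole URL.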

Chrome healthcheck

Hello!

I want to run a Chrome health check from prerender, but how can we do that?
Requesting await render.render('chrome://version') fails with TooManyResponseError, and about:blank just times out.
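
One option that avoids rendering a page at all is to query the HTTP endpoint Chrome exposes when remote debugging is enabled: `/json/version` returns a small JSON document that includes a `Browser` field. The sketch below is not part of prerender. The host, the port 9222, and the function name `chrome_is_healthy` are assumptions, and the `opener` parameter exists only so the check can be exercised without a running browser.

```python
import json
import urllib.request


def chrome_is_healthy(host="localhost", port=9222, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if Chrome's DevTools HTTP endpoint answers with version info."""
    url = "http://%s:%d/json/version" % (host, port)
    try:
        with opener(url, timeout=timeout) as response:
            info = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return False
    return isinstance(info, dict) and "Browser" in info
```

This only shows that the browser process answers on its debugging port. It does not prove that individual pages can still render.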

Installation fails on CentOS 7 using pip

Do I need to install python3 and/or something else?

[root@prerender ~]# pip install prerender
Collecting prerender
Using cached prerender-0.9.3.tar.gz
Collecting websockets (from prerender)
Using cached websockets-4.0.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-_SAdD7/websockets/setup.py", line 11, in <module>
with open(readme_file, encoding='utf-8') as f:
TypeError: 'encoding' is an invalid keyword argument for this function

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-_SAdD7/websockets/

asyncio.base_futures.InvalidStateError: invalid state

2017-05-09 10:05:27,361 ERROR asyncio.serve:450   Task exception was never retrieved
future: <Task finished coro=<Page._handle_response() done, defined at /home/prerender/venv/lib/python3.6/site-packages/prerender/chromerdp.py:152> exception=InvalidStateError('invalid state',) created at /home/prerender/venv/lib/python3.6/site-packages/prerender/chromerdp
source_traceback: Object created at (most recent call last):
  File "/home/prerender/venv/bin/prerender", line 11, in <module>
    load_entry_point('prerender==0.6.1', 'console_scripts', 'prerender')()
  File "/home/prerender/venv/lib/python3.6/site-packages/prerender/cli.py", line 15, in main
    app.run(host=HOST, port=PORT, debug=DEBUG)
  File "/home/prerender/venv/lib/python3.6/site-packages/sanic/app.py", line 559, in run
    serve(**server_settings)
  File "/home/prerender/venv/lib/python3.6/site-packages/sanic/server.py", line 450, in serve
    loop.run_forever()
  File "/usr/lib/python3.6/asyncio/coroutines.py", line 125, in send
    return self.gen.send(value)
  File "/home/prerender/venv/lib/python3.6/site-packages/prerender/chromerdp.py", line 248, in _wait
    asyncio.ensure_future(self._handle_response(format, obj, mhtml, future))
Traceback (most recent call last):
  File "/usr/lib/python3.6/asyncio/coroutines.py", line 125, in send
    return self.gen.send(value)
  File "/home/prerender/venv/lib/python3.6/site-packages/prerender/chromerdp.py", line 235, in _handle_response
    future.set_result(html)
asyncio.base_futures.InvalidStateError: invalid state
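
asyncio raises InvalidStateError when `set_result` is called on a future that is already done, for example one that already holds a result or was cancelled. Whether that is what happened in `_handle_response` is not confirmed by the log. The sketch below only illustrates the usual guard, and `set_result_once` is a hypothetical helper name.

```python
import asyncio


def set_result_once(future, value):
    """Set the result only if the future is still pending; report whether it was set."""
    if future.done():
        # Already resolved, cancelled, or failed: a second set_result would raise.
        return False
    future.set_result(value)
    return True


async def demo():
    future = asyncio.get_running_loop().create_future()
    first = set_result_once(future, "<html>first</html>")
    second = set_result_once(future, "<html>second</html>")
    return first, second, await future


if __name__ == "__main__":
    print(asyncio.run(demo()))
```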

Get dockerized

Good job!

I hope you can publish this as a Docker image together with PR #3, so we can simply use it with Docker.

I would expect Headless Chrome to run in a separate container.

https problem on FreeBSD

I am running into an issue when trying to render https URLs; http works fine. Chrome in headless mode can handle https and prints the DOM, for example when I run: chrome --headless --disable-gpu --dump-dom https://www.apple.com

My prerender curl request is as follows:

curl -v http://myserver.com:8000/https://www.apple.com

Below is output from prerender:

[root@node295 ~]# prerender
2017-08-16 18:37:40 - (sanic)[DEBUG]:
[Sanic ASCII-art startup banner: "Gotta go fast!"]

2017-08-16 18:37:40 - (sanic)[INFO]: Goin' Fast @ http://0.0.0.0:8000
2017-08-16 18:37:40,712 INFO prerender.chromerdp.new_page:47 Created new page 990b1656-05b1-4c93-9909-3299d31579b3
2017-08-16 18:37:40,723 INFO prerender.chromerdp.new_page:47 Created new page 083237b6-e7bf-4e44-97ea-a91312e636f1
2017-08-16 18:37:40,728 INFO prerender.chromerdp.new_page:47 Created new page b81c0ec4-e4a8-489c-a1be-ecd869aa861a
2017-08-16 18:37:40,735 INFO prerender.chromerdp.new_page:47 Created new page cf6d747f-d71f-4519-a74b-6a4f43e29353
2017-08-16 18:37:40,739 INFO prerender.chromerdp.new_page:47 Created new page 5272d7fd-44ed-45ff-b2c3-448fb214e29e
2017-08-16 18:37:40,747 INFO prerender.chromerdp.new_page:47 Created new page 5aac2038-452a-4b56-8e0b-a9a6fc03a8ed
2017-08-16 18:37:40,849 INFO prerender.chromerdp.new_page:47 Created new page 5d9d66ef-c68b-45b8-8e2a-711fa070a972
2017-08-16 18:37:40,863 INFO prerender.chromerdp.new_page:47 Created new page 5402ba0f-3ebd-4579-bc18-f20743deaf4e
2017-08-16 18:37:40,895 INFO prerender.chromerdp.new_page:47 Created new page ec42cd61-49ca-4319-82e4-42cffa53cd8f
2017-08-16 18:37:41,009 INFO prerender.chromerdp.new_page:47 Created new page 019aa698-ae0d-4e08-987c-63547ecd134d
2017-08-16 18:37:41,015 INFO prerender.chromerdp.new_page:47 Created new page 64d86146-6092-4b72-bbd4-c8af5e0acc66
2017-08-16 18:37:41,039 INFO prerender.chromerdp.new_page:47 Created new page fd4bb779-ee77-4ad3-8468-94348c2fda3c
2017-08-16 18:37:41,141 INFO prerender.chromerdp.new_page:47 Created new page 92221c66-8fdc-4494-a0d8-4e432fe805ee
2017-08-16 18:37:41,145 INFO prerender.chromerdp.new_page:47 Created new page 54c984c2-f04e-4b84-a267-7fb4babf9430
2017-08-16 18:37:41,198 INFO prerender.chromerdp.new_page:47 Created new page a6f6d228-4d64-4caf-906d-5ed701d019f4
2017-08-16 18:37:41,203 INFO prerender.chromerdp.new_page:47 Created new page 77111530-da40-48fc-9259-da08aa97972f
2017-08-16 18:37:41 - (sanic)[INFO]: Starting worker [95309]
2017-08-16 18:37:41,204 INFO sanic.serve:526 Starting worker [95309]
2017-08-16 18:42:07,854 INFO prerender.chromerdp.navigate:218 Page 990b1656-05b1-4c93-9909-3299d31579b3 [1] navigating to https://www.apple.com
2017-08-16 18:42:37,858 INFO websockets.protocol.fail_connection:618 Failing the WebSocket connection: 1006
2017-08-16 18:42:37,859 WARNING prerender.app._render:99 Temporary browser failure: , retry rendering https://www.apple.com in 1s
2017-08-16 18:42:37,871 ERROR asyncio.serve:527 Task exception was never retrieved
future: <Task finished coro=<Page._evaluate_prerender_ready() done, defined at /usr/local/lib/python3.6/site-packages/prerender/chromerdp.py:232> exception=AttributeError("'NoneType' object has no attribute 'send'",)>
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/prerender/chromerdp.py", line 234, in _evaluate_prerender_ready
res = await self.evaluate('window.prerenderReady == true')
File "/usr/local/lib/python3.6/site-packages/prerender/chromerdp.py", line 228, in evaluate
'params': {'expression': expr}
File "/usr/local/lib/python3.6/site-packages/prerender/chromerdp.py", line 178, in send
await self.websocket.send(json.dumps(payload))
AttributeError: 'NoneType' object has no attribute 'send'
2017-08-16 18:42:38,865 INFO prerender.chromerdp.navigate:218 Page 083237b6-e7bf-4e44-97ea-a91312e636f1 [1] navigating to https://www.apple.com
2017-08-16 18:43:07 - (sanic)[ERROR]: Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/sanic/server.py", line 143, in connection_timeout
raise RequestTimeout('Request Timeout')
sanic.exceptions.RequestTimeout: Request Timeout

2017-08-16 18:43:07,848 ERROR sanic.log:104 Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/sanic/server.py", line 143, in connection_timeout
raise RequestTimeout('Request Timeout')
sanic.exceptions.RequestTimeout: Request Timeout

2017-08-16 18:43:07,853 INFO websockets.protocol.fail_connection:618 Failing the WebSocket connection: 1006
2017-08-16 18:43:07,853 WARNING prerender.app.handle_request:201 Got 504 for https://www.apple.com in 60001ms
2017-08-16 18:43:07 - (sanic)[ERROR]: Connection lost before response written @ ('96.50.156.110', 18006)
2017-08-16 18:43:07,854 ERROR sanic.write_response:267 Connection lost before response written @ ('96.50.156.110', 18006)
2017-08-16 18:43:07,890 ERROR asyncio.serve:527 Task exception was never retrieved
future: <Task finished coro=<Page._evaluate_prerender_ready() done, defined at /usr/local/lib/python3.6/site-packages/prerender/chromerdp.py:232> exception=AttributeError("'NoneType' object has no attribute 'send'",)>
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/prerender/chromerdp.py", line 234, in _evaluate_prerender_ready
res = await self.evaluate('window.prerenderReady == true')
File "/usr/local/lib/python3.6/site-packages/prerender/chromerdp.py", line 228, in evaluate
'params': {'expression': expr}
File "/usr/local/lib/python3.6/site-packages/prerender/chromerdp.py", line 178, in send
await self.websocket.send(json.dumps(payload))
AttributeError: 'NoneType' object has no attribute 'send'

Close page on PRERENDER_TIMEOUT

Hello!
Is it possible to close a page on timeout if window.prerenderReady was not changed from false to true within the timeout period?
Right now such pages seem to linger like zombies.

app_1     | 2017-07-10 08:41:31,064 WARNING prerender.app._render:96    Temporary browser failure: No Chrome page available in 10s, retry rendering http://*** in 1s
app_1     | 2017-07-10 08:41:35,383 WARNING prerender.app._render:96    Temporary browser failure: No Chrome page available in 10s, retry rendering http://*** in 1s
app_1     | 2017-07-10 08:41:36,002 WARNING prerender.app._render:96    Temporary browser failure: No Chrome page available in 10s, retry rendering http://***in 1s
app_1     | 2017-07-10 08:41:36,304 WARNING prerender.app._render:96    Temporary browser failure: No Chrome page available in 10s, retry rendering http://*** in 1s
app_1     | 2017-07-10 08:41:36,954 WARNING prerender.app._render:96    Temporary browser failure: No Chrome page available in 10s, retry rendering http://*** in 1s

(Also, accessing /browsers/list returns 504 when there are a lot of zombie pages.)
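
The requested behaviour can be sketched with `asyncio.wait_for`: bound the render by a timeout and close the page before propagating the error. This is an illustration under assumptions and not prerender's code. `render_with_timeout`, `render`, and `close_page` are hypothetical names, with the latter two standing in for coroutine functions supplied by the caller.

```python
import asyncio


async def render_with_timeout(render, close_page, timeout):
    """Await `render()`; on timeout, await `close_page()` before re-raising."""
    try:
        return await asyncio.wait_for(render(), timeout)
    except asyncio.TimeoutError:
        # The page never became ready in time, so release it instead of leaving it open.
        await close_page()
        raise
```

`asyncio.wait_for` cancels the inner coroutine when the timeout expires, so the render task itself does not keep running after the page is closed.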

httptools.parser.errors.HttpParserInvalidURLError: invalid URL

2017-05-17 23:38:33 - (sanic)[ERROR]: Traceback (most recent call last):
  File "/home/prerender/venv/lib/python3.6/site-packages/sanic/server.py", line 151, in data_received
    self.parser.feed_data(data)
  File "httptools/parser/parser.pyx", line 171, in httptools.parser.parser.HttpParser.feed_data (httptools/parser/parser.c:2721)
httptools.parser.errors.HttpParserInvalidURLError: invalid URL
2017-05-17 23:38:33,870 ERROR sanic.log:104   Traceback (most recent call last):
  File "/home/prerender/venv/lib/python3.6/site-packages/sanic/server.py", line 151, in data_received
    self.parser.feed_data(data)
  File "httptools/parser/parser.pyx", line 171, in httptools.parser.parser.HttpParser.feed_data (httptools/parser/parser.c:2721)
httptools.parser.errors.HttpParserInvalidURLError: invalid URL

Can't view generated mhtml in chrome

curl http://localhost:3000/mhtml/https://www.wikipedia.org/ > fromprerender.mhtml
google-chrome ./fromprerender.mhtml

This results in a blank page. An mhtml file saved from Chrome by normal means opens without any issues.

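Will restricting the screenshot area be supported in the future?

Will restricting the screenshot area be supported in the future, similar to setting the width and height in Phantom.js? I looked through the headless Chrome documentation and did not find a good Python solution; there seems to be a Node.js one.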
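
In current versions of the Chrome DevTools Protocol, `Page.captureScreenshot` accepts a `clip` rectangle. The Chrome version in use at the time of this issue may not have supported it. The sketch below only builds the JSON message and does not show how prerender would send it. `clip_screenshot_command` is a hypothetical helper.

```python
import json


def clip_screenshot_command(message_id, x, y, width, height, image_format="png"):
    """Build a Page.captureScreenshot message restricted to a rectangle (CSS pixels)."""
    return json.dumps({
        "id": message_id,
        "method": "Page.captureScreenshot",
        "params": {
            "format": image_format,
            "clip": {"x": x, "y": y, "width": width, "height": height, "scale": 1},
        },
    })


if __name__ == "__main__":
    print(clip_screenshot_command(1, 0, 0, 800, 600))
```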

User agent compatibility with nodejs prerender

The current user agent is the default Chrome user agent.
Some web server configurations require "Prerender" in the user agent string to work correctly. Is it possible to change the user agent from chrome-prerender?
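
The Chrome DevTools Protocol offers `Network.setUserAgentOverride` for this purpose. A sketch of the message that would append a marker to the default user agent follows. `user_agent_override_command` is a hypothetical helper, and whether chrome-prerender exposes a setting for this is not stated in the issue.

```python
import json


def user_agent_override_command(message_id, base_user_agent, marker="Prerender"):
    """Build a Network.setUserAgentOverride message whose user agent contains `marker`."""
    if marker in base_user_agent:
        user_agent = base_user_agent
    else:
        user_agent = "%s %s" % (base_user_agent, marker)
    return json.dumps({
        "id": message_id,
        "method": "Network.setUserAgentOverride",
        "params": {"userAgent": user_agent},
    })
```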

beforeSend plugin

Is there a way to use something like the beforeSend(req, res, next) plugin from the NodeJS prerender, or is there any plan to add this feature?

I need to modify both request and response because:

  • I need to disable cache
  • I need to add some meta tags
  • I need to modify headers
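
To make the request concrete, here is a sketch of what such a hook could look like in Python. Nothing here is an existing chrome-prerender API: `before_send`, its parameters, and the choice of `Cache-Control: no-store` are all assumptions for illustration.

```python
def before_send(headers, html, meta_tags=()):
    """Return (headers, html) with caching disabled and meta tags added to <head>."""
    new_headers = dict(headers)
    # Assumption: "disable cache" is expressed through a Cache-Control response header.
    new_headers["Cache-Control"] = "no-store"
    tags = "".join(meta_tags)
    if tags and "</head>" in html:
        # Insert the tags just before the first closing head tag.
        html = html.replace("</head>", tags + "</head>", 1)
    return new_headers, html
```

Returning new values instead of mutating the inputs keeps the hook easy to chain with other hooks.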
