bitcrowd / chromic_pdf

Convenient HTML to PDF/A rendering library for Elixir based on Chrome & Ghostscript

License: Apache License 2.0

Elixir 95.95% PostScript 0.34% HTML 3.71%

chrome-devtools chrome-headless invoice pdf pdf-converter pdf-generation

chromic_pdf's Issues

Trigger Pdf Generation and Download the Pdf File on Browser via Button

This is probably a dumb question, but I'll ask anyway.

My goal:
My users have multiple records, and each displayed record has a download button. When a user clicks the button, it should generate a PDF containing that specific record and trigger a download in the browser.

Is there a way to do that with this library? When I pipe this in my controller, it only saves a file in the project folder. Thank you so much!

UPDATE:

I was able to trigger a browser download by passing the Base64-encoded PDF to an a tag:

/template.html.eex
   <% html = (render Web.SharedView, "_pdf_template.html",  assigns) %>
   <a href="data:application/pdf;base64,<%= render_pdf(html) %>" download>DOWNLOAD</a>


/render_view.ex
  def render_pdf(html) do
    ChromicPDF.print_to_pdf({:html, html})
    |> elem(1)
  end

PROBLEM:
Images inside the pdf file are all corrupted.
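
For the goal described above, one option is to return the PDF bytes from a controller action instead of embedding them in a data URI. The following is a minimal sketch, assuming a Phoenix application with ChromicPDF already running in the supervision tree. The controller name, the route it would be mounted on, and render_record_html/1 are hypothetical placeholders. It does not by itself address the corrupted images.

```elixir
defmodule MyAppWeb.RecordPdfController do
  # Hypothetical controller; module and helper names are placeholders.
  use MyAppWeb, :controller

  def download(conn, %{"id" => id}) do
    html = render_record_html(id)

    # Without an :output option, print_to_pdf/1 returns the PDF as a
    # Base64 string, so it is decoded before being sent to the browser.
    with {:ok, base64} <- ChromicPDF.print_to_pdf({:html, html}),
         {:ok, pdf} <- Base.decode64(base64) do
      send_download(conn, {:binary, pdf},
        filename: "record-#{id}.pdf",
        content_type: "application/pdf"
      )
    else
      _error -> send_resp(conn, 500, "PDF generation failed")
    end
  end

  # Placeholder: render the record's template to an HTML string here.
  defp render_record_html(id), do: "<h1>Record #{id}</h1>"
end
```

The download button then only needs to link to this action's route, and the PDF is generated when the user clicks it rather than when the list page renders.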

There should be a possibility to send cookies to Chrome.

I've already started this discussion in this Elixir Forum post.

What do you exactly mean by "users should be well aware of the security implications"?

In my opinion, having to wait a couple of seconds while the PDF is generated is totally fine (it is what happens when you try to export your document to a PDF format on Google Docs/Google Sheets for example).

I also prefer to print PDFs from a URL instead of a file or string because you do not need to write inline CSS (as far as I know, the styles are not loaded when you print to PDF from an HTML file or when you render a view as a string in Phoenix).

Setting cookies is easily doable with the Ferrum library written in Ruby, as well as with Puppeteer. It would be great if we could also set cookies in Chrome with Elixir since, to the best of my knowledge, this feature is implemented neither in pdf-generator nor in puppeteer-pdf.

Fillable Forms

I've been searching around and don't really see support for fillable forms in the generated PDF, is this something that's possible to implement?

I've tried creating the forms manually but it's a huge p.i.t.b. trying to get everything aligned properly with the generated output, or even from scratch.

Would be a life-saver if we could peep form inputs and convert them to fillable fields.

Handle DOWN in terminate_worker/3

Hi,

We just started running into this issue in production:

2023-01-19T20:31:56.032465131Z 20:31:55.994 [error] Error during ChromicPDF.Browser.SessionPool.terminate_worker/3 callback:
2023-01-19T20:31:56.032514431Z ** (FunctionClauseError) no function clause matching in ChromicPDF.Browser.SessionPool.terminate_worker/3
2023-01-19T20:31:56.032520231Z (chromic_pdf 1.6.0) lib/chromic_pdf/pdf/browser/session_pool.ex:191: ChromicPDF.Browser.SessionPool.terminate_worker(:DOWN, %{session: %{session_id: "B4192E860C6DB1C83D3A4404A257EFD0", target_id: "B3EDCFD2C0E3D3349617B0A1F59AEC0F"}, uses: 14}, %{browser: #PID<0.2847.0>, init_timeout: 5000, max_session_uses: 1000, spawn_protocol: %ChromicPDF.Protocol{steps: [call: &ChromicPDF.SpawnSession.create_browser_context/2, await: &ChromicPDF.SpawnSession.browser_context_created/2, call: &ChromicPDF.SpawnSession.create_target/2, await: &ChromicPDF.SpawnSession.target_created/2, call: &ChromicPDF.SpawnSession.attach/2, await: &ChromicPDF.SpawnSession.attached/2, call: &ChromicPDF.SpawnSession.set_user_agent/2, call: &ChromicPDF.SpawnSession.offline_mode/2, call: &ChromicPDF.SpawnSession.enable_page/2, await: &ChromicPDF.SpawnSession.page_enabled/2, call: &ChromicPDF.ResetTarget.reset_history/2, await: &ChromicPDF.ResetTarget.history_reset/2, call: &ChromicPDF.ResetTarget.blank/2, await: &ChromicPDF.ResetTarget.blanked/2, await: &ChromicPDF.ResetTarget.fsl_after_blank/2, output: &ChromicPDF.SpawnSession.output/1], state: %{protocol: ChromicPDF.SpawnSession, chrome_args: "--disable-dev-shm-usage", discard_stderr: false, ignore_certificate_errors: false, no_sandbox: true, offline: true, on_demand: false}}, timeout: 5000})
2023-01-19T20:31:56.032529731Z (nimble_pool 0.2.6) lib/nimble_pool.ex:932: NimblePool.do_apply_worker_callback/4
2023-01-19T20:31:56.032532831Z (nimble_pool 0.2.6) lib/nimble_pool.ex:867: NimblePool.maybe_terminate_worker/3
2023-01-19T20:31:56.032536031Z (nimble_pool 0.2.6) lib/nimble_pool.ex:769: NimblePool.remove_worker/3
2023-01-19T20:31:56.032539231Z (nimble_pool 0.2.6) lib/nimble_pool.ex:651: NimblePool.cancel_request_ref/3
2023-01-19T20:31:56.032542331Z (stdlib 4.0.1) gen_server.erl:1120: :gen_server.try_dispatch/4
2023-01-19T20:31:56.032545331Z (stdlib 4.0.1) gen_server.erl:1197: :gen_server.handle_msg/6
2023-01-19T20:31:56.032549331Z (stdlib 4.0.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3

I'm not sure if it's related but we recently upgraded to ChromicPDF 1.6.0. I'm not sure how to diagnose from here. If it's any help, this is when trying to generate about 15 PDFs at the same time. This error is not happening locally for the same set of data, only on our production server.

Fonts in html not visible in PDF

Hello,
thanks for the work.
I have HTML code that I'd like to convert to PDF. It works well, but the fonts do not seem to be rendered. Instead I just see the default one (Times New Roman, I guess). Also, Font Awesome icons are not visible.

Code for the job (inside my controller):

...
Phoenix.View.render_to_string(MyAppWeb.MenuView, "menu-page.print.html", merged_assigns)
|> prepare_menu_pdf()
|> ChromicPDF.print_to_pdf(
  output: fn path ->
    conn
    |> send_download({:file, path}, filename: merged_assigns.filename <> ".pdf", disposition: :inline)
  end
)
...

and prepare_menu_pdf():

defp prepare_menu_pdf(string_content) when is_binary(string_content) do
    {:ok, styles} =
      Path.join(:code.priv_dir(:my_app_web), "static/css/menu-print-styles.css")
      |> File.read()

    {:ok, font_styles} =
      Path.join(:code.priv_dir(:my_app_web), "static/fonts/fira/fira.css")
      |> File.read()

    [
      content: [
        "<style>" <> styles <> "</style>",
        "<style>" <> font_styles <> "</style>",
        string_content
      ]
    ]
    |> ChromicPDF.Template.source_and_options()
  end

I have tried downloading the woff files mentioned in fira/fira.css (where those @font-family-ies are specified) so they are available locally, but no luck.
Font Awesome is inserted via a remote URL.

Am I doing anything wrong?

Thanks in advance.

External URLs in header/footer templates crash

Get rid of integration env

Possibly incompatible with Ghostscript 9.56.1

See https://elixirforum.com/t/chromicpdf-pdf-generator/29473/41

** (RuntimeError)   /usr/local/bin/gs exited with status 1!

GPL Ghostscript 9.56.1: Unrecoverable error, exit code 1


    (chromic_pdf 1.2.0) lib/chromic_pdf/utils.ex:53: ChromicPDF.Utils.system_cmd!/3

Haven't confirmed it yet.

Refactor API and deprecate print_to_pdfa/2

Now with the "multiple sources" feature being in place, it becomes apparent that the print_to_pdf/2 / print_to_pdfa/2 separation wasn't a good call. Refactor as follows:

Make PDF/A conversion an optional step of print_to_pdf/2, dependent on presence of new pdfa: true flag
Deprecate print_to_pdfa/2 (route it to print_to_pdf/2 with pdfa: true)

Caching is broken for Alpine CI runs

We get warnings in Github actions for both alpine-based jobs
Github's actions/cache@v3 apparently uses the host's tar command to tar up the cache
Should be easily fixable by installing GNU tar in the Alpine images

Get number of schedulers at runtime

Getting the schedulers at compile time doesn't make sense. Also probably we want to set it to a minimum of 1.

See #100 (comment)
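
The difference between the two evaluation times can be sketched as follows. This is an illustration only: the module name is hypothetical, and halving the scheduler count is an assumed sizing rule, not necessarily what the library does. A module attribute is evaluated once on the build machine, while a default argument is evaluated on each call on the machine that runs the code.

```elixir
defmodule PoolSizeSketch do
  # Evaluated once, at compile time, on the machine that builds the release.
  @compile_time_schedulers System.schedulers_online()

  def compile_time_schedulers, do: @compile_time_schedulers

  # The default argument is evaluated at call time, on the running machine.
  # Halving is an assumed sizing rule; max/2 enforces the minimum of 1.
  def pool_size(schedulers \\ System.schedulers_online()) do
    max(div(schedulers, 2), 1)
  end
end

IO.inspect(PoolSizeSketch.pool_size(), label: "pool size at runtime")
```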

timeouts when running in github actions CI on ubuntu-latest

Hi. Thanks for this awesome library. In general, it works really well. We have no issues in development.

I tried to get ChromicPDF working in github actions CI and am getting this error/warning:

Error: t it renders a pdf blob [L#12]08:58:31.509 [error] Task #PID<0.627.0> started from #PID<0.626.0> terminating
Warning: ** (RuntimeError) Timeout in Channel.run_protocol/3!
The underlying GenServer.call/3 exited with a timeout. This happens when the browser was
not able to complete the current operation (= PDF print job) within the configured
5000 milliseconds.

If you are printing large PDFs and expect long processing times, please consult the
documentation for the `timeout` option of the session pool.

If you are *not* printing large PDFs but your print jobs still time out, this is likely a
bug in ChromicPDF. Please open an issue on the issue tracker.

    (chromic_pdf 1.2.2) lib/chromic_pdf/pdf/browser/channel.ex:24: ChromicPDF.Browser.Channel.run_protocol/3
Warning:     (chromic_pdf 1.2.2) lib/chromic_pdf/pdf/browser/session_pool.ex:120: [481](https://github.com/westarete/novo/runs/8127312057?check_suite_focus=true#step:9:482)
    (elixir 1.13.4) lib/task/supervised.ex:89: Task.Supervised.invoke_mfa/2
Warning:     (elixir 1.13.4) lib/task/supervised.ex:34: Task.Supervised.reply/4
    (stdlib 3.17.2) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Warning: Function: #Function<0.14377584/0 in ChromicPDF.Browser.SessionPool.init_worker/1>
    Args: []

I'm using these options to configure Chromic:

  on_demand: false,
  offline: true,
  discard_stderr: false,
  no_sandbox: true,
  session_pool: [timeout: 30_000]

In development, rendering takes around half a second and that's with on_demand set to true. This is for a one page PDF that's mostly just text.

I'm using an ubuntu-latest github actions runner. The runner image includes google chrome, chromium and chrome driver so I did not do anything to download another version of chromium, etc.. When checking which version Chromic was using, it looks like it selected "/usr/bin/chromium-browser" and I don't get any errors on boot about not being able to find chromium.

Of note, when I SSH into the runner and try running the test suite manually, it passes without any issues. I also tried rendering a PDF manually in IEx on the runner image and I got a blob back almost instantly. But for some reason, as part of CI, I'm getting flaky tests that take a long time to run and sometimes time out. Sometimes the tests are marked as "passed" even when I see these errors. Other times, I see these errors and the test suite is marked as failed. I can't discern any difference in output between those two cases, though.

Here's the test I'm running:

  test "it renders a pdf blob" do
    student = StudentsFixtures.student_fixture()
    student = LegacyData.get_student_for_transcript!(student.id)
    transcript = Transcript.get_transcript(student.id)

    assert {:ok, _} = TranscriptPDFRenderer.render(student, transcript)
  end

Here's the render function:

  def render(student, transcript) do
    opts =
      options(
        content: content(student, transcript),
        header: header(student),
        footer: footer()
      )

    with {:ok, data} <- ChromicPDF.print_to_pdf(opts),
         {:ok, binary} <- Base.decode64(data) do
      {:ok, binary}
    else
      error ->
        {:error, error}
    end
  end

  defp options(opts) do
    [
      size: :us_letter,
      header_height: "75mm",
      footer_height: "20mm"
    ]
    |> Keyword.merge(opts)
    |> ChromicPDF.Template.source_and_options()
  end

# ... snip

I've fiddled with this for many hours and can't seem to get a configuration that is reliable on github actions CI. Any thoughts?

Telemetry support

High CPU usage

Hey,
I recently switched to this library for pdf generation from wkhtmltopdf, and the cpu usage went up from 5-10% generally to 60-80+%.

Did I miss something?

Any tips?

When using chromic_pdf in production, chromium is often crashing

chromic_pdf is working in production, but a few times a day we get errors like the one below. We are using Chromium. We are not using sandbox mode, and none of the page assets are loaded via the network; they are all embedded via Base64. If you have any insights or debugging tips, let me know.

GenServer ChromicPDF.Browser terminating
** (FunctionClauseError) no function clause matching in ChromicPDF.Browser.handle_info/2
(chromic_pdf 0.5.2) lib/chromic_pdf/pdf/browser.ex:74: ChromicPDF.Browser.handle_info({:EXIT, #PID<0.7925.0>, :chrome_has_crashed}, %{dispatch: #Function<1.51776475/1 in ChromicPDF.Browser.init/1>, protocols: []})
(stdlib 3.13) gen_server.erl:680: :gen_server.try_dispatch/4
(stdlib 3.13) gen_server.erl:756: :gen_server.handle_msg/6
(stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message: {:EXIT, #PID<0.7925.0>, :chrome_has_crashed}

Wait for dynamic content to be ready before printing

We noticed that in some cases it is useful to delay the printing until dynamic content on the page is ready. This is admittedly rare but sometimes handy. The way we've approached it is to wait until a specific element has a defined attribute set.

Does the approach sound sane and would it make sense to include it in Chromic PDF? See the PR #87 for the approach.

Default to "online" mode

Parallel printing for multiple sources

Just adding this as an issue, to not get lost :)

#194 (comment)

Implementation ideas:

Simple but meh: Spawn a task for each and await them all
Reasonable but expensive: Rework the session pool to yield a session and add a way to enqueue multiple protocols from a single process (so replace GenServer.call in Channel)
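
The "simple but meh" variant could look roughly like the sketch below. The module name is hypothetical, and the print function is injected so the example runs without a browser. In an application it would presumably be something like &ChromicPDF.print_to_pdf/1, which is an assumption here. Actual parallelism would still be limited by the number of sessions in the pool.

```elixir
defmodule ParallelPrintSketch do
  # Spawns one task per source and awaits all of them.
  # Results come back in the same order as the sources.
  def print_all(sources, print_fun, timeout \\ 30_000) do
    sources
    |> Enum.map(fn source -> Task.async(fn -> print_fun.(source) end) end)
    |> Task.await_many(timeout)
  end
end

# Stand-in for the real print call, returning a Base64 "blob".
fake_print = fn {:html, html} -> {:ok, Base.encode64(html)} end

results = ParallelPrintSketch.print_all([{:html, "a"}, {:html, "b"}], fake_print)
IO.inspect(results)
```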

Integrating with paged.js

Hi!
Currently we are using paged.js (pagedjs-cli) to generate pdfs using https://www.pagedjs.org/ since it supports a lot of new css to control the output that chrome doesn't support natively.

However, it's pretty slow to start a new node/chrome instance each time so I'm looking for alternatives.

I tried using the pagedjs polyfill JS in the web page instead, with the following option (since that node is generated by pagedjs):
wait_for: %{selector: ".pagedjs_pages", attribute: "style"}

However, it seems it does not get a reply and times out. I guess this is because the node with class "pagedjs_pages" does not exist at the start.

The corresponding code in pagedjs-cli is here: https://gitlab.pagedmedia.org/tools/pagedjs-cli/blob/master/src/printer.js#L220
await page.waitForSelector(".pagedjs_pages");

Any help would be much appreciated!

`wait_for` option not working

Hello! First of all, thanks for the awesome work!

I am unable to display font-awesome icons in the exported pdf. In a normal website lifecycle, the font-awesome icon will display after some time. That is why I thought the wait_for option will be of great use. But it seems it always times out and can't find whatever selector and attr I set. This is how I set it up:

Version: 0.7.1

config.exs

config :my_app, ChromicPDF, on_demand: false, session_pool: [timeout: 60_000], offline: false

my_pdf_template.html.eex

<head>
  <link rel="stylesheet" href="<%= Routes.static_url(@conn, "/css/tailwind.css") %>">
</head>

<body class="bg-white">
  <div id="print-ready"></div>
  <i class="far fa-envelope"></i>
  ... more contents ...

  <script defer src="<%= Routes.static_url(@conn, "/js/app.js") %>" ></script>
</body>

app.js

import { faEnvelope } from "@fortawesome/pro-regular-svg-icons";
import { dom, library } from "@fortawesome/fontawesome-svg-core";

function handleDOMContentLoaded() {
  library.add(faEnvelope);
  dom.watch();
  $("#print-ready").attr("ready-to-print", "");
}

window.addEventListener("DOMContentLoaded", handleDOMContentLoaded, false);

Then I just call an endpoint that will execute the download of the pdf via a controller:
pdf_controller.ex

def export(conn, params) do
  # ... some code to build assigns ...
  template = Phoenix.HTML.safe_to_string(PdfView.render("my_pdf_template.html", assigns))

  [content: template]
  |> ChromicPDF.Template.source_and_options()
  |> ChromicPDF.print_to_pdf(
    wait_for: %{selector: "#print-ready", attribute: "ready-to-print"},
    output: fn path ->
      conn
      |> put_resp_content_type("application/pdf")
      |> send_download({:file, path}, filename: "export.pdf")
      |> halt()
    end
  )
end

Even if I add an inline script in the template that adds the ready-to-print attribute, it seems it still cannot find the element and eventually times out.

Without wait_for the pdf can be downloaded successfully, although the font-awesome icons are not visible.

Generating Large PDFs causes Timeout

Hello,

I'm attempting to generate a large pdf (100 pages) however the GenServer times out after only 5 (!) seconds. Is there a way to override this? I don't see a supervisor option to do so, and I can't seem to track down the timeout in the source.

Regards,
Dakora

Compile time debug option to log calls and Chrome messages

From @jarimatti in #104

Would be nice if we could log the DevTools protocol messages for diagnostics via some means, e.g. config flag or some other setting: the messages should shed some light into this. Not sure if Chrome can do that natively?

Agreed. I'm also constantly going back to the Channel and Connection to put in IO.inspects. I suggest adding a compile time switch that conditionally compiles in some debug code, either Logger.debug or just direct IO.inspects.

ChromicPDF keeps crashing on Gigalixir

Hi, could you please help me with this one? I'm trying to use your library on the free Gigalixir tier, but it keeps failing on application startup with the following error:

 ** (RuntimeError) Timeout in Channel.run_protocol/3!
 16:53:07.958 [error] Task #PID<0.4273.0> started from #PID<0.4239.0> terminating
 
 The underlying GenServer.call/3 exited with a timeout. This happens when the browser was
 not able to complete the current operation (= PDF print job) within the configured
 If you are printing large PDFs and expect long processing times, please consult the
 
 5000 milliseconds.
 
 documentation for the `timeout` option of the session pool.
 bug in ChromicPDF. Please open an issue on the issue tracker.
 If you are *not* printing large PDFs but your print jobs still time out, this is likely a
 
     (chromic_pdf 1.1.0) lib/chromic_pdf/pdf/browser/session_pool.ex:110: ChromicPDF.Browser.SessionPool.do_init_worker/2
     (chromic_pdf 1.1.0) lib/chromic_pdf/pdf/browser/channel.ex:24: ChromicPDF.Browser.Channel.run_protocol/3
     (elixir 1.12.1) lib/task/supervised.ex:35: Task.Supervised.reply/5
     (elixir 1.12.1) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
     (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
 Function: #Function<0.33738222/0 in ChromicPDF.Browser.SessionPool.init_worker/1>
     Args: []
 [0824/165309.842598:ERROR:zygote_host_impl_linux.cc(263)] Failed to adjust OOM score of renderer with pid 1731: Permission denied (13)
 16:53:13.065 [error] Task #PID<0.4275.0> started from #PID<0.4239.0> terminating

I'm using mix releases deployment option to Gigalixir. Here is my config:

env.sh.eex

#!/bin/sh

apt-get update
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
apt-get -y install ./google-chrome-stable_current_amd64.deb

application.ex

defmodule Flip.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false


  use Application

  def start(_type, _args) do
    children = [
      # Start the Ecto repository
      Flip.Repo,
      # Start the Telemetry supervisor
      FlipWeb.Telemetry,
      # Start the PubSub system
      {Phoenix.PubSub, name: Flip.PubSub},
      # Start the Endpoint (http/https)
      FlipWeb.Endpoint,
      # Start a worker by calling: Flip.Worker.start_link(arg)
      # {Flip.Worker, arg}
      {ChromicPDF, no_sandbox: true, discard_stderr: false, session_pool: [timeout: 10_000]}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Flip.Supervisor]
    Supervisor.start_link(children, opts)
  end

  # Tell Phoenix to update the endpoint configuration
  # whenever the application is updated.
  def config_change(changed, _new, removed) do
    FlipWeb.Endpoint.config_change(changed, removed)
    :ok
  end
end

So basically, I use env.sh.eex to download and install Chrome, and I set no_sandbox because Gigalixir uses Docker. Sometimes I get no errors after deployment and everything works perfectly, but most of the time it keeps failing. I have no idea why this happens. Could you have a look, please?

Add more absolute paths to Chrome

https://community.fly.io/t/cant-install-chrome-chromium-via-dockerfile-or-ssh/5303/5

Not sure how this happened ^^, theoretically System.find_executable/1 should be able to pick up the /usr/bin/google-chrome path from $PATH 🤷

No accessibility tags in Ghostscript-generated files

Hey! Sorry for not taking a look at this when the PR was originally in review but there are a few things you should know related to merging chromic_pdf generated documents with ghostscript:

it's really slow. To quote an ex-coworker

In my testing, I found that, with a large PDF file (~200 pages), GhostScript took 12 seconds to handle the request, and pdfunite took... 4.47 seconds! This should therefore speed up merging PDFs significantly. Turns out, ghostscript is a significant and slow bottleneck.

it removes the accessibility annotations added via --export-tagged-pdf. This seems like something that you don't want happening implicitly

chromic_pdf/lib/chromic_pdf/pdf/chrome_runner.ex

Line 109 in f8d7a39

"--export-tagged-pdf",

Use setup-beam instead of custom image in lint job

override default NimblePool checkout timeout

Should we allow users to override the default timeout of 5 seconds for NimblePool.checkout/4? It seems like there's no way to do this currently (we still sometimes have CI runs failing because of this 🙂).

Support Stream

Thanks for the awesome job
Really useful lib

I'm bringing here what we discussed on the forum:

Sometimes I need to print very large pdf files and the memory consumption jumps too high

It would be nice if we could support transferMode as ReturnAsStream

(I've tried to look at the code but, because of my lack of experience in elixir, I could not find an easy way to implement IO.read/IO.close in the current macro schema)

Mix.Shell hangs when supervisor is started

Problem

We run our umbrella project's test suite with mix cmd mix test. ChromicPDF is started as child of an umbrella apps' supervisor automatically. The process hangs indefinitely when mix test is finished.

Diagnosis

Mix.Tasks.Cmd opens a port for the nested command and waits for an :EXIT message for the port in a receive block in Mix.Shell.cmd/3.
Problem was introduced in 6f41cd4 which is definitely related, but it's not obvious what exactly caused this change in behaviour.

How to reproduce

Since ChromicPDF's test suite starts it with ExUnit's start_supervised, mix cmd mix test succeeds in this project. However, simply starting the supervisor with mix run breaks.

mix cmd mix run -e "\"ChromicPDF.start_link()\""
# hangs forever...

iex(1)> Mix.Shell.cmd("mix run -e \"ChromicPDF.start_link()\"", [], &IO.puts(&1))
# hangs forever...

Expected behaviour

$ mix cmd mix run -e "\"Agent.start_link(fn -> nil end)\""
# exits immediately with exit code 0

Option to ignore certificate errors

Hi again!

In puppeteer you can pass ignoreHTTPSErrors: true to the launch config which sends the command
'Security.setIgnoreCertificateErrors', {ignore: true}.

It would be great to have this option in chromic_pdf too.

Generate landscape PDF

Hi there,

I'm currently struggling to generate a PDF from a URL in landscape mode.
I tried with the following options:

ChromicPDF.print_to_pdf(
  %{
    source: {:url, "http://acme.com/pdf"},
    opts: [
      print_to_pdf: %{landscape: true}
    ]
  }
)

ChromicPDF.print_to_pdf(
  %{
    source: {:url, "http://acme.com/pdf"},
    opts: [
      print_to_pdf: %{paperWidth: 11, paperHeight: 8.5}
    ]
  }
)

Neither works. Any idea?
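
One variant that may be worth trying is to pass the source tuple as the first argument and the options as a keyword list in the second argument, which is the shape used by the other calls quoted on this page (for example those with wait_for: and output:). This is a sketch, not a confirmed fix: it assumes ChromicPDF is already running, and the output filename is a placeholder.

```elixir
# Assumes ChromicPDF has been started in the supervision tree.
# The source is the first argument; options go in the second.
ChromicPDF.print_to_pdf(
  {:url, "http://acme.com/pdf"},
  # Passed through to Chrome's Page.printToPDF command.
  print_to_pdf: %{landscape: true},
  # Placeholder path for the resulting file.
  output: "landscape.pdf"
)
```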

Zugferd test is broken for SAFER ghostscript

Warnings when running integration specs

Maybe we want to deal with those:

08:16:15.577 [info]  Function passed as a handler with ID "print_to_pdf_start" is local function.
This mean that it is either anonymous function or capture of function without module specified. That may cause performance penalty when calling such handler. For more details see note in `telemetry:attach/4` documentation.

https://hexdocs.pm/telemetry/telemetry.html#attach-4

08:16:15.577 [info]  Function passed as a handler with ID "print_to_pdf_stop" is local function.
This mean that it is either anonymous function or capture of function without module specified. That may cause performance penalty when calling such handler. For more details see note in `telemetry:attach/4` documentation.

https://hexdocs.pm/telemetry/telemetry.html#attach-4
.
08:16:16.047 [info]  Function passed as a handler with ID "convert_to_pdfa_start" is local function.
This mean that it is either anonymous function or capture of function without module specified. That may cause performance penalty when calling such handler. For more details see note in `telemetry:attach/4` documentation.

https://hexdocs.pm/telemetry/telemetry.html#attach-4

08:16:16.047 [info]  Function passed as a handler with ID "convert_to_pdfa_stop" is local function.
This mean that it is either anonymous function or capture of function without module specified. That may cause performance penalty when calling such handler. For more details see note in `telemetry:attach/4` documentation.

https://hexdocs.pm/telemetry/telemetry.html#attach-4


.
08:16:16.246 [info]  Function passed as a handler with ID "convert_to_pdfa_start" is local function.
This mean that it is either anonymous function or capture of function without module specified. That may cause performance penalty when calling such handler. For more details see note in `telemetry:attach/4` documentation.

https://hexdocs.pm/telemetry/telemetry.html#attach-4

08:16:16.246 [info]  Function passed as a handler with ID "convert_to_pdfa_stop" is local function.
This mean that it is either anonymous function or capture of function without module specified. That may cause performance penalty when calling such handler. For more details see note in `telemetry:attach/4` documentation.

https://hexdocs.pm/telemetry/telemetry.html#attach-4

08:16:16.246 [info]  Function passed as a handler with ID "print_to_pdf_start" is local function.
This mean that it is either anonymous function or capture of function without module specified. That may cause performance penalty when calling such handler. For more details see note in `telemetry:attach/4` documentation.

https://hexdocs.pm/telemetry/telemetry.html#attach-4

08:16:16.246 [info]  Function passed as a handler with ID "print_to_pdf_stop" is local function.
This mean that it is either anonymous function or capture of function without module specified. That may cause performance penalty when calling such handler. For more details see note in `telemetry:attach/4` documentation.

https://hexdocs.pm/telemetry/telemetry.html#attach-4
.
08:16:16.715 [info]  Function passed as a handler with ID "capture_screenshot_start" is local function.
This mean that it is either anonymous function or capture of function without module specified. That may cause performance penalty when calling such handler. For more details see note in `telemetry:attach/4` documentation.

https://hexdocs.pm/telemetry/telemetry.html#attach-4

08:16:16.716 [info]  Function passed as a handler with ID "capture_screenshot_stop" is local function.
This mean that it is either anonymous function or capture of function without module specified. That may cause performance penalty when calling such handler. For more details see note in `telemetry:attach/4` documentation.

https://hexdocs.pm/telemetry/telemetry.html#attach-4
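
The notice is emitted when a handler is an anonymous function or a local capture. A minimal sketch of attaching a handler through a remote capture instead is shown below. The module name and handler ID are hypothetical, and the event names are an assumption inferred from the handler IDs in the log above. It requires the :telemetry package.

```elixir
defmodule MyApp.PdfTelemetry do
  require Logger

  # Assumed event names, inferred from the handler IDs in the log output.
  @events [
    [:chromic_pdf, :print_to_pdf, :start],
    [:chromic_pdf, :print_to_pdf, :stop]
  ]

  def attach do
    # &__MODULE__.handle_event/4 is a remote capture (module, function, arity),
    # which is the form telemetry recommends over anonymous functions.
    :telemetry.attach_many("pdf-telemetry", @events, &__MODULE__.handle_event/4, nil)
  end

  def handle_event(event, measurements, _metadata, _config) do
    Logger.debug("#{inspect(event)} #{inspect(measurements)}")
  end
end
```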

Flaky test assertion on log output

Ever since our move to Github Actions

Rendering a combined PDF of multiple "files"

This is more of a question I suppose. I currently generate a PDF for a specific user based on their account information.

I have the need to generate a single PDF of a group of these individual PDFs for archival purposes. Currently I'm rendering them individually to disk and then concatenating them via a System.cmd call to qpdf.

I thought about rendering one long PDF but I'm currently using both the header (for user profile info) and footer (for page numbers) to generate the pdf so I don't believe that'll work since headers and footers span the entire document.

Is there any way for me to leverage chromicPDF to generate this combined file directly?
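
Other issues on this page mention a "multiple sources" feature of print_to_pdf/2 that joins the results with Ghostscript. Assuming that feature accepts a list of sources, a combined file with per-user headers might be sketched like this. The user data and HTML fragments are made up for illustration, and the exact behaviour should be checked against the library documentation.

```elixir
# Hypothetical data; each user gets its own header and footer.
users = [%{name: "Alice"}, %{name: "Bob"}]

sources =
  Enum.map(users, fn user ->
    ChromicPDF.Template.source_and_options(
      content: "<p>Report for #{user.name}</p>",
      header: "<span>#{user.name}</span>",
      # pageNumber is a class Chrome fills in for header/footer templates.
      footer: "<span class=\"pageNumber\"></span>"
    )
  end)

# Assumption: a list of sources is printed one by one and joined into
# a single file, which requires Ghostscript to be installed.
ChromicPDF.print_to_pdf(sources, output: "combined.pdf")
```

Because each source is printed separately before joining, page numbers in the footer would presumably restart for each user.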

Custom options to chromium

Hi again!

We need to pass a custom option to chromium on startup, in this case we need "--font-render-hinting=none". I don't see an immediate way to do this.

See puppeteer/puppeteer#2410 for why we need font-render-hinting=none. In any case, it would be useful to be able to pass custom options.
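
The state dump in the terminate_worker/3 issue on this page (chromic_pdf 1.6.0) contains chrome_args: "--disable-dev-shm-usage", which suggests that later versions accept such an option. Under that assumption, the supervisor entry might look like the sketch below; the supervisor name is a placeholder.

```elixir
children = [
  # Assumed option name, taken from the state dump quoted on this page.
  {ChromicPDF, chrome_args: "--font-render-hinting=none"}
]

# MyApp.Supervisor is a placeholder name.
Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
```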

Add info option to print_to_pdf/2

Now that we're calling Ghostscript from print_to_pdf/2 (optionally, for multiple sources), we may as well benefit from it.

Make the embedding of metadata pieces a separate logic in Ghostscript*
- While you're at it, refactor the GhostscriptWorker/Interface/Impl mess
Move the info_option to pdf_option and add it as optional flag to print_to_pdf/2
- Mention in the docs that this requires ghostscript
- And writing the file to disk (mention this for the "join multiple sources" section as well)

Images using an absolute path (both file or remote URL) don't work with HTML source

When specifying the HTML content directly to the print_to_pdf, the rendered PDF doesn't contain the image. It works ok when specifying the image through base64. Going through the code, it seems that when in :html mode, there is no wait for the Page.frameStoppedLoading notification like it is for the :url mode.

chromic_pdf/lib/chromic_pdf/pdf/protocols/print_to_pdf.ex

Lines 13 to 34 in 29a23ac

    
    if_option {:source_type, :html} do
      call(:get_frame_tree, "Page.getFrameTree", [], %{})
      await_response(:frame_tree, [{["frameTree", "frame", "id"], "frameId"}])
      call(:set_content, "Page.setDocumentContent", [:html, "frameId"], %{})
      await_response(:content_set, [])
    end

    if_option {:source_type, :url} do
      call(:navigate, "Page.navigate", [:url], %{})

      await_response(:navigated, ["frameId"]) do
        case get_in(msg, ["result", "errorText"]) do
          nil ->
            :ok

          error ->
            {:error, error}
        end
      end

      await_notification(:frame_stopped_loading, "Page.frameStoppedLoading", ["frameId"], [])
    end

Reproduction:

path = System.tmp_dir() |> Path.join("output.pdf")

result = ChromicPDF.Template.source_and_options(content: content()) |> ChromicPDF.print_to_pdf(output: path)
IO.inspect(result)

defp content() do
  :erlang.iolist_to_binary(["<img src=\"https://homepages.cae.wisc.edu/~ece533/images/peppers.png\">"])
end

Unable to view images in rendered pdf

Hey, thanks for creating this project. I like how it works almost right out of the box.

I'm in the middle of testing, and the html file I want to convert to a pdf contains a single image.

[content: ["<div><img src=\"potato.png\"></div>"]]
|> ChromicPDF.Template.source_and_options()
|> ChromicPDF.print_to_pdf(output: "lib/pdftest_web/templates/pdf_templates/potatoResult.pdf")

The potato.png file is located in the outermost Phoenix directory (next to /lib, /priv, etc.). If I run that in the IEx terminal, the resulting PDF is empty.

However, if I put some other html as input, it functions normally.

await_response not failing when payload does not contain key

See stacktrace at #104

await_response and await_notification both call extract_from_payload
extract_from_payload doesn't error when it can't find the desired key in the payload, but instead sets it to nil in the state
When the value is to be used again in a follow-up call, get_in_state! errors out when the value is nil.
FIX: error out when await_response or await_notification can't satisfy their payload extractions
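
The proposed fix can be sketched as a strict extraction function that raises as soon as a requested key is absent. This is only an illustration: `StrictExtraction` and `extract!/2` are hypothetical names, not ChromicPDF's actual `extract_from_payload`, and the extraction format (a plain key or a `{path, name}` tuple) is inferred from the protocol snippets quoted elsewhere on this page.

```elixir
defmodule StrictExtraction do
  @moduledoc "Hypothetical sketch; not ChromicPDF's actual implementation."

  # An extraction is either a key such as "frameId", or a {path, name}
  # tuple such as {["frameTree", "frame", "id"], "frameId"}.
  # Returns a map of extracted values, raising if any value is missing.
  def extract!(payload, extractions) do
    Map.new(extractions, fn
      {path, name} when is_list(path) -> {name, fetch_path!(payload, path)}
      key when is_binary(key) -> {key, fetch_path!(payload, [key])}
    end)
  end

  # Note: a key that is present but holds nil is treated as missing too.
  defp fetch_path!(payload, path) do
    case get_in(payload, path) do
      nil ->
        raise ArgumentError,
              "payload is missing #{inspect(path)}: #{inspect(payload)}"

      value ->
        value
    end
  end
end
```

Failing at extraction time points at the message that lacked the key, rather than at the later call that tries to read it from the state.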

:connection_timeout random burst

Hello, I'm getting a rather odd burst of errors out of the blue here. About 5000 error events popped up in Sentry after running an update, and they stopped after a reboot:

11:30:31.085 [error] Task #PID<0.2538.0> started from #PID<0.2530.0> terminating
** (stop) exited in: GenServer.call(#PID<0.2525.0>, {:run_protocol, %ChromicPDF.Protocol{result_fun: nil, state: %{ignore_certificate_errors: false, offline: true, session_pool: [size: 10, timeout: 10000]}, steps: [call: &ChromicPDF.SpawnSession.create_browser_context/2, await: &ChromicPDF.SpawnSession.browser_context_created/2, call: &ChromicPDF.SpawnSession.create_target/2, await: &ChromicPDF.SpawnSession.target_created/2, call: &ChromicPDF.SpawnSession.attach/2, await: &ChromicPDF.SpawnSession.attached/2, call: &ChromicPDF.SpawnSession.set_user_agent/2, call: &ChromicPDF.SpawnSession.offline_mode/2, call: &ChromicPDF.SpawnSession.enable_page/2, await: &ChromicPDF.SpawnSession.page_enabled/2, call: &ChromicPDF.SpawnSession.blank/2, await: &ChromicPDF.SpawnSession.blanked/2, await: &ChromicPDF.SpawnSession.fsl_after_blank/2, output: &ChromicPDF.SpawnSession.output/1]}}, 5000)
    ** (EXIT) :connection_terminated
    (elixir 1.11.4) lib/gen_server.ex:1027: GenServer.call/3
    (chromic_pdf 0.7.2) lib/chromic_pdf/pdf/browser/channel.ex:21: ChromicPDF.Browser.Channel.run_protocol/3
    (chromic_pdf 0.7.2) lib/chromic_pdf/pdf/browser/session_pool.ex:110: ChromicPDF.Browser.SessionPool.do_init_worker/2
    (elixir 1.11.4) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (elixir 1.11.4) lib/task/supervised.ex:35: Task.Supervised.reply/5
    (stdlib 3.10) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Function: #Function<0.13638364/0 in ChromicPDF.Browser.SessionPool.init_worker/1>
    Args: []
 
11:30:31.209 [error] GenServer #PID<0.2550.0> terminating
** (stop) :connection_terminated
Last message: {:EXIT, #Port<0.63>, :normal}

It's weird because chromium wasn't updated D:

Perhaps the reconnect frequency could be toned down? I got 100s of these errors every second.

Cleanly handle page overflow of a `<table>`

I am generating a pdf with a long table that often overflows into a second page and I am not sure how I should go about making the transition clean. Ideally, my table would be cut where a new th header starts.

Is this something I can do?

Test against a build matrix

With #153 we can see that it is sometimes useful to test not only against one but multiple versions of chrome, elixir, erlang & ghostscript. Maybe we can use a build matrix in the CI, so new versions can easily be added.

We could also use multiple versions of the Alpine images: the Chrome and Ghostscript parts are fairly independent, each new Alpine major version ships a new major version of Chrome (and often of Ghostscript), and these images reflect the current state of the art quite well.

Replace `wait_for` with script evaluation

Hi @jarimatti ,

posting this here so we can discuss it and you feel involved and not replaced (because replacing is what I intend to do 😄 ):

on fixing the race condition in `wait_for`

I looked into the wait_for option a bit more and found out that in its current implementation it's definitely a bit cumbersome to use, and fixing it is unfortunately non-trivial.

As we discussed before, if the element already has the attribute set before we send the querySelector call, we won't get any attributeModified notification and run into a timeout. In fact, the "Example HTML" we have in the docs has this error if used as-is (i.e. without any setTimeout or JS lib doing some work in between).
We could fix this two ways:
- "polling": Repeatedly ask Chrome to DOM.getAttributes instead of waiting for attributeModified -> this should work, but feels a little clunky.
- Reading the DOM.setChildNode notifications we get right after DOM.getDocument and filter out the relevant node and its attributes. This would be my preferred approach, but turned out to be rather tricky unfortunately
  - the DOM.setChildNode messages come after DOM.getDocument, i.e. theoretically before we get the result of DOM.querySelector -> hence at this point in time we might not know the nodeId of the element in question. We could instead match the node based on the id attribute here.
  - DOM.setChildNode notifications traverse the tree, so technically we will get between one and many of them, and would need to read them all. That's another thing that the Protocol machine can't do right now.

In summary: Either way outlined above, in combination with some kind of if_state runtime conditional, is a whole lot of complexity we would introduce only to make sure the wait_for logic is race-free, i.e. can deal with elements which have the attribute already set. Therefore, even if I like the API interface quite a lot (as it covers a lot and doesn't need help from the client-side), IMO it is not worth the added complexity in the code. So, at this point I discarded the entire idea and looked for alternatives.

Evaluating a custom script instead

Actually I had planned to support this devtools call even before you added wait_for, but didn't move on with it as I personally had no use for it at the time: Runtime.evaluate. It might be a bit of a sledgehammer solution to replace wait_for, but of course it has potential to solve other use-cases as well. Looking into it this morning, I wanted to specifically see what it takes to mimic wait_for though, and here it is:

# protocols/print_to_pdf.ex
if_option :evaluate do
  call(:evaluate, "Runtime.evaluate", [{"expression", [:evaluate, :expression]}], %{awaitPromise: true})
  await_response(:evaluated, [])
end

# client call
@wait_for_js """
async function waitForAttribute(selector, attribute) {
  while (!document.getElementById(selector).hasAttribute(attribute)) {
    await new Promise(resolve => requestAnimationFrame(resolve));
  }
};

waitForAttribute('testdiv', 'ready-to-print');
"""

ChromicPDF.print_to_pdf(..., evaluate: %{expression: @wait_for_js})

Of course I didn't come up with this script myself: stackoverflow link. In my test setup, this worked for both cases; when I had the attribute already set on the element as well as with setTimeout(... setAttribute..., 1000).

In my opinion, this is quite neat. What do you think? Would you be ok with replacing the wait_for option with this instead?

One could theoretically also emulate the wait_for behaviour with this script underneath, i.e. put this script into ChromicPDF and make the wait_for option set an evaluate option instead. But I'm not yet convinced maintaining this in code has much added benefit over just mentioning it in the docs.
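
To make the emulation idea concrete, here is a rough sketch of a helper that turns an element id and attribute name into the `evaluate` option used above. `WaitForScript` and `evaluate_option/2` are hypothetical names, not part of ChromicPDF. Two choices here are assumptions of this sketch rather than anything from the discussion: names are restricted to a conservative character set so they can be interpolated into the script safely, and the loop also keeps waiting while the element does not exist yet.

```elixir
defmodule WaitForScript do
  @moduledoc "Hypothetical helper for illustration; not part of ChromicPDF."

  # Builds the map expected by the `evaluate` option shown above.
  # The script resolves once the element exists and has the attribute.
  def evaluate_option(element_id, attribute) do
    validate!(element_id)
    validate!(attribute)

    %{
      expression: """
      (async function () {
        const el = () => document.getElementById('#{element_id}');
        while (!el() || !el().hasAttribute('#{attribute}')) {
          await new Promise(resolve => requestAnimationFrame(resolve));
        }
      })();
      """
    }
  end

  # Only letters, digits, "_" and "-" are accepted, so the values cannot
  # break out of the single-quoted JavaScript string literals.
  defp validate!(name) do
    unless is_binary(name) and Regex.match?(~r/\A[A-Za-z][A-Za-z0-9_-]*\z/, name) do
      raise ArgumentError, "unsafe name: #{inspect(name)}"
    end

    :ok
  end
end

# Possible usage, mirroring the client call above:
#
#   ChromicPDF.print_to_pdf(source,
#     evaluate: WaitForScript.evaluate_option("testdiv", "ready-to-print"))
```

Because the expression evaluates to a promise, it relies on the `awaitPromise: true` parameter from the protocol snippet above.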

ChromicPDF keeps crashing in Docker container

I'm trying to run ChromicPDF in a docker container.
My Docker container looks something like this:

# runtime is built first to prevent invalidating cache, as this is one of the slower stages.
FROM alpine:3.13 as runtime
# Install Chromium for PDF generation
RUN echo "http://dl-cdn.alpinelinux.org/alpine/edge/main" > /etc/apk/repositories \
    && echo "http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories \
    && echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories \
    && echo "http://dl-cdn.alpinelinux.org/alpine/v3.12/main" >> /etc/apk/repositories \
    && apk upgrade -U -a \
    && apk add \
    libstdc++ \
    chromium \
    harfbuzz \
    nss \
    freetype \
    ttf-freefont \
    font-noto-emoji \
    wqy-zenhei \
    && rm -rf /var/cache/* \
    && mkdir /var/cache/apk

ENV CHROME_BIN=/usr/bin/chromium-browser \
    CHROME_PATH=/usr/lib/chromium/

FROM elixir:1.11-alpine as build
ENV MIX_ENV=prod
WORKDIR /tmp/app
RUN mix local.hex --force && mix local.rebar --force

COPY mix.exs mix.lock ./
RUN mix do deps.get --only prod

FROM node:15-alpine as frontend
WORKDIR /tmp/app_front

COPY --from=build /tmp/app/deps/phoenix/ ./deps/phoenix/
COPY --from=build /tmp/app/deps/phoenix_html/ ./deps/phoenix_html/
COPY --from=build /tmp/app/deps/phoenix_live_view/ ./deps/phoenix_live_view/

COPY assets/ ./assets/
RUN npm ci --progress=false --no-audit --loglevel=error --prefix assets
ENV NODE_ENV=production
RUN npm run deploy --prefix assets

FROM build as release
COPY config ./config
COPY lib ./lib
COPY priv ./priv
COPY --from=frontend /tmp/app_front/priv/static ./priv/static/

RUN mix do phx.digest, compile, release

FROM runtime as app
WORKDIR /app
RUN chown -R nobody:nobody /app
RUN apk add --no-cache openssl ncurses-libs

USER nobody:nobody
COPY --from=release --chown=nobody:nobody /tmp/app/_build/prod/rel/app ./
ENV HOME=/app
CMD ["bin/app", "start"]

I simply added ChromicPDF to my mix dependencies and to the supervision tree, without additional parameters.

However, when running the application I get the following in the output (keeps spamming until the supervisor gives up and terminates the entire runtime):

16:37:26.489 [error] GenServer #PID<0.3047.0> terminating
app_1  | ** (stop) :connection_terminated
app_1  | Last message: {:EXIT, #Port<0.12>, :normal}
app_1  | 16:37:26.489 [error] Task #PID<0.3049.0> started from #PID<0.3048.0> terminating
app_1  | ** (stop) exited in: GenServer.call(#PID<0.3046.0>, {:run_protocol, %ChromicPDF.Protocol{result_fun: nil, state: %{ignore_certificate_errors: false, offline: true}, steps: [call: &ChromicPDF.SpawnSession.create_browser_context/2, await: &ChromicPDF.SpawnSession.browser_context_created/2, call: &ChromicPDF.SpawnSession.create_target/2, await: &ChromicPDF.SpawnSession.target_created/2, call: &ChromicPDF.SpawnSession.attach/2, await: &ChromicPDF.SpawnSession.attached/2, call: &ChromicPDF.SpawnSession.set_user_agent/2, call: &ChromicPDF.SpawnSession.offline_mode/2, call: &ChromicPDF.SpawnSession.enable_page/2, await: &ChromicPDF.SpawnSession.page_enabled/2, call: &ChromicPDF.ResetTarget.reset_history/2, await: &ChromicPDF.ResetTarget.history_reset/2, call: &ChromicPDF.ResetTarget.blank/2, await: &ChromicPDF.ResetTarget.blanked/2, await: &ChromicPDF.ResetTarget.fsl_after_blank/2, output: &ChromicPDF.SpawnSession.output/1]}}, 5000)

I've tried setting debug_protocol: true in the config, but that did not reveal additional information.
The Chromium binary is definitely present and detected (System.find_executable returns the correct path).

Remove poolboy

A resource pool does not make sense for Chrome: the shared resource uses a single multiplexed channel, so we lose the benefits of parallelizing on the client.
It does make sense for the Ghostscript pool, but it can easily be replaced with some "task count" solution, i.e. we don't need the complexity/feature set of poolboy.
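
One way to read the "task count" idea is a small counting semaphore that caps how many jobs run at once. The sketch below is an assumption about what such a solution could look like, not the approach ChromicPDF adopted; `TaskCounter` is a hypothetical name. It deliberately omits caller monitoring, so a caller that is killed while holding a slot, or that times out while queued, leaks that slot.

```elixir
defmodule TaskCounter do
  @moduledoc "Hypothetical counting semaphore; not part of ChromicPDF."
  use GenServer

  def start_link(max), do: GenServer.start_link(__MODULE__, max)

  # Runs `fun` once a slot is free and releases the slot afterwards,
  # even if `fun` raises.
  def run(pid, fun, timeout \\ 5_000) do
    :ok = GenServer.call(pid, :acquire, timeout)

    try do
      fun.()
    after
      GenServer.cast(pid, :release)
    end
  end

  @impl true
  def init(max), do: {:ok, %{free: max, waiting: :queue.new()}}

  @impl true
  def handle_call(:acquire, _from, %{free: free} = state) when free > 0 do
    {:reply, :ok, %{state | free: free - 1}}
  end

  # No free slot: park the caller until someone releases.
  def handle_call(:acquire, from, state) do
    {:noreply, %{state | waiting: :queue.in(from, state.waiting)}}
  end

  @impl true
  def handle_cast(:release, state) do
    case :queue.out(state.waiting) do
      # Hand the slot directly to the longest-waiting caller.
      {{:value, from}, rest} ->
        GenServer.reply(from, :ok)
        {:noreply, %{state | waiting: rest}}

      {:empty, _} ->
        {:noreply, %{state | free: state.free + 1}}
    end
  end
end
```

A caller would wrap each Ghostscript invocation in `TaskCounter.run(pid, fn -> ... end)`, which keeps the concurrency limit without pooling any worker processes.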

Chrome process isn't cleaned up when server is stopped

When I start my app with iex -S mix or mix phx.server and kill it with an interrupt (ctrl+c), the Chrome process isn't cleaned up. In fact, three processes remain.

13039   0.0  0.1  4920020  24212   ??  S     9:41AM   0:00.06 /Applications/Google Chrome.app/Contents/Frameworks/Google Chrome Framework.framework/Versions/86.0.4240.75/Helpers/Google Chrome Helper.app/Contents/MacOS/Google Chrome Helper --type=utility --utility-sub-type=network.mojom.NetworkService --field-trial-handle=1718379636,10962548680910408603,528382615615534866,131072 --lang=en-US --service-sandbox-type=network --use-mock-keychain --use-gl=swiftshader-webgl --headless --shared-files --seatbelt-client=23  

13038   0.0  0.4  6001936 120940   ??  S     9:41AM   0:00.20 /Applications/Google Chrome.app/Contents/Frameworks/Google Chrome Framework.framework/Versions/86.0.4240.75/Helpers/Google Chrome Helper (GPU).app/Contents/MacOS/Google Chrome Helper (GPU) --type=gpu-process --field-trial-handle=1718379636,10962548680910408603,528382615615534866,131072 --headless --headless --gpu-preferences=MAAAAAAAAAAgAAAAAAAAAAAAAAAAAAAAAABgAAAAAAAQAAAAAAAAAAAAAAAAAAAA6AAAABwAAADgAAAAAAAAAOgAAAAAAAAA8AAAAAAAAAD4AAAAAAAAAAABAAAAAAAACAEAAAAAAAAQAQAAAAAAABgBAAAAAAAAIAEAAAAAAAAoAQAAAAAAADABAAAAAAAAOAEAAAAAAABAAQAAAAAAAEgBAAAAAAAAUAEAAAAAAABYAQAAAAAAAGABAAAAAAAAaAEAAAAAAABwAQAAAAAAAHgBAAAAAAAAgAEAAAAAAACIAQAAAAAAAJABAAAAAAAAmAEAAAAAAACgAQAAAAAAAKgBAAAAAAAAsAEAAAAAAAC4AQAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAGAAAAEAAAAAAAAAAAAAAABwAAABAAAAAAAAAAAAAAAAgAAAAQAAAAAAAAAAAAAAAKAAAAEAAAAAAAAAAAAAAACwAAABAAAAAAAAAAAAAAAA0AAAAQAAAAAAAAAAEAAAAAAAAAEAAAAAAAAAABAAAABgAAABAAAAAAAAAAAQAAAAcAAAAQAAAAAAAAAAEAAAAIAAAAEAAAAAAAAAABAAAACgAAABAAAAAAAAAAAQAAAAsAAAAQAAAAAAAAAAEAAAANAAAAEAAAAAAAAAAEAAAAAAAAABAAAAAAAAAABAAAAAYAAAAQAAAAAAAAAAQAAAAHAAAAEAAAAAAAAAAEAAAACAAAABAAAAAAAAAABAAAAAoAAAAQAAAAAAAAAAQAAAALAAAAEAAAAAAAAAAEAAAADQAAABAAAAAAAAAABgAAAAAAAAAQAAAAAAAAAAYAAAAGAAAAEAAAAAAAAAAGAAAABwAAABAAAAAAAAAABgAAAAgAAAAQAAAAAAAAAAYAAAAKAAAAEAAAAAAAAAAGAAAACwAAABAAAAAAAAAABgAAAA0AAAA= --use-gl=swiftshader-webgl --shared-files            13035   0.0  0.1  
5461308  41252   ??  Ss    9:41AM   0:00.23 /Applications/Google Chrome.app/Contents/MacOS/Google Chrome --headless --disable-gpu --remote-debugging-pipe

@andreasknoepfle confirmed this again and will try to fix it soon ☝️ unless you're eager to dig into it... It's usually not noticeable until you reach the point where you have 500 Chrome instances running and your OS starts doing funky things.

function :telemetry.span/3 is undefined or private

Hi, how can I solve this issue? "function :telemetry.span/3 is undefined or private"
I tried doing it in Postman. TIA.

Timeout error on shutting down

Running a deploy on fly.io with zero activity and no pdfs being generated. I'm unsure if this is from the pool shutting down or starting up.

	 00:19:00.549 [error] Task #PID<0.510.0> started from #PID<0.509.0> terminating
	 ** (RuntimeError) Timeout in Channel.run_protocol/3!
	 The underlying GenServer.call/3 exited with a timeout. This happens when the browser was
	 not able to complete the current operation (= PDF print job) within the configured
	 5000 milliseconds.
	 If you are printing large PDFs and expect long processing times, please consult the
	 documentation for the `timeout` option of the session pool.
	 If you are *not* printing large PDFs but your print jobs still time out, this is likely a
	 bug in ChromicPDF. Please open an issue on the issue tracker.
	     (chromic_pdf 1.3.0) lib/chromic_pdf/pdf/browser/channel.ex:26: ChromicPDF.Browser.Channel.run_protocol/3
	     (chromic_pdf 1.3.0) lib/chromic_pdf/pdf/browser/session_pool.ex:122: ChromicPDF.Browser.SessionPool.do_init_worker/3
	     (elixir 1.13.4) lib/task/supervised.ex:89: Task.Supervised.invoke_mfa/2
	     (elixir 1.13.4) lib/task/supervised.ex:34: Task.Supervised.reply/4
	     (stdlib 4.0.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
	 Function: #Function<0.72123262/0 in ChromicPDF.Browser.SessionPool.init_worker/1>
	     Args: [

