
jean-humann / docs-to-pdf

93.0 0.0 17.0 21.86 MB

Generate PDFs from documentation websites 🧑‍🔧

Home Page: https://www.npmjs.com/package/docs-to-pdf

License: MIT License

JavaScript 0.94% TypeScript 10.89% Shell 0.07% Dockerfile 0.22% MDX 0.58% HTML 87.04% CSS 0.25%
documentation docusaurus docusaurus-documentation pdf-generation pdf pdf-converter

docs-to-pdf's Introduction

Hey, I'm Jean Human! 👋

👨‍💻 About Me

I'm the Technical Director at Cleyrop, where we're on a mission to create an end-to-end data platform that prioritizes security and sovereignty. With a background in Machine Learning and a passion for technology, I'm excited about exploring new frontiers in data, containers, and transformers.

🌟 Professional Goals

  • 🚀 Lead a talented team to build innovative data solutions that make an impact.
  • 🌍 Create a secure and sovereign data platform that empowers businesses to harness the full potential of their data.
  • 🧠 Foster a culture of continuous learning and exploration, keeping up with the latest tech trends.

🌱 Personal Goals

  • 📚 Dive deeper into cutting-edge Natural Language Processing (NLP) techniques and models.
  • 🏃‍♂️ Balance work with my passion for outdoor activities like biking and running.
  • 🍎 Explore the intersection of technology and the Apple ecosystem, finding creative ways to integrate the two.

🤝 Let's Connect

  • 💬 I'm always open to engaging discussions about data, tech, and everything in between.
  • 📫 You can reach me at [email protected].
  • 🐦 Connect with me on Twitter.
  • 💼 Let's connect on LinkedIn.

docs-to-pdf's People

Contributors

codingluke, dependabot[bot], jafin, jean-humann, kohheepeace, ksmarty, lidkxx, meddbase-steve, mrdrivingduck, release-please[bot]


docs-to-pdf's Issues

Error on generating - timeout

I am trying to generate a PDF with:

npx docs-to-pdf --initialDocURLs="https://ignatandrei.github.io/RSCG_Examples/v2/docs/List-of-RSCG" --contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page"  --coverTitle="RSCG --protocolTimeout=54000"

Everything works until the final steps:
[30.08.2023 23:15.27.852] [LOG] Start generating PDF...
[30.08.2023 23:15.27.852] [LOG] Generate cover...
[30.08.2023 23:15.27.852] [LOG] Start generating TOC...
[30.08.2023 23:15.27.958] [LOG] Restructuring the html of a document...
[30.08.2023 23:15.35.378] [LOG] Remove unnecessary HTML...
[30.08.2023 23:15.35.379] [LOG] Scroll to the bottom of the page...
[30.08.2023 23:16.29.393] [ERROR] ProtocolError: Runtime.callFunctionOn timed out. Increase the 'protocolTimeout' setting in launch/connect calls for a higher timeout if needed.
at <instance_members_initializer> (C:\Users\ignat\AppData\Local\npm-cache_npx\c16ac64a6c7aba73\node_modules\puppeteer-core\lib\cjs\puppeteer\common\Connection.js:49:14)
at new Callback (C:\Users\ignat\AppData\Local\npm-cache_npx\c16ac64a6c7aba73\node_modules\puppeteer-core\lib\cjs\puppeteer\common\Connection.js:53:16)
at CallbackRegistry.create (C:\Users\ignat\AppData\Local\npm-cache_npx\c16ac64a6c7aba73\node_modules\puppeteer-core\lib\cjs\puppeteer\common\Connection.js:93:26)

Could you please help?

Idea: Align headers level to the sidebar nesting, or make page level configurable by meta keywords

At the moment, when generating a PDF from a website, every subpage starts with an <h1>. However, on the website some pages are nested under higher-level pages.

For example:


Here, "Getting started" is the entry point and has multiple subpages, such as "Installation" and "Configuration".

I wonder whether it would be worthwhile to detect whether a page is a parent or a child and, if it is a child, automatically shift its heading levels down by one. On "Installation", the <h1> would become an <h2>, and so on.

💡 We could also manage this with meta keywords, so it would be manually configurable per page :)
Together with the bookmarks enhancement, this would make it superior to Word and Google Docs.

What do you think?
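A minimal sketch of the automatic variant, assuming the nesting depth of each page is already known (for example from the sidebar; how the tool would obtain it is left open). `demoteHeadings` is a hypothetical helper, not part of docs-to-pdf: it rewrites every heading tag from <h1> to <h6> in a page's HTML, capping at <h6> because HTML has no deeper heading level.

```typescript
/**
 * Shifts every <h1>..<h6> tag in an HTML fragment down by `depth` levels.
 * Hypothetical helper for illustration; levels are capped at <h6>.
 */
function demoteHeadings(html: string, depth: number): string {
  if (depth <= 0) return html;
  // Matches opening and closing heading tags such as "<h1", "<H2 " or "</h3"; "<header>" is not matched.
  return html.replace(/<(\/?)h([1-6])\b/gi, (_match: string, slash: string, level: string) => {
    const shifted = Math.min(6, Number(level) + depth);
    return `<${slash}h${shifted}`;
  });
}

// "Installation" is nested one level below "Getting started", so depth = 1.
console.log(demoteHeadings("<h1>Installation</h1><h2>Requirements</h2>", 1));
// → <h2>Installation</h2><h3>Requirements</h3>
```

With the meta keyword variant, the same function could take `depth` from a per-page setting instead of the sidebar. The cap means very deeply nested pages lose their lowest heading distinctions.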

Quick Start example doesn't work

I tried running the example from the README:

npx docs-to-pdf --initialDocURLs="https://docusaurus.io/docs/" --contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page" --coverImage="https://docusaurus.io/img/docusaurus.png" --coverTitle="Docusaurus v2"

and I got this error:

[10.10.2023 11:08.19.379] [DEBUG] Using Chromium from /home/kkovacs/.cache/puppeteer/chrome/linux-117.0.5938.149/chrome-linux64/chrome
[10.10.2023 11:08.19.607] [DEBUG] Chrome user data dir: /tmp/puppeteer_dev_chrome_profile-2V52e1
[10.10.2023 11:08.19.646] [LOG]   Retrieving html from https://docusaurus.io/docs/
[10.10.2023 11:08.21.047] [DEBUG] Found 0 elements
[10.10.2023 11:08.21.049] [LOG]   Success
[10.10.2023 11:08.21.051] [LOG]   Retrieving html from https://docusaurus.io/docs/category/getting-started
[10.10.2023 11:08.22.165] [DEBUG] Found 0 elements
[10.10.2023 11:08.22.166] [LOG]   Success


...


[10.10.2023 11:09.23.630] [LOG]   Success
[10.10.2023 11:09.23.634] [LOG]   Retrieving html from https://docusaurus.io/docs/deployment
[10.10.2023 11:09.25.372] [DEBUG] Found 6 elements
[10.10.2023 11:09.25.379] [DEBUG] Clicking summary: How much resource (person-hours, money) am I willing to invest in this?
[10.10.2023 11:09.26.267] [DEBUG] Clicking summary: How much server-side configuration would I need?
[10.10.2023 11:09.27.104] [DEBUG] Clicking summary: Do I have needs to cooperate?
[10.10.2023 11:09.27.944] [DEBUG] Clicking summary: GitHub action files
[10.10.2023 11:09.28.771] [DEBUG] Clicking summary: GitHub action file
[10.10.2023 11:09.28.780] [ERROR] Error: Node is either not clickable or not an Element
    at CdpElementHandle.clickablePoint (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/puppeteer-core/lib/cjs/puppeteer/api/ElementHandle.js:680:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async CdpElementHandle.<anonymous> (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/puppeteer-core/lib/cjs/puppeteer/api/ElementHandle.js:258:32)
    at async CdpElementHandle.click (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/puppeteer-core/lib/cjs/puppeteer/api/ElementHandle.js:710:30)
    at async CdpElementHandle.<anonymous> (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/puppeteer-core/lib/cjs/puppeteer/api/ElementHandle.js:261:36)
    at async openDetails (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/docs-to-pdf/lib/utils.js:212:13)
    at async generatePDF (/home/kkovacs/.npm/_npx/c16ac64a6c7aba73/node_modules/docs-to-pdf/lib/utils.js:82:21)

I wanted to point this out because I'm struggling to get this working on my own site, and I was hoping to use the example as a working reference.

Error: Node is either not clickable or not an Element when <details> is inside <tabs>

Hello!

I have a page with <tabs>, one of which contains <details>.

Last logs before the error:

[LOG]   Retrieving html from <page url>
[DEBUG] Found 1 elements
[DEBUG] Clicking summary: <element name>

and then the error:

Error: Node is either not clickable or not an Element
    at CdpElementHandle.clickablePoint (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\node_modules\puppeteer-core\lib\cjs\puppeteer\api\ElementHandle.js:682:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async CdpElementHandle.<anonymous> (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\node_modules\puppeteer-core\lib\cjs\puppeteer\api\ElementHandle.js:259:32)
    at async CdpElementHandle.click (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\node_modules\puppeteer-core\lib\cjs\puppeteer\api\ElementHandle.js:712:30)
    at async CdpElementHandle.<anonymous> (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\node_modules\puppeteer-core\lib\cjs\puppeteer\api\ElementHandle.js:262:36)
    at async openDetails (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\lib\utils.js:212:13)
    at async generatePDF (C:\Users\user\AppData\Roaming\npm\node_modules\docs-to-pdf\lib\utils.js:82:21)

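Judging from the stack trace, openDetails clicks each <summary>, and Puppeteer's click first needs a clickable point (the error is thrown in clickablePoint). A <details> inside an inactive tab is hidden and has no such point, which would explain the failure. One possible workaround, sketched here as an assumption rather than the project's fix, is to set each element's `open` property instead of clicking. `DetailsLike` and `QueryRoot` are hypothetical minimal types standing in for the DOM so the logic can run outside a browser.

```typescript
// Hypothetical minimal stand-ins for the DOM types (HTMLDetailsElement and a queryable root).
interface DetailsLike {
  open: boolean;
}

interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<DetailsLike>;
}

/**
 * Opens every <details> under `root` by setting its `open` property, so no
 * clickable point is required (elements in hidden tabs are handled too).
 * Returns how many elements were closed before the call.
 */
function openAllDetails(root: QueryRoot): number {
  const elements = root.querySelectorAll("details");
  let opened = 0;
  for (let i = 0; i < elements.length; i++) {
    if (!elements[i].open) {
      elements[i].open = true;
      opened++;
    }
  }
  return opened;
}
```

In a real page, the browser-side equivalent would be `document.querySelectorAll("details").forEach((d) => { d.open = true; });`, which could be run through Puppeteer's `page.evaluate`. This does not switch tabs, though: the content stays in the hidden tab panel, so whether it ends up in the PDF still depends on how the tabs are rendered.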

Basic Auth support

Hi Jean,

thanks for creating this project.
It works great for me.

The production version of my documentation is behind a basic auth access.
Would it be possible to pass the credentials when starting the crawler?

Kind regards
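For reference, Puppeteer itself can answer a Basic Auth challenge with `page.authenticate({ username, password })`, so the open question is mainly how docs-to-pdf would accept the credentials (no CLI flag for this is assumed here). An alternative is to send the header directly via `page.setExtraHTTPHeaders`. The hypothetical helper below builds that header value as defined by the HTTP Basic scheme: "Basic " followed by the Base64 encoding of "username:password".

```typescript
/**
 * Builds an HTTP Basic Authorization header value: "Basic " + base64("username:password").
 * Hypothetical helper; the Basic scheme does not allow ":" in the username.
 */
function basicAuthHeader(username: string, password: string): string {
  if (username.includes(":")) {
    throw new Error("Basic auth usernames cannot contain ':'");
  }
  const token = Buffer.from(`${username}:${password}`, "utf8").toString("base64");
  return `Basic ${token}`;
}

console.log(basicAuthHeader("user", "pass")); // → Basic dXNlcjpwYXNz
```

With Puppeteer, this value could be passed as `page.setExtraHTTPHeaders({ Authorization: basicAuthHeader(user, pass) })`. The trade-off: extra headers go out with every request the page makes, including third-party ones, whereas `page.authenticate` only answers servers that ask for credentials, which makes it the safer default.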

Bookmarks

Could support for generating PDF bookmarks be added?
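PDF bookmarks form a tree, and one common way to derive that tree is from the heading levels of the combined document. The sketch below is a hypothetical helper (`buildOutline`, not part of docs-to-pdf) that only builds the nested structure from a flat, document-ordered list of headings; writing it into the PDF as actual bookmarks would need a PDF library or a Chromium/Puppeteer feature and is not shown.

```typescript
interface Heading {
  level: number; // 1 for <h1>, 2 for <h2>, ...
  title: string;
}

interface OutlineNode {
  title: string;
  children: OutlineNode[];
}

/** Nests a flat, document-ordered list of headings into a bookmark tree. */
function buildOutline(headings: Heading[]): OutlineNode[] {
  const roots: OutlineNode[] = [];
  // Chain of currently open ancestors, outermost first.
  const stack: { level: number; node: OutlineNode }[] = [];
  for (const { level, title } of headings) {
    const node: OutlineNode = { title, children: [] };
    // Close ancestors at the same or a deeper level: they cannot contain this heading.
    while (stack.length > 0 && stack[stack.length - 1].level >= level) {
      stack.pop();
    }
    if (stack.length === 0) {
      roots.push(node);
    } else {
      stack[stack.length - 1].node.children.push(node);
    }
    stack.push({ level, node });
  }
  return roots;
}
```

Skipped levels (an <h3> directly after an <h1>) simply nest under the nearest shallower heading.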

Option to restrict the subpath range

npx docs-to-pdf --initialDocURLs="https://docusaurus.io/docs/markdown-features" --contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page" --coverImage="https://docusaurus.io/img/docusaurus.png" --coverTitle="Docusaurus v2"
[13.08.2023 17:17.08.551] [DEBUG] Using Chromium from C:\Program Files\Google\Chrome\Application\chrome.exe
[13.08.2023 17:17.08.781] [DEBUG] Chrome user data dir: C:\Users\tatsu\AppData\Local\Temp\puppeteer_dev_chrome_profile-wjQgPd
[13.08.2023 17:17.08.870] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features
[13.08.2023 17:17.10.684] [LOG]   Success
[13.08.2023 17:17.10.689] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/react
[13.08.2023 17:17.12.843] [LOG]   Success
[13.08.2023 17:17.12.844] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/tabs
[13.08.2023 17:17.14.508] [LOG]   Success
[13.08.2023 17:17.14.510] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/code-blocks
[13.08.2023 17:17.16.113] [LOG]   Success
[13.08.2023 17:17.16.114] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/admonitions
[13.08.2023 17:17.17.707] [LOG]   Success
[13.08.2023 17:17.17.711] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/toc
[13.08.2023 17:17.19.122] [LOG]   Success
[13.08.2023 17:17.19.127] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/assets
[13.08.2023 17:17.21.602] [LOG]   Success
[13.08.2023 17:17.21.603] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/links
[13.08.2023 17:17.23.143] [LOG]   Success
[13.08.2023 17:17.23.144] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/plugins
[13.08.2023 17:17.24.639] [LOG]   Success
[13.08.2023 17:17.24.641] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/math-equations
[13.08.2023 17:17.26.649] [LOG]   Success
[13.08.2023 17:17.26.650] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/diagrams
[13.08.2023 17:17.28.193] [LOG]   Success
[13.08.2023 17:17.28.194] [LOG]   Retrieving html from https://docusaurus.io/docs/markdown-features/head-metadata
[13.08.2023 17:17.29.655] [LOG]   Success
[13.08.2023 17:17.29.658] [LOG]   Retrieving html from https://docusaurus.io/docs/styling-layout
[13.08.2023 17:17.30.985] [LOG]   Success
[13.08.2023 17:17.30.987] [LOG]   Retrieving html from https://docusaurus.io/docs/swizzling
[13.08.2023 17:17.32.235] [LOG]   Success
︙

Is there an option to prevent this tool from fetching pages outside https://docusaurus.io/docs/markdown-features?
This can't be covered by --excludeURLs.
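The log suggests why --excludeURLs doesn't fit: the crawl follows each page's "next" link, so after the last markdown-features page it simply continues into /docs/styling-layout, /docs/swizzling, and beyond. A prefix check on each URL before fetching would cover this. The sketch below is hypothetical (`isWithinScope` is not an existing docs-to-pdf function); it uses the standard `URL` class to compare origin and path.

```typescript
/**
 * Returns true when `candidate` lives at or below `root`: same origin, and a path
 * prefix that ends on a "/" boundary. Hypothetical helper for illustration.
 */
function isWithinScope(candidate: string, root: string): boolean {
  const c = new URL(candidate);
  const r = new URL(root);
  if (c.origin !== r.origin) return false;
  // Drop trailing slashes so ".../markdown-features" and ".../markdown-features/" behave the same.
  const rootPath = r.pathname.replace(/\/+$/, "");
  // Compare on a "/" boundary so "/docs/markdown-features-old" is not treated as inside "/docs/markdown-features".
  return c.pathname === rootPath || c.pathname.startsWith(rootPath + "/");
}

const scope = "https://docusaurus.io/docs/markdown-features";
console.log(isWithinScope("https://docusaurus.io/docs/markdown-features/tabs", scope)); // → true
console.log(isWithinScope("https://docusaurus.io/docs/styling-layout", scope)); // → false
```

Because pagination is sequential, a crawler using such a check could stop at the first out-of-scope "next" link rather than skipping it and continuing.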

An option to control whether all of `<details>` elements are opened

https://docusaurus.io/docs/markdown-features#details

<details> lets us hide content that only experts need. It would be nice if we could control whether <details> elements are opened.

In the current version, all <details> elements are closed.

For beginners

For experts

Could Puppeteer perform this operation before printing the joined page?

flowchart TD

S(Start) --> F[Find and open closed elements]
F --> C{New closed\nelements appeared?}
C -->|Yes| F
C -->|No| Done(Done)
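The flowchart is essentially a loop that runs until a pass finds nothing new to open. A minimal sketch of that loop, assuming a `findClosed` callback (hypothetical) that returns the closed <details> currently present, for instance ones that are only rendered after their parent opens. `maxRounds` is an added safety limit, not part of the proposal, in case the page keeps producing new closed elements.

```typescript
// Hypothetical minimal stand-in for HTMLDetailsElement.
interface DetailsLike {
  open: boolean;
}

/**
 * Follows the flowchart: find closed elements, open them, and repeat
 * until a pass finds none (or `maxRounds` passes have run).
 * Returns the total number of elements opened.
 */
function openUntilStable(findClosed: () => DetailsLike[], maxRounds = 10): number {
  let total = 0;
  for (let round = 0; round < maxRounds; round++) {
    const closed = findClosed();
    if (closed.length === 0) break; // "No" branch: no new closed elements appeared
    for (const element of closed) {
      element.open = true;
    }
    total += closed.length;
  }
  return total;
}
```

With three levels of nesting where each level only becomes visible once its parent is open, the loop needs three opening passes plus one final pass that finds nothing.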

How to disable the cover and the TOC title

Even without the coverTitle, coverImage, and coverSub options, a blank cover is still generated.
The TOC title "Table of contents:" cannot be modified or disabled.

Search / Select in Mac Preview not working

Hi @jean-humann

I just noticed something very strange. When I open the generated PDF in Firefox, I can select and search text just fine. However, when I open the same file in Mac Preview, the text cannot be selected correctly.

Here is a video showing the problem with the example PDF:

Screenshot_2023-08-10_000075.mp4

When I try the same with PDFs generated by Marp, which also uses Puppeteer/Chromium to generate PDFs from HTML, everything works fine. @yhatt, do you have any idea what might cause this?

Best,
codingluke

Templates for arguments

--contentSelector="article" --paginationSelector="a.pagination-nav__link.pagination-nav__link--next" --excludeSelectors=".margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page"

This tool always requires these long options. They are so long that no one can enter them without reading the README. It would be nice if we could shorten them to something like:

--template docusaurus2
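A `--template` flag doesn't exist in the commands shown here, so the following is only a sketch of how such a preset could work: before normal option parsing, `--template <name>` is replaced by the stored arguments from the Docusaurus v2 command above. `TEMPLATES` and `expandTemplate` are hypothetical names.

```typescript
// Hypothetical preset table; the docusaurus2 entry reuses the selectors from the issue.
// Values are written without shell quotes because the shell strips them before they reach argv.
const TEMPLATES = new Map<string, string[]>([
  [
    "docusaurus2",
    [
      "--contentSelector=article",
      "--paginationSelector=a.pagination-nav__link.pagination-nav__link--next",
      "--excludeSelectors=.margin-vert--xl a,[class^='tocCollapsible'],.breadcrumbs,.theme-edit-this-page",
    ],
  ],
]);

/** Replaces each "--template <name>" pair in argv with that template's arguments. */
function expandTemplate(argv: string[]): string[] {
  const expanded: string[] = [];
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] !== "--template") {
      expanded.push(argv[i]);
      continue;
    }
    const name = argv[i + 1];
    const preset = name === undefined ? undefined : TEMPLATES.get(name);
    if (preset === undefined) {
      throw new Error(`Unknown template: ${name ?? "(missing)"}`);
    }
    expanded.push(...preset);
    i++; // skip the template name itself
  }
  return expanded;
}

console.log(expandTemplate(["--template", "docusaurus2", "--initialDocURLs=https://docusaurus.io/docs/"]));
```

Inserting the preset at the position of `--template` means explicit flags written after it appear later in argv; whether a later flag overrides an earlier one depends on the option parser, so that would need checking.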

Hyperlinks in PDF linking to web documentation

Links inside the PDF (apart from those in the TOC) open the corresponding web page instead of jumping to that page in the PDF. Is there a way to make the links point to the heading in the PDF instead of the web page?
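Since the TOC links already work inside the PDF, in-document fragment links (`#id`) appear to survive printing as internal links; the other links just keep their absolute web URLs. One possible approach, sketched under assumptions, is to rewrite each link whose target page is part of the PDF into a fragment link before printing. `toInternalHref` and the `anchorFor` map (page URL to an id placed at the start of that page in the combined HTML) are hypothetical, as is the `<pageId>--<headingId>` naming used to keep heading ids unique across concatenated pages.

```typescript
/**
 * Rewrites a link to a page that is included in the PDF into an in-document
 * fragment link; links to pages outside the PDF are returned unchanged.
 * Hypothetical helper; assumes heading ids were renamed to "<pageId>--<headingId>"
 * when the pages were merged, so that ids stay unique across pages.
 */
function toInternalHref(href: string, baseUrl: string, anchorFor: Map<string, string>): string {
  const url = new URL(href, baseUrl); // resolves relative links such as "/docs/installation"
  const headingId = url.hash.slice(1); // "" when the link has no fragment
  url.hash = "";
  const pageId = anchorFor.get(url.href);
  if (pageId === undefined) return href; // page is not in the PDF: keep the web link
  return headingId ? `#${pageId}--${headingId}` : `#${pageId}`;
}

const pages = new Map([["https://docusaurus.io/docs/installation", "page-installation"]]);
console.log(toInternalHref("/docs/installation#requirements", "https://docusaurus.io/docs/", pages));
// → #page-installation--requirements
```

URL normalisation (trailing slashes, query strings) would need the same treatment on both sides of the lookup for the match to be reliable.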
