pandoc / dockerfiles
Dockerfiles for various pandoc images
License: GNU General Public License v2.0
I'm trying to change the mainfont in the frontmatter of a Markdown file in the pandoc/latex:2.7.3 image using XeLaTeX. But whatever I specify, I get the following error:
kpathsea: Running mktextfm Euler
/opt/texlive/texdir/texmf-dist/web2c/mktexnam: Could not map source abbreviation for Euler.
/opt/texlive/texdir/texmf-dist/web2c/mktexnam: Need to update ?
I already ran tlmgr update --self && tlmgr update --all before …
Edit from future: see #78 (comment) for reversion deadlines / strategy.
The smarter entrypoint script introduced in 51cb2b6 closing #54 may add some unwelcome complexity.
For example (maybe the only case), @zspitz noticed that the entrypoint script won't let pandoc containers accept pandoc arguments and input files in arbitrary orders (see pandoc/pandoc-action-example#3 and jgm/pandoc-website#38).
@tarleb suggested improving the auto-detection, but I'm wondering whether it would make more sense to simply revert to something like:
ENTRYPOINT ["pandoc"]
CMD ["--help"]
I know a lot of thought went into #54 so feel free to close this.
My expectation was actually that there'd be no such cleverness, and it surprised me that this could be a source of problems; much like @zspitz I would've never guessed it.
In #54, @tarleb mentioned that:
New users are repeatedly irritated by our use of ENTRYPOINT.
But isn't ENTRYPOINT ["pandoc"] the default for containers like this?
And users can always override it with docker run --entrypoint=/bin/sh pandoc/core -c ls, right?
I'm also not sure the helper script in the Dockerfile docs applies here:
The ENTRYPOINT instruction can also be used in combination with a helper script, allowing it to function in a similar way to the command above, even when starting the tool may require more than one step.
(emphasis added).
But this (more than one step) isn't the case for pandoc, right?
Support for
I see three good ways to support these:
Each has its own pros and cons.
Based on the mightymake project, I've been evolving a practical project setup around make and pandoc/dockerfiles. Ideally, I'd like to contribute it to this GitHub org, but it should probably get some additional review / strip-down / polishing before I seriously pursue this idea.
Nevertheless, I want to share it here informally:
# ./Makefile
.DEFAULT_GOAL := help
define INFORMATION
Publishing pipeline using pandoc.
Version 1.0
USAGE:
make <folder>/<file>.<ext> generate the requested file
make all.html generate html version for all sources
make all.pdf generate PDF version for all sources
make all.epub generate epub version for all sources
make all generate all files in all versions
make clean erases the output folder
make build builds the custom pandoc image for this project
make new.<XYZ>/<docname> quick create XYZ markdown document from template
endef
export INFORMATION
SHELL = /bin/bash
PANDOCIMAGE=pandoc/pharmaone:latest
DATADIR=.data
OUTPUTDIR=.out
TEMPLATENAME=0000-plantilla
DEFAULTFILE=defaults.yaml
TEMPLATES=$(wildcard $(DATADIR)/templates/$(@D).*)
LETTERHEAD=$(wildcard $(DATADIR)/letterheads/$(@D)/letterhead.png)
LEGALHEAD=$(wildcard $(DATADIR)/letterheads/$(@D)/legalhead.png)
METADATAFILE=$(wildcard $(DATADIR)/metadata/$(@D).yaml)
PANDOCCMD=docker run --rm --volume "`pwd`:/data" --user `id -u`:`id -g`
PANDOCCMD+= $(PANDOCIMAGE)
PANDOCCMD+= --defaults=$(DEFAULTFILE)
PANDOCCMD+= --data-dir=$(DATADIR)
PANDOCCMD+= --resource-path=$(@D)
# PANDOCCMD+= --extract-media=./$(OUTPUTDIR)/$*
PANDOCCMD+= --output=./$(OUTPUTDIR)/$@
PANDOCCMD+= $(if $(LETTERHEAD), --variable=letterhead:$(LETTERHEAD))
PANDOCCMD+= $(if $(LEGALHEAD), --variable=legalhead:$(LEGALHEAD))
PANDOCCMD+= $(if $(METADATAFILE), --metadata-file=$(METADATAFILE))
PANDOCCMD+= $<
CUSTOMLATEXTEMPLATE=$(if $(findstring .latex,$(TEMPLATES)),--template=$(@D))
CUSTOMHTMLTEMPLATE=$(if $(findstring .html,$(TEMPLATES)),--template=$(@D))
SOURCES = $(sort $(wildcard */*.md) $(wildcard */**/*.md))
NEWDOC_PROTO_TARGETS = $(foreach i,$(sort $(dir $(SOURCES))),new.$(i))
PDF = $(SOURCES:.md=.pdf)
TEX = $(SOURCES:.md=.tex)
HTML = $(SOURCES:.md=.html)
EPUB = $(SOURCES:.md=.epub)
PACKAGEREGEX=(?<=^% install{)[\w-]+(?=}$$)
all : all.html all.epub all.pdf all.tex
all.tex: $(TEX)
all.pdf: $(PDF)
all.html: $(HTML)
all.epub: $(EPUB)
$(TEX) : %.tex : %.md
	@mkdir -p ./$(OUTPUTDIR)/$(@D)
	$(PANDOCCMD) $(CUSTOMLATEXTEMPLATE) --to=latex
$(PDF) : %.pdf : %.md
	@mkdir -p ./$(OUTPUTDIR)/$(@D)
	$(PANDOCCMD) $(CUSTOMLATEXTEMPLATE) --to=pdf
$(HTML) : %.html : %.md
	@mkdir -p ./$(OUTPUTDIR)/$(@D)
	$(PANDOCCMD) $(CUSTOMHTMLTEMPLATE) --to=html
# PANDOCCMD already ends with the source file ($<), so it is not repeated here
$(EPUB) : %.epub : %.md
	@mkdir -p ./$(OUTPUTDIR)/$(@D)
	$(PANDOCCMD) --to=epub
.PHONY: help build clean all all.tex all.pdf all.html all.epub
help:; @echo "$$INFORMATION"
build:; docker build --build-arg "packages=$$(cat $(DATADIR)/templates/*.latex | grep -Po "$(PACKAGEREGEX)" | paste -sd ' ' -)" -t $(PANDOCIMAGE) .
clean:; rm -rf ./$(OUTPUTDIR)
$(NEWDOC_PROTO_TARGETS): # just to make stem autocompletion work
NEWDOC_TARGETS = $(foreach i,$(NEWDOC_PROTO_TARGETS),$(i)%)
NEWDIR=$(subst new.,,$(dir $@))
NEWFOLDER=$(NEWDIR)$*
NEWMD=$(NEWFOLDER).md
TMPLFOLDER=$(NEWDIR)$(TEMPLATENAME)
TMPLMD=$(TMPLFOLDER).md
$(NEWDOC_TARGETS):
	@echo "Bootstrap $(NEWMD) from $(TMPLMD) ..."
	@if [ ! -d "$(TMPLFOLDER)" ]; then echo "$(TMPLFOLDER) template folder doesn't exist..."; fi
	@if [ ! -f "$(TMPLMD)" ]; then echo "$(TMPLMD) template doesn't exist..."; fi
	@if [ -d "$(NEWFOLDER)" ]; then echo "$(NEWFOLDER) folder already exists..."; elif [ -d "$(TMPLFOLDER)" ]; then cp -r $(TMPLFOLDER) $(NEWFOLDER); fi
	@if [ -f "$(NEWMD)" ]; then echo "$(NEWMD) already exists..."; elif [ -f "$(TMPLMD)" ]; then cp $(TMPLMD) $(NEWMD); fi
# Makefile extensions
# ./Dockerfile
# NOTE: drop when merged https://github.com/pandoc/dockerfiles/pull/50
FROM pandoc/latex:2.9.2.1
RUN wget -O- https://github.com/lierdakil/pandoc-crossref/releases/download/v0.3.6.2a/pandoc-crossref-Linux-2.9.2.1.tar.xz | tar -xJ -C/usr/bin/ pandoc-crossref
ARG packages=
RUN tlmgr update --self
RUN [ -z "$packages" ] && echo "No packages to install" || tlmgr install ${packages}
# ./defaults.yaml
from: markdown+simple_tables+table_captions+yaml_metadata_block+smart
# reader: may be used instead of from:
# to: pdf
# writer: may be used instead of to:
# leave blank for output to stdout:
output-file:
# leave blank for input from stdin, use [] for no input:
input-files:
# or you may use input-file: with a single value
template: default
standalone: true
self-contained: false
# note that structured variables may be specified:
variables:
  lof: true
  lot: true
  graphics: true
  geometry:
    - scale=0.6
    - ignorefoot=true
    - footskip=25mm
    # - top=30mm
    # - left=20mm
    - centering
    - vcentering
  pagestyle: fancy
  textcolor: black!25!darkgray
  # mainfont: DejaVuSerif.ttf
  mainfont: DejaVuSans.ttf
  sansfont: DejaVuSans.ttf
  monofont: DejaVuSansMono.ttf
  colorlinks: true
  # letterhead: ".data/letterheads/default.png"
  # legalhead: ".data/letterheads/default.png"
  # documentclass: book
  # classoption:
  #   - twoside
  #   - draft
# metadata values specified here are parsed as literal
# string text, not markdown:
metadata:
  lang: es-CO
# metadata-files:
# - .data/metadata/default.yaml
# or you may use metadata-file: with a single value
# Note that these take files, not their contents:
include-before-body: []
include-after-body: []
include-in-header: []
resource-path: ["."]
# filters will be assumed to be Lua filters if they have
# the .lua extension, and json filters otherwise. But
# the filter type can also be specified explicitly, as shown:
filters:
- pandoc-crossref
- pandoc-citeproc
- abstract-to-meta.lua
file-scope: true
# ERROR, WARNING, or INFO
verbosity: INFO
# log-file: log.json
# citeproc, natbib, or biblatex
cite-method: citeproc
# part, chapter, section, or default:
top-level-division: default
abbreviations:
pdf-engine: xelatex
# pdf-engine-opts:
# - "-shell-escape"
# you may also use pdf-engine-opt: with a single option
# pdf-engine-opt: "-shell-escape"
# auto, preserve, or none
wrap: auto
columns: 78
dpi: 72
table-of-contents: true
toc-depth: 3
number-sections: false
# a list of offsets at each heading level
number-offset: [0,0,0,0,0,0]
# toc: may also be used instead of table-of-contents:
shift-heading-level-by: 1
section-divs: true
identifier-prefix: foo
title-prefix: ""
strip-empty-paragraphs: true
# lf, crlf, or native
eol: lf
strip-comments: false
indented-code-classes: []
ascii: true
default-image-extension: ".png"
# either a style name or a style definition file:
highlight-style: pygments
# syntax-definitions:
# or you may use syntax-definition: with a single value
listings: false
# method is plain, webtex, gladtex, mathml, mathjax, katex
# you may specify a url with webtex, mathjax, katex
html-math-method:
  method: mathjax
  url: "https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"
# none, references, or javascript
email-obfuscation: javascript
tab-stop: 8
preserve-tabs: true
incremental: false
slide-level: 2
reference-links: true
# block, section, or document
reference-location: block
atx-headers: false
# accept, reject, or all
track-changes: accept
html-q-tags: false
# css:
# - site.css
# none, all, or best
ipynb-output: best
# A list of two-element lists
request-headers:
- ["User-Agent", "Mozilla/5.0"]
fail-if-warnings: false
dump-args: false
ignore-args: false
trace: false
# ./.gitignore
.out
# .local: a place to keep local documents away from your colleagues' eyes
.local
# ./.data folder structure
./filters
./letterheads
./metadata
    ./my-doc-type.yaml # merged metadata for all docs in a top-level subfolder 'my-doc-type'
./templates
    ./default.latex # including background letterheads over the standard template
To tell make which LaTeX packages to install, I just use a specially formatted comment in the template, like so:
% install{dejavu}
% install{dejavu-otf}
-> make build
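The build target above depends on grep -P, which some grep implementations (for example BSD grep on macOS or BusyBox grep) may not provide. A portable sketch of the same extraction using sed, assuming the % install{package} comment convention shown above; the function name latex_packages is hypothetical:

```shell
#!/bin/sh
# Print the LaTeX packages requested via "% install{name}" comments as one
# space-separated line, suitable for --build-arg packages=...
# Reads the files given as arguments, or stdin when no files are given.
latex_packages() {
  # Keep only whole lines of the form "% install{name}" and print "name".
  sed -n 's/^% install{\([A-Za-z0-9_-]*\)}$/\1/p' "$@" | paste -sd ' ' -
}
```

Used from the project root, for example: docker build --build-arg "packages=$(latex_packages .data/templates/*.latex)" -t pandoc/pharmaone:latest .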
The Docker images have been built with an outdated cache: TeX Live is still at version 2019.
The images need to be rebuilt with docker build --no-cache.
TeX Live is at version 2019 in the latest/edge images, and this collides with the CTAN repository:
$ docker run --rm pandoc/latex:latest tlmgr --version
tlmgr revision 53842 (2020-02-19 08:28:40 +0100)
tlmgr using installation: /opt/texlive/texdir
TeX Live (http://tug.org/texlive) version 2019
I rebuilt the pandoc/latex image locally, and it has TeX Live 2020:
$ docker run --rm pandoc/latex:edge tlmgr --version
tlmgr revision 54446 (2020-03-21 17:45:22 +0100)
tlmgr using installation: /opt/texlive/texdir
TeX Live (https://tug.org/texlive) version 2020
If you try to update tlmgr
FROM pandoc/latex:latest
RUN tlmgr option repository ctan && \
tlmgr update --self && \
tlmgr update --all
You'll end up with this error:
tlmgr: Local TeX Live (2019) is older than remote repository (2020).
Cross release updates are only supported with
update-tlmgr-latest(.sh/.exe) --update
See https://tug.org/texlive/upgrade.html for details.
I'm building my own pandoc image based on pandoc/latex:2.7.2. For a few days now, tlmgr installations have been failing for me:
2.7.2: Pulling from pandoc/latex
bdf0201b3a05: Pull complete
...
a0d8d5c5eda2: Pull complete
Digest: sha256:f2c2300b5af6da69bec4dcb9f54c6a68084b4e30fbc12fb2bb1a9f706308a805
Status: Downloaded newer image for pandoc/latex:2.7.2
---> 40fade551d61
Step 2/2 : RUN tlmgr install adjustbox collectbox pagecolor mdframed needspace sourcesanspro ly1 mweights sourcecodepro titling framed
---> Running in d01d50d252ba
tlmgr: Remote repository is newer than local (2018 < 2019)
Cross release updates are only supported with
update-tlmgr-latest(.sh/.exe) --update
Please see https://tug.org/texlive/upgrade.html for details.
I can't find update-tlmgr-latest(.sh/.exe). Any hints on how to make tlmgr work again?
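A frozen TeX Live installation can usually keep installing packages by pointing tlmgr at that release's final repository (tlnet-final) in the TeX Live historic archive instead of the current CTAN repository. The sketch below assumes the historic archive is reachable at the mirror given in HISTORIC_BASE (verify the URL, or substitute another historic mirror); the function names are hypothetical:

```shell
#!/bin/sh
# Assumption: a mirror of the TeX Live historic archive; override if needed.
HISTORIC_BASE=${HISTORIC_BASE:-https://ftp.math.utah.edu/pub/tex/historic/systems/texlive}

# Read "tlmgr --version" output on stdin and print the TeX Live release year.
texlive_year() {
  sed -n 's/^TeX Live .* version \([0-9][0-9]*\)$/\1/p'
}

# Print the frozen repository URL for a given release year.
historic_repo() {
  printf '%s/%s/tlnet-final\n' "$HISTORIC_BASE" "$1"
}

# Point tlmgr at the final repository of the installed release.
# Only meant for the "Remote repository is newer than local" situation.
pin_to_historic() {
  year=$(tlmgr --version | texlive_year)
  if [ -z "$year" ]; then
    echo "could not determine the TeX Live release year" >&2
    return 1
  fi
  tlmgr option repository "$(historic_repo "$year")"
}
```

In a derived Dockerfile this could run before installing packages, e.g. RUN . /tmp/pin-texlive.sh && pin_to_historic && tlmgr install framed (the script path is hypothetical). Pinning trades updates for reproducibility: the image stays on the frozen release until it is rebuilt against a newer TeX Live.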
I am getting this error:
pandoc: sh: openBinaryFile: does not exist (No such file or directory)
when trying to build some assets with Pandoc in a GitLab CI bash script.
I have a repo, Finnito/Science, that is serving a GitLab Pages site using Hugo. I am trying to set up a GitLab CI pipeline to build my HTML slides and PDF docs from my Markdown source when I commit to the repo, so that I don't have to build them locally.
I have been trying out different Docker images of pandoc but decided pandoc/latex is my best bet because it's official and built on Alpine, which is nice and lightweight. But I can't seem to make heads or tails of this error.
I have tried various different incantations for pandoc, but they don't seem to work.
My GitLab CI job looks like this:
assets:
  image: pandoc/latex
  script:
    - chmod +x ci-build.sh
    - sh ci-build.sh
and my ci-build.sh script looks like this:
#!/bin/sh
modulesToBuild=(
    "/builds/Finnito/science/content/10sci/5-fire-and-fuels"
    "/builds/Finnito/science/content/10scie/6-geology"
    "/builds/Finnito/science/content/11sci/4-mechanics"
    "/builds/Finnito/science/content/11sci/5-genetics"
    "/builds/Finnito/science/content/12phy/2-mechanics"
    "/builds/Finnito/science/content/12phy/3-electricity"
)
for i in "${modulesToBuild[@]}"; do
    # Navigate to the directory.
    cd $i
    # Build the HTML slides and
    # PDFs for all markdown docs.
    for filename in markdown/*.md; do
        file=${filename##*/}
        name=${file%%.*}
        pandoc/latex pandoc -s --mathjax -i -t revealjs "markdown/$name.md" -o "$name.html"
        pandoc/latex pandoc "markdown/$name.md" -o "$name.pdf" --pdf-engine=pdflatex
    done
done
Honestly, I'm just pretty lost with how to successfully call pandoc within the Docker container. I am very new to this and it all makes very little sense!
I have managed to do this:
docker run -it pandoc/latex
docker exec -it 6d3e05c5aad3 /bin/sh
ls /usr/bin
and verify that pandoc is located there. I am also able to run pandoc --help when in the shell like this.
I also tried to call it like this, but got the same error:
/usr/bin/pandoc -s --mathjax -i -t revealjs "markdown/$name.md" -o "$name.html"
Any help would be most appreciated!
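A possible rewrite of ci-build.sh, sketched under two assumptions. First, sh in the pandoc/latex image is BusyBox ash, which has no bash arrays, so the module list uses positional parameters. Second, inside the image pandoc is already on the PATH and is called directly (pandoc/latex is an image name, not a command). The error message itself most likely means the runner's sh was handed to the image's pandoc entrypoint as an input file; GitLab CI lets a job override that by setting image: to a mapping with name: pandoc/latex and entrypoint: [""]. The PANDOC override variable is hypothetical, added only to allow a dry run:

```shell
#!/bin/sh
# Runs under plain sh (BusyBox ash in Alpine images), which has no arrays,
# so the module list lives in the positional parameters instead.

# Build revealjs slides and a PDF for every markdown/*.md file of a module.
# PANDOC may be overridden (e.g. for a dry run) and defaults to "pandoc",
# the binary already installed in the image.
build_module() {
  for src in "$1"/markdown/*.md; do
    [ -e "$src" ] || continue          # the glob matched no files
    name=$(basename "$src" .md)
    "${PANDOC:-pandoc}" -s --mathjax -i -t revealjs "$src" \
      -o "$1/$name.html" || return 1
    "${PANDOC:-pandoc}" "$src" -o "$1/$name.pdf" \
      --pdf-engine=pdflatex || return 1
  done
}

set -- \
  "/builds/Finnito/science/content/10sci/5-fire-and-fuels" \
  "/builds/Finnito/science/content/10scie/6-geology" \
  "/builds/Finnito/science/content/11sci/4-mechanics" \
  "/builds/Finnito/science/content/11sci/5-genetics" \
  "/builds/Finnito/science/content/12phy/2-mechanics" \
  "/builds/Finnito/science/content/12phy/3-electricity"

for module in "$@"; do
  build_module "$module" || exit 1
done
```

Writing outputs next to each module with absolute paths avoids the cd in the original loop, so one failing module cannot leave later iterations in the wrong directory.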
New users are repeatedly irritated by our use of ENTRYPOINT. It might make sense to either
I'm trying to do some basic troubleshooting, and I'm finding I can't run a shell. I haven't had that issue with any other Alpine image.
docker run pandoc/core /bin/sh
is erroring with:
UTF-8 decoding error in /bin/sh at byte offset 25 (d7).
The input must be a UTF-8 encoded text.
As an aside, the issue I'm trying to troubleshoot is how to pull the LaTeX installation into a multi-stage build. This works to pull in pandoc:
COPY --from=pandoc /usr/bin/pandoc* /usr/bin/
But this doesn't appear to pull in anything:
COPY --from=pandoc /opt/texlive/texdir* /usr/bin/
I'd like to convert an HTML file to PDF using wkhtmltopdf.
What would be the correct way to add it?
Should it be in the latex image or rather in a completely new one?
Pandoc 2.8 and later use HsYAML for YAML processing, removing the dependency on libyaml as an external library. We currently install it for all pandoc versions, but should not do so for versions >= 2.8.
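A sketch of the version check this would need, assuming the pandoc version is available as a dotted number (as with the PANDOC_VERSION build setting mentioned elsewhere on this page; a non-numeric value such as edge would need separate handling). Comparing field by field numerically matters because a plain string comparison would order 2.10 before 2.8. The function names are hypothetical:

```shell
#!/bin/sh
# Succeed (exit status 0) when dotted version $1 >= $2. Fields are compared
# numerically one by one, so 2.10 > 2.9 and 2.9.2.1 > 2.9.2; missing fields
# count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# libyaml is only needed for pandoc versions before 2.8 (HsYAML afterwards).
needs_libyaml() {
  ! version_ge "$1" 2.8
}
```

In a build script this could look like: if needs_libyaml "$PANDOC_VERSION"; then ...install libyaml...; fi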
:latest
if push event to master with release=*
message.
alpine-core
=> core
, etc).
Ever since tectonic was included as a pandoc TeX rendering engine, it might be desirable to implement a tectonic variant of the pandoc Dockerfile. Luckily, there is a fairly recent Alpine package.
I haven't tried tectonic yet, but some of its design choices promise less pain down the road. I'm copying:
rust
Hi,
The dockerized pandoc is great. However, one powerful feature, (Lua) filters, is difficult to get running when dockerized, since filters have different requirements.
Especially when writing in an academic context, filters are necessary for
A perfect solution would be to use only Lua filters, because pandoc has built-in support for running them. But maybe it is not possible to do everything in pure Lua. Further, existing filters would need to be rewritten, which seems like a waste of resources.
Having a look at this curated list https://github.com/jgm/pandoc/wiki/Pandoc-Filters some really useful filters require
So what would be an appropriate solution for this:
Because I really need filters, especially for cross-referencing, I started to build a Dockerfile for case 2. I tried to get at least Python running, because this covers most filters. Installing Python in the latex image is easy ... but not the filters. E.g. installing one of the pandoc-*nos filters requires a compilation step:
pip install pandoc-fignos
...
running install
running build
running build_py
creating build
...
creating build/temp.linux-x86_64-3.7/psutil
gcc -Wno-unused-result -Wsign-compare -DNDEBUG
...
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
which fails. So gcc would be necessary, and probably more build tools.
Any opinion on this?
JR
Continuation of #50 when time permits.
At least the presence of pandoc-crossref and its placement in the PATH should be tested.
This is a small change, but it would allow the image to be used with whalebrew.
There are missing files in the latest docker image, see
$ docker run -it pandoc/latex:2.10 sh
[WARNING] Could not deduce format from file extension
Defaulting to markdown
pandoc: sh: openBinaryFile: does not exist (No such file or directory)
We use pandoc/core:2.9.2.1 and rely on the docker-entrypoint.sh to convert a .md file to a .html file. Lately this command started to fail with "command not found", and we noticed that this image was updated 3 days ago.
For now we are using pandoc/core:2.9.1.1, as it doesn't have this problem.
docker pull pandoc/core currently yields
Using default tag: latest
Error response from daemon: manifest for pandoc/core:latest not found
docker pull pandoc/core:2.6 works.
My apologies if I'm asking a question in the wrong venue. I'm going to plead non-technical and ask: is there an official Pandoc Docker build available on Docker Hub?
I've created a makefile for converting a nested directory of .md files into a PDF, ePUB and Mobi. I'm trying to get Continuous Integration going, but my technical skills are so far out of date that I am not grokking what needs to be done. I gather I need a Docker image for the executable and then CircleCI for the cheap builds.
I see there may be an official-ish build either present or in the making. There are 566 Pandoc images in Docker Hub, but I don't know whether to trust or use any of them. I'd rather start with the official build if it's available.
TIA
Hi,
Suppose my source markup is under subdirectory, like this:
/
-- doc-folder
    |--- images
    |--- docs
         |--- test.md
Running pandoc locally, I could of course convert it as follows:
% pwd
/doc-folder
% pandoc docs/test.md
<p>This is test.md</p>
But this doesn't work with pandoc/core container:
% docker run \
--rm \
--volume $(pwd):/data \
--user $(id -u):$(id -g) \
pandoc/core docs/test.md
/usr/local/bin/docker-entrypoint.sh: exec: line 11: docs/test.md: Permission denied
The problem seems to be related to this part of the docker-entrypoint.sh:
[ -z "$(command -v "${1}")" ]
For some weird reason, command -v in Alpine doesn't return empty output for paths in subdirectories:
/data $ command -v test.md
/data $ command -v docs/test.md
docs/test.md
At least on macOS it doesn't behave like this...
Any idea why, and how to work around this?
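One way to tighten the auto-detection discussed above, sketched under stated assumptions rather than taken from the repository's actual docker-entrypoint.sh: treat the first argument as a command only if it is not an option, does not name an existing file in the working directory, and either contains no slash and is found on the PATH, or is an absolute path to an executable. Relative paths such as docs/test.md then always go to pandoc, which sidesteps the BusyBox command -v behavior. The function names is_command and main are hypothetical:

```shell
#!/bin/sh
# Decide whether the first container argument names a command to run
# instead of pandoc. Returns 0 (run it as a command) or 1 (pass to pandoc).
is_command() {
  case "$1" in
    '' | -*) return 1 ;;                      # empty or an option for pandoc
    /*) [ -f "$1" ] && [ -x "$1" ]; return ;; # absolute path: executable?
    */*) return 1 ;;                          # relative path: an input file
  esac
  [ -e "$1" ] && return 1                     # existing file: an input file
  command -v "$1" >/dev/null 2>&1             # otherwise look it up on PATH
}

# Entry point; an actual docker-entrypoint.sh would finish with: main "$@"
main() {
  if is_command "${1:-}"; then
    exec "$@"
  fi
  exec pandoc "$@"
}
```

The ambiguity does not disappear entirely: a name that is both a command on the PATH and an intended input file missing from the working directory would still be run as a command, which is part of the case for the plain ENTRYPOINT ["pandoc"] alternative.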
Hello. I was looking forward to adding pandoc into a pipeline for transforming our documentation into other formats (PDF) automatically. The Github Actions implementation seemed really interesting, but I really want to get it to work within the Azure DevOps/Azure Pipelines environment.
I read in the Microsoft docs on Alpine containers that there are some requirements for making this work, specifically supplying Node.js, some tools, and a few references.
Can anyone tell me whether:
If the answer is no to all of these, I would kindly request support for Azure Pipelines in the future.
If this feature is possible, I would like to suggest a quickstart section to the docs.
Thank you in advance for considering my request.
Line 84 in 9df2bb4
This can result in, e.g., the 2.9.2.1 freeze file being regenerated. For the current builds I just went ahead and ran touch alpine/freeze/* and did the same for ubuntu. Maybe just remove the dependency on the generation script?
I'm trying to use pandoc/latex:2.6 to convert markdown to PDF. I had in mind to use the eisvogel template, which requires a fairly complete LaTeX distribution (in the meantime I can use another, more basic template, but this would be helpful). In particular, a missing footnotebackref.sty is the first error message that I get.
Is there a way to create an image that uses a fuller version of texlive? I looked through the alpine latex Dockerfile and couldn't find an easy way to change from texlive to texlive-full (or something along those lines).
$ docker inspect pandoc/core:2.6 --format='Pandoc Version: {{ index .Config.Labels "org.pandoc.version" }}'
Pandoc Version:
I think it's because this line
Line 50 in 5073395
happens after a new FROM statement, which starts a new build stage, so the ARG doesn't apply there (or may not even exist):
Line 47 in 5073395
A potentially half-baked but easy solution might be to just add ENV PANDOC_COMMIT=$pandoc_commit? I'm not sure if that helps. I was trying to learn from the docs how to tell which commit you have in the :edge case (so that users who need a newer commit can just clone here and build).
What do you think of bundling the various libraries needed for --katex, --mathjax, --to=slideous, --to=revealjs, etc. to be able to produce --self-contained HTML files without having to download anything extra?
I know this is debatable; one could also ask why not bundle them with the pandoc binary/installer releases...
But the pandoc/latex container already took the step of bundling a huge dependency, which is very convenient (thanks!). So in the same spirit, would you like PR(s) to add those?
This of course opens a can of worms about which version to bundle... The latest known compatible version at the time the image was built?
Need to track down licenses... Some of these, like Slidy and S5, date to the early web and don't have modern code repositories, at best a .zip.
Estimate the total size to decide whether these are negligible, or whether they justify a separate pandoc/html image to keep pandoc/core small?
The docker run commands in the README are leaking stopped containers (visible with docker ps -a).
docker run --rm would fix the issue.
The Docker Hub pages for the two images look a bit ... empty.
If I use --highlight-style=tango, I get the error below. Adding the "framed" package fixed the issue.
Error producing PDF.
! LaTeX Error: File `framed.sty' not found.
Type X to quit or <RETURN> to proceed,
or enter new name. (Default extension: sty)
Enter file name:
! Emergency stop.
<read *>
l.58 \definecolor
The LaTeX image is very large, currently more than 2 GB. LaTeX requires a lot of space, but it should be possible to reduce the size to less than half that value.
I am trying to use the LaTeX Docker container on a file containing Hebrew text. The following is the result:
/usr/bin/docker run --name pandoclatex2911_da18a6 --label 488dfb --workdir /github/workspace --rm -e INPUT_ARGS -e HOME -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e RUNNER_OS -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e GITHUB_ACTIONS=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/tidepool-hebrew-docs/tidepool-hebrew-docs":"/github/workspace" pandoc/latex:2.9.1.1 --standalone -o output.pdf --pdf-engine xelatex docs/test.md
Error producing PDF.
! LaTeX Error: File `zref-abspage.sty' not found.
Type X to quit or <RETURN> to proceed,
or enter new name. (Default extension: sty)
Enter file name:
! Emergency stop.
<read *>
l.89 \RequirePackage
##[error]Docker run failed with exit code 43
I am just getting started with Pandoc and unfamiliar with Docker. Am I doing something wrong, or is this a problem with the container?
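When pandoc/latex lacks a package, the LaTeX log names the missing file, as in the framed.sty and zref-abspage.sty errors above. A small sketch for turning such a log into tlmgr lookups; it assumes tlmgr and network access are available (as inside pandoc/latex), and the function names are hypothetical. tlmgr search --global --file reports which TeX Live package ships a given file, which can then be installed with tlmgr install:

```shell
#!/bin/sh
# Print the file named in each "! LaTeX Error: File `name' not found." line
# of pandoc/LaTeX output read from stdin, one per line.
missing_files() {
  sed -n "s/.*LaTeX Error: File \`\([^']*\)' not found.*/\1/p"
}

# Ask tlmgr which TeX Live package ships each missing file.
# Needs tlmgr and network access (e.g. inside the pandoc/latex image).
find_tl_packages() {
  missing_files | while IFS= read -r f; do
    tlmgr search --global --file "/$f"
  done
}
```

For example: pandoc input.md -o output.pdf 2>&1 | find_tl_packages. The leading slash in the search term keeps a file like framed.sty from also matching names that merely end in it.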
Ubuntu is very popular as the basis for Docker images, and many people are familiar with it. It would therefore make sense to create alternatives to the current Alpine-based images.
Testing the docker containers is difficult. As proposed by @svenevs, we should have at least some smoke-tests to check correctness of our builds.
Tasks that seem important enough to test:
docx
pdf creation via LaTeX
Lua filters which contain calls to a system-provided library
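A minimal harness for the smoke tests listed above, assuming the checks run inside the built image where pandoc is on the PATH. It also covers the pandoc-crossref PATH check and the lualatex check requested elsewhere on this page; the Lua-filter-with-system-library case is left out because it depends on which library the image is supposed to provide. The names check and run_image_checks are hypothetical:

```shell
#!/bin/sh
# Minimal smoke-test harness: run a check, report ok/FAIL, count failures.
failures=0

check() {
  name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok   $name"
  else
    echo "FAIL $name"
    failures=$((failures + 1))
  fi
}

# Checks intended to run inside the built image (e.g. pandoc/latex).
run_image_checks() {
  dir=$(mktemp -d)
  printf '# Smoke test\n\nHello, world.\n' > "$dir/in.md"
  check "pandoc on PATH" command -v pandoc
  check "pandoc-crossref on PATH" command -v pandoc-crossref
  check "docx output" pandoc "$dir/in.md" -o "$dir/out.docx"
  check "pdf via LaTeX (default engine)" pandoc "$dir/in.md" -o "$dir/out.pdf"
  check "pdf via lualatex" pandoc "$dir/in.md" --pdf-engine=lualatex -o "$dir/lua.pdf"
}
```

Inside a container this could be run as run_image_checks; exit "$failures", so that a CI job fails whenever any check fails.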
Once #112 lands, we will not be able to build anything prior to 2.10, regardless of PANDOC_VERSION (since Alpine is being upgraded to 3.12, along with test suite updates).
Probably the easiest thing to do is start tagging so we have a track record.
Minimal .gitlab-ci.yml:
image: docker:latest
services:
  - docker:dind
build:
  script:
    - docker run --rm --volume "`pwd`:/data" --user `id -u`:`id -g` pandoc/latex README.md -o README.pdf
  artifacts:
    paths:
      - "*.pdf"
dockerfiles/alpine/latex/Dockerfile
Line 41 in ab8be50
Check whether or not that can all be removed in favor of tlmgr install. I'm prioritizing the re-release (#43), but they may have made an official solution now that the repositories are not frozen.
There are a couple of ways to fix this; I will resolve it after #24 (the next PR will be release=2.6 so that the latex images get fixed, but they need an edge build first).
Continuation of discussion here
I very strongly feel that a base pandoc + minimal latex docker image should be officially maintained (and am happy to help maintain it!). I just don't know how to build out the layering scheme...
I've spent a lot more time on this than I originally intended, and have come to the conclusion that maintaining a pandoc + latex image is worth neither the pain nor the effort. The real meal ticket here is that these Dockerfiles provide an easy way of getting pandoc. Here's why:
The install-tl approach is much slower, with minimal gain in my opinion. The basic scheme for LaTeX is missing a lot that I'm pretty sure will affect pandoc output. I didn't test on simple documents, but for a local repo here with -t beamer I ended up needing to install about 20 additional packages before I could compile the slides. I did bring in some fancy stuff (like minted), but by and large this will be every user's experience.
Worst of all, I think I found a bug that I couldn't actually solve on Alpine: when you run tlmgr install biber, no biber executable is installed. It installed biblatex, bibtex, and all of the other ones, but reliably no biber, which I think will be a big hurdle for people using latexmk + pandoc-citeproc. At least it was for me.
Reference point: the image took 6 minutes 38 seconds to build on top of tarleb/alpine-pandoc:2.5, producing an uncompressed image size of 622MB.
apk add
When trying to solve the biber problem, I noticed that it was available from apk, and I tested two things, since mixing install-tl and apk add biber didn't work (install-tl produced a biblatex too new for the packaged biber):
FROM tarleb/alpine-pandoc:2.5
# Version 1:
RUN apk add --no-cache texlive
# Version 2: basically texlive-full minus texlive-doc minus texlive-dev
# see: https://pkgs.alpinelinux.org/package/edge/community/x86_64/texlive-full
RUN apk add --no-cache texlive texlive-dvi texlive-luatex texlive-xetex xdvik
With docker system prune -a in between all of these tests:
Alpine packaged LaTeX very well, and you still get tlmgr. Similarly, the basic install for install-tl is so stripped down that people will be forced to encounter the world of dependency satisfaction with LaTeX. Note that tlmgr install XXX may only be part of it; things like images typically need other libraries.
Either way (622MB or 814MB), these images are getting big enough that they will be cumbersome to download.
Solution: provide suggestions in the README for what people need to do to get texlive. The path of least resistance is to use system package managers. Sometimes system package managers add too much stuff, but it's going to be stable, and downloads will actually be faster than install-tl.
To be honest, I'm pretty dissatisfied with these results, but the upshot is that the layering scheme becomes transparent again: only provide pandoc/distro images. pandoc/alpine is just alpine plus pandoc, nice and simple.
This issue can be closed or discussed. If people agree with abandoning pandoc + latex images, I can add a pull request with documentation in the README describing a suggested approach for installing LaTeX in their own container.
It might be worth thinking about including pandoc-crossref as well as pandoc-citeproc.
It is very widely used, and provides some features that are essential for certain kinds of writing (e.g. academic). Unfortunately it's another fairly big binary. Note, however, that by using dynamic linking you could reduce binary code size duplication between pandoc, pandoc-citeproc, and pandoc-crossref, because they could all share the pandoc library.
In my app which uses the pandoc/latex image, I've noticed new builds are failing tests in my CI.
EXCEPTION: Error producing PDF.
! LaTeX Error: File `pdftexcmds.sty' not found.
Type X to quit or <RETURN> to proceed, or enter new name. (Default extension: sty)
Enter file name:
! Emergency stop.
<read *>
l.490 \RequirePackage{pdftexcmds}[2007/11/11]
Locally, tests are still passing, but I have a 4-month-old cached pandoc/latex image. I thought recent builds were failing because the pdftexcmds package for Alpine was no longer available, but I think it may be because the way I'm bringing texlive into my image is no longer compatible with recent changes to pandoc/latex. Here's how I bring in pandoc and texlive:
FROM pandoc/latex AS pandoc
FROM ruby:2.6.3-alpine3.10
COPY --from=pandoc /usr/bin/pandoc* /usr/bin/
COPY --from=pandoc /opt/texlive/* /usr/bin/texlive/
RUN ln -s /usr/bin/texlive/bin/x86_64-linuxmusl/latex /usr/bin/pdflatex
...
Is there something obviously wrong with that?
See #51 (comment). I have a fork going at https://github.com/znmeb/dockerfiles. I'll enable the Issues tab shortly.
The latest Alpine release features GHC 8.8 and cabal 3.2. However, this will prevent us from building older pandoc versions. "Older" in this case is everything before 2.9.2.
SVG images in LaTeX-based output (e.g. beamer) do not show proper text. The culprit is rsvg-convert, which generates bad PNG/PDF output.
To reproduce the bug:
$ docker run --rm --volume "`pwd`:/data" --entrypoint "/bin/sh" -ti pandoc/latex
/data # rsvg-convert -f png -a -o font-test.png font-test.svg
/data # rsvg-convert -f pdf -a -o font-test.pdf font-test.svg
using the SVG file in font-test.svg.zip.
I suspect that some fonts are missing in the image, but I don't know enough about librsvg to track down the real issue.
This is a rather serious bug which prevents the use of Pandoc to generate both web and PDF contents unless scalable images are converted in advance to a non scalable png.
The Dockerfile for Alpine does very little caching when building pandoc. Most notably, the Haskell dependencies should be cached, so they won't have to be built every time.
Most Docker images are tagged in such a way that pulling the 2.7 tag gets the latest compatible 2.7.x version; at the moment, this would be 2.7.3.
Biber is an integral part of many LaTeX pipelines. Including it in the images seems reasonable.
There is currently no test verifying that lualatex works.