
pandoc / dockerfiles

Dockerfiles for various pandoc images

License: GNU General Public License v2.0

Makefile 32.15% Dockerfile 33.42% Shell 29.83% TeX 2.95% Lua 1.65%
pandoc docker-image document-conversion

dockerfiles's People

Contributors

0xr0bert, alerque, cdivita, colindean, daamien, dinamicoplus, guillaumeassier, hschwentner, jackmcpickle, jradek, k4zuki, maehr, maxheld83, mb21, mgred, svenevs, tarleb, trogluddite, zkamvar

dockerfiles's Issues

How to change font

I'm trying to change the mainfont in the frontmatter of a Markdown file in the pandoc/latex:2.7.3 image using XeLaTeX. But whatever I specify I get following error:

kpathsea: Running mktextfm Euler
/opt/texlive/texdir/texmf-dist/web2c/mktexnam: Could not map source abbreviation  for Euler.
/opt/texlive/texdir/texmf-dist/web2c/mktexnam: Need to update ?

I already ran tlmgr update --self && tlmgr update --all before …

revert smarter entrypoint script by 2.10

Edit from the future: see #78 (comment) for reversion deadlines / strategy.

The smarter entrypoint script introduced in 51cb2b6 closing #54 may add some unwelcome complexity.
For example (maybe the only case), @zspitz noticed that the entrypoint script won't let pandoc containers accept pandoc arguments and input files in arbitrary orders (see pandoc/pandoc-action-example#3 and jgm/pandoc-website#38).

@tarleb suggested to improve the auto detection, but I'm wondering whether it would not just make more sense to revert to something simple:

ENTRYPOINT ["pandoc"]
CMD ["--help"]

I know a lot of thought went into #54 so feel free to close this.

My expectation was actually that there'd be no such cleverness, and it surprised me that this could be a source of problems; much like @zspitz I would've never guessed it.

In #54, @tarleb mentioned that:

New users are repeatedly irritated by our use of ENTRYPOINT.

But isn't ENTRYPOINT ["pandoc"] the usual choice for single-tool containers?
And users can always override it, e.g. with docker run --entrypoint=/bin/sh pandoc/core -c ls, right?

I'm also not sure the helper-script use case in the Dockerfile docs applies here:

The ENTRYPOINT instruction can also be used in combination with a helper script, allowing it to function in a similar way to the command above, even when starting the tool may require more than one step.

(emphasis added).

But this (more than one step) isn't the case for pandoc, right?
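For comparison, a minimal shell sketch of how arguments flow with the plain ENTRYPOINT ["pandoc"] plus CMD ["--help"] shown above: everything after the image name is appended to pandoc unchanged, so options and input files can appear in any order. The wrapper name dpandoc, the PANDOC_IMAGE variable and the DOCKER override (set it to echo for a dry run) are hypothetical:

```shell
#!/bin/sh
# Sketch: call pandoc in a container whose Dockerfile ends with
#   ENTRYPOINT ["pandoc"]
#   CMD ["--help"]
# dpandoc, PANDOC_IMAGE and DOCKER are hypothetical names.
PANDOC_IMAGE=${PANDOC_IMAGE:-pandoc/core}

dpandoc() {
    # Mount the current directory at /data and run as the calling user,
    # so output files are not owned by root. "$@" is passed through to
    # pandoc verbatim and in the order given.
    ${DOCKER:-docker} run --rm \
        --volume "$PWD:/data" \
        --user "$(id -u):$(id -g)" \
        "$PANDOC_IMAGE" "$@"
}
```

With this setup, dpandoc docs/test.md -s -o test.html and dpandoc -s -o test.html docs/test.md both reach pandoc exactly as typed; running anything other than pandoc requires --entrypoint.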

Internationalized LaTeX

Support for

  • Cyrillic,
  • CJK,
  • Arabic,
  • Farsi,
  • Hebrew,
  • ?

I see three good ways to support these:

  1. Separate images for each of these;
  2. all-in-one "international" LaTeX image;
  3. docs on how to create localized images.

Each has its own pros and cons.

Base repo for pandoc docker projects

Based on the mightymake project, I've been evolving a practical project setup around make and pandoc/dockerfiles. Ideally, I'd like to contribute it to this GitHub org, but it probably needs some additional review, stripping down and polishing before I pursue this idea seriously.

Nevertheless, I want to share it here informally:

# ./Makefile
.DEFAULT_GOAL := help


define INFORMATION
Publishing pipeline using pandoc.
Version 1.0

USAGE:

make <folder>/<file>.<ext>      generate the requested file

make all.html                   generate html version for all sources
make all.pdf                    generate PDF version for all sources
make all.epub                   generate epub version for all sources
make all                        generate all files in all versions

make clean                      erases the output folder
make build                      builds the custom pandoc image for this project
make new.<XYZ>/<docname>        quick create XYZ markdown document from template
endef

export INFORMATION

SHELL = /bin/bash
PANDOCIMAGE=pandoc/pharmaone:latest
DATADIR=.data
OUTPUTDIR=.out
TEMPLATENAME=0000-plantilla
DEFAULTFILE=defaults.yaml

TEMPLATES=$(wildcard $(DATADIR)/templates/$(@D).*)
LETTERHEAD=$(wildcard $(DATADIR)/letterheads/$(@D)/letterhead.png)
LEGALHEAD=$(wildcard $(DATADIR)/letterheads/$(@D)/legalhead.png)
METADATAFILE=$(wildcard $(DATADIR)/metadata/$(@D).yaml)


PANDOCCMD=docker run --rm --volume "`pwd`:/data" --user `id -u`:`id -g`
PANDOCCMD+= $(PANDOCIMAGE)
PANDOCCMD+= --defaults=$(DEFAULTFILE)
PANDOCCMD+= --data-dir=$(DATADIR)
PANDOCCMD+= --resource-path=$(@D)
# PANDOCCMD+= --extract-media=./$(OUTPUTDIR)/$*
PANDOCCMD+= --output=./$(OUTPUTDIR)/$@
PANDOCCMD+= $(if $(LETTERHEAD), --variable=letterhead:$(LETTERHEAD))
PANDOCCMD+= $(if $(LEGALHEAD), --variable=legalhead:$(LEGALHEAD))
PANDOCCMD+= $(if $(METADATAFILE), --metadata-file=$(METADATAFILE))
PANDOCCMD+= $<

CUSTOMLATEXTEMPLATE=$(if $(findstring .latex,$(TEMPLATES)),--template=$(@D))
CUSTOMHTMLTEMPLATE=$(if $(findstring .html,$(TEMPLATES)),--template=$(@D))

SOURCES = $(sort $(wildcard */*.md) $(wildcard */**/*.md))
NEWDOC_PROTO_TARGETS = $(foreach i,$(sort $(dir $(SOURCES))),new.$(i))
PDF = $(SOURCES:.md=.pdf)
TEX = $(SOURCES:.md=.tex)
HTML = $(SOURCES:.md=.html)
EPUB = $(SOURCES:.md=.epub)

PACKAGEREGEX=(?<=^% install\{)[\w-]+(?=\}$$)

all : all.html all.epub all.pdf all.tex
all.tex: $(TEX)
all.pdf: $(PDF)
all.html: $(HTML)
all.epub: $(EPUB)

$(TEX) : %.tex : %.md
	@mkdir -p ./$(OUTPUTDIR)/$(@D)
	$(PANDOCCMD) $(CUSTOMLATEXTEMPLATE) --to=latex

$(PDF) : %.pdf : %.md
	@mkdir -p ./$(OUTPUTDIR)/$(@D)
	$(PANDOCCMD) $(CUSTOMLATEXTEMPLATE) --to=pdf

$(HTML) : %.html : %.md
	@mkdir -p ./$(OUTPUTDIR)/$(@D)
	$(PANDOCCMD) $(CUSTOMHTMLTEMPLATE) --to=html

$(EPUB) : %.epub : %.md
	@mkdir -p ./$(OUTPUTDIR)/$(@D)
	$(PANDOCCMD) --to=epub

.PHONY: help build clean all all.tex all.pdf all.html all.epub
help:; @echo "$$INFORMATION"
build:; docker build --build-arg "packages=$$(cat $(DATADIR)/templates/*.latex | grep -Po "$(PACKAGEREGEX)" | paste -sd ' ' -)" -t $(PANDOCIMAGE) .
clean:; rm -rf ./$(OUTPUTDIR)
$(NEWDOC_PROTO_TARGETS): # just to make stem autocompletion work
NEWDOC_TARGETS = $(foreach i,$(NEWDOC_PROTO_TARGETS),$(i)%)
NEWDIR=$(subst new.,,$(dir $@))
NEWFOLDER=$(NEWDIR)$*
NEWMD=$(NEWFOLDER).md
TMPLFOLDER=$(NEWDIR)$(TEMPLATENAME)
TMPLMD=$(TMPLFOLDER).md
$(NEWDOC_TARGETS):
	@echo "Bootstrap $(NEWMD) from $(TMPLMD) ..."
	@if [ ! -d "$(TMPLFOLDER)" ]; then echo "$(TMPLFOLDER) template folder doesn't exist..."; fi
	@if [ ! -f "$(TMPLMD)" ]; then echo "$(TMPLMD) template doesn't exist..."; fi
	@if [ -d "$(NEWFOLDER)" ]; then echo "$(NEWFOLDER) folder already exists..."; elif [ -d "$(TMPLFOLDER)" ]; then cp -r $(TMPLFOLDER) $(NEWFOLDER); fi
	@if [ -f "$(NEWMD)" ]; then echo "$(NEWMD) already exists...";  elif [ -f "$(TMPLMD)" ]; then cp $(TMPLMD) $(NEWMD); fi

# Makefile extensions
# ./Dockerfile
# NOTE: drop when merged https://github.com/pandoc/dockerfiles/pull/50
FROM pandoc/latex:2.9.2.1
RUN wget -O- https://github.com/lierdakil/pandoc-crossref/releases/download/v0.3.6.2a/pandoc-crossref-Linux-2.9.2.1.tar.xz | tar -xJ -C/usr/bin/ pandoc-crossref
ARG packages=
RUN tlmgr update --self
RUN [ -z "$packages" ] && echo "No packages to install" || tlmgr install ${packages}
# ./defaults.yaml
from: markdown+simple_tables+table_captions+yaml_metadata_block+smart
# reader: may be used instead of from:
# to: pdf
# writer: may be used instead of to:

# leave blank for output to stdout:
output-file:
# leave blank for input from stdin, use [] for no input:
input-files:
# or you may use input-file: with a single value

template: default
standalone: true
self-contained: false

# note that structured variables may be specified:
variables:
  lof: true
  lot: true
  graphics: true
  geometry:
  - scale=0.6
  - ignorefoot=true
  - footskip=25mm
  # - top=30mm
  # - left=20mm
  - centering
  - vcentering
  pagestyle: fancy
  textcolor: black!25!darkgray
  # mainfont: DejaVuSerif.ttf
  mainfont: DejaVuSans.ttf
  sansfont: DejaVuSans.ttf
  monofont: DejaVuSansMono.ttf
  colorlinks: true
  # letterhead: ".data/letterheads/default.png"
  # legalhead: ".data/letterheads/default.png"
  # documentclass: book
  # classoption:
  # - twosides
  # - draft

# metadata values specified here are parsed as literal
# string text, not markdown:
metadata:
  lang: es-CO
# metadata-files:
# - .data/metadata/default.yaml
# or you may use metadata-file: with a single value

# Note that these take files, not their contents:
include-before-body: []
include-after-body: []
include-in-header: []
resource-path: ["."]

# filters will be assumed to be Lua filters if they have
# the .lua extension, and json filters otherwise.  But
# the filter type can also be specified explicitly, as shown:
filters:
- pandoc-crossref
- pandoc-citeproc
- abstract-to-meta.lua

file-scope: true

# ERROR, WARNING, or INFO
verbosity: INFO
# log-file: log.json

# citeproc, natbib, or biblatex
cite-method: citeproc
# part, chapter, section, or default:
top-level-division: default
abbreviations:

pdf-engine: xelatex
# pdf-engine-opts:
# - "-shell-escape"
# you may also use pdf-engine-opt: with a single option
# pdf-engine-opt: "-shell-escape"

# auto, preserve, or none
wrap: auto
columns: 78
dpi: 72

table-of-contents: true
toc-depth: 3
number-sections: false
# a list of offsets at each heading level
number-offset: [0,0,0,0,0,0]
# toc: may also be used instead of table-of-contents:
shift-heading-level-by: 1
section-divs: true
identifier-prefix: foo
title-prefix: ""
strip-empty-paragraphs: true
# lf, crlf, or native
eol: lf
strip-comments: false
indented-code-classes: []
ascii: true
default-image-extension: ".png"

# either a style name or a style definition file:
highlight-style: pygments
# syntax-definitions:
# or you may use syntax-definition: with a single value
listings: false

# method is plain, webtex, gladtex, mathml, mathjax, katex
# you may specify a url with webtex, mathjax, katex
html-math-method:
  method: mathjax
  url: "https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"
# none, references, or javascript
email-obfuscation: javascript

tab-stop: 8
preserve-tabs: true

incremental: false
slide-level: 2

reference-links: true
# block, section, or document
reference-location: block
atx-headers: false

# accept, reject, or all
track-changes: accept

html-q-tags: false
# css:
# - site.css

# none, all, or best
ipynb-output: best

# A list of two-element lists
request-headers:
- ["User-Agent", "Mozilla/5.0"]

fail-if-warnings: false
dump-args: false
ignore-args: false
trace: false
# ./.gitignore
.out
# If you want a place to keep local documents away from your colleagues' eyes
.local
# ./.data folder structure
./filters
./letterheads
./metadata
  ./my-doc-type.yaml  # merged metadata for all docs in a toplevel subfolder 'my-doc-type'
./templates
  ./default.latex  # including background letterheads over standard template

To hint to make which LaTeX packages to install, I put a specially formatted comment in the LaTeX templates, like so:

% install{dejavu}
% install{dejavu-otf}

-> make build
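A portable alternative to the grep -P lookaround pattern, in case a build machine lacks PCRE support in grep: a POSIX sed sketch that prints the names from all "% install{...}" lines as one space-separated list, ready for --build-arg packages=.... The helper name latex_packages is hypothetical:

```shell
#!/bin/sh
# Sketch: collect LaTeX package names from "% install{name}" comments.
# latex_packages (hypothetical) reads the files given as arguments,
# e.g. .data/templates/*.latex, and prints all names on a single line.
latex_packages() {
    # In a sed BRE, { and } are literal; \( \) captures the package name.
    sed -n 's/^% install{\([A-Za-z0-9_-]*\)}$/\1/p' "$@" | paste -s -d ' ' -
}
```

Usage with the image name from the Makefile: docker build --build-arg "packages=$(latex_packages .data/templates/*.latex)" -t pandoc/pharmaone:latest .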

TeX Live is still at version 2019 in published images

TL;DR

Docker images have been built with an outdated cache, so TeX Live is still at version 2019.

You need to rebuild the images with --no-cache in the docker build command.

Problem

TeX Live is at version 2019 in the latest/edge images, which collides with the CTAN repository (already at 2020).

▶ docker run --rm pandoc/latex:latest tlmgr --version
tlmgr revision 53842 (2020-02-19 08:28:40 +0100)
tlmgr using installation: /opt/texlive/texdir
TeX Live (http://tug.org/texlive) version 2019

Resolution

I rebuilt the pandoc/latex image locally, and it now has TeX Live 2020:

▶ docker run --rm pandoc/latex:edge tlmgr --version
tlmgr revision 54446 (2020-03-21 17:45:22 +0100)
tlmgr using installation: /opt/texlive/texdir
TeX Live (https://tug.org/texlive) version 2020

Impacts

If you try to update tlmgr like this:

FROM pandoc/latex:latest

RUN tlmgr option repository ctan && \
    tlmgr update --self && \
    tlmgr update --all

You'll end up with this error:

tlmgr: Local TeX Live (2019) is older than remote repository (2020).
Cross release updates are only supported with
  update-tlmgr-latest(.sh/.exe) --update
See https://tug.org/texlive/upgrade.html for details.
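Before running tlmgr update in a derived image, it can help to check which TeX Live release the base image actually contains. A minimal sketch that extracts the year from tlmgr --version output in the format shown above; texlive_year is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: read `tlmgr --version` output on stdin and print the TeX Live
# release year from a line like
#   TeX Live (https://tug.org/texlive) version 2020
texlive_year() {
    sed -n 's/^TeX Live .* version \([0-9][0-9][0-9][0-9]\).*$/\1/p'
}
```

For example, year=$(docker run --rm --entrypoint tlmgr pandoc/latex:latest --version | texlive_year) reports the base image's release; if it is older than the CTAN repository, rebuilding the image with --no-cache, as suggested above, is one way out.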

tlmgr: Remote repository is newer than local (2018 < 2019)

I'm building my own pandoc image based on pandoc/latex:2.7.2. For a few days now, tlmgr installations have been failing for me:

2.7.2: Pulling from pandoc/latex
bdf0201b3a05: Pull complete 
...
a0d8d5c5eda2: Pull complete 
Digest: sha256:f2c2300b5af6da69bec4dcb9f54c6a68084b4e30fbc12fb2bb1a9f706308a805
Status: Downloaded newer image for pandoc/latex:2.7.2
 ---> 40fade551d61
Step 2/2 : RUN tlmgr install adjustbox collectbox pagecolor mdframed needspace sourcesanspro ly1 mweights sourcecodepro titling framed
 ---> Running in d01d50d252ba

tlmgr: Remote repository is newer than local (2018 < 2019)
Cross release updates are only supported with
  update-tlmgr-latest(.sh/.exe) --update
Please see https://tug.org/texlive/upgrade.html for details.

I can't find update-tlmgr-latest(.sh/.exe) --- any hints on how to make tlmgr work again?
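tlmgr does not update across TeX Live releases, so for an image frozen at an older release one commonly used workaround (not discussed in this thread) is to point tlmgr at that year's final, frozen repository in the TeX Live historic archive. A sketch; historic_repo is a hypothetical helper, and the archive host and path layout are assumptions worth checking before relying on them:

```shell
#!/bin/sh
# Sketch: build the URL of the frozen "tlnet-final" repository for an
# older TeX Live release. Host and path are assumptions; any mirror of
# the TeX Live historic archive with the same layout should work.
TL_HISTORIC=${TL_HISTORIC:-ftp://tug.org/historic/systems/texlive}

historic_repo() {
    # $1: four-digit TeX Live year, e.g. 2018
    case "$1" in
        [0-9][0-9][0-9][0-9])
            printf '%s/%s/tlnet-final\n' "$TL_HISTORIC" "$1" ;;
        *)
            echo "historic_repo: expected a year, got '$1'" >&2
            return 1 ;;
    esac
}
```

In an image based on pandoc/latex:2.7.2 (TeX Live 2018 according to the error above), a RUN step could call tlmgr option repository "$(historic_repo 2018)" before the tlmgr install line.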

pandoc: sh: openBinaryFile: does not exist (No such file or directory)

I am getting this error:

pandoc: sh: openBinaryFile: does not exist (No such file or directory)

when trying to build some assets with pandoc in a GitLab CI script.

I have a repo, Finnito/Science, that serves a GitLab Pages site using Hugo. I am trying to set up a GitLab CI pipeline that builds my HTML slides and PDF docs from my Markdown sources whenever I commit to the repo, so that I don't have to build them locally.

I have been trying out different Docker images of pandoc but decided pandoc/latex is my best bet because it's official and built on Alpine which is nice and lightweight. But I can't seem to make heads or tails of this error.

I have tried various different incantations for pandoc but they don't seem to work.

My GitLab CI job looks like this:

assets:
  image: pandoc/latex
  script:
    - chmod +x ci-build.sh
    - sh ci-build.sh

and my ci-build.sh script looks like this:

#!/bin/sh

modulesToBuild=(
    "/builds/Finnito/science/content/10sci/5-fire-and-fuels"
    "/builds/Finnito/science/content/10scie/6-geology"
    "/builds/Finnito/science/content/11sci/4-mechanics"
    "/builds/Finnito/science/content/11sci/5-genetics"
    "/builds/Finnito/science/content/12phy/2-mechanics"
    "/builds/Finnito/science/content/12phy/3-electricity"
)

for i in "${modulesToBuild[@]}"; do

    # Navigate to the directory.
    cd $i

    # Build the HTML slides and
    # PDFs for all markdown docs.
    for filename in markdown/*.md; do
        file=${filename##*/}
        name=${file%%.*}

        pandoc/latex pandoc -s --mathjax -i -t revealjs "markdown/$name.md" -o "$name.html"
        pandoc/latex pandoc "markdown/$name.md" -o "$name.pdf" --pdf-engine=pdflatex
    done
done

Honestly, I'm just pretty lost with how to successfully call pandoc within the Docker container. I am very new to this and it all makes very little sense!

I have managed to do this:

docker run -it pandoc/latex
docker exec -it 6d3e05c5aad3 /bin/sh
ls /usr/bin

and verify that pandoc is located there. I am also able to do pandoc --help when in the shell like this.

I also tried to call it like this, but got the same error:

/usr/bin/pandoc -s --mathjax -i -t revealjs "markdown/$name.md" -o "$name.html"

Any help would be most appreciated!
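Two things stand out here. The "pandoc: sh: openBinaryFile" message suggests that the image's pandoc entrypoint is receiving the shell command GitLab tries to start; GitLab's image: syntax lets a job override that with name: pandoc/latex and entrypoint: [""]. Also, inside the container pandoc is already on the PATH, so the "pandoc/latex pandoc ..." prefix is not needed, and the Alpine sh has no bash arrays. A POSIX sh sketch of the build script under those assumptions; passing module directories as arguments and the PANDOC dry-run override are hypothetical changes:

```shell
#!/bin/sh
# POSIX sh sketch of ci-build.sh (no bash arrays). Module directories are
# passed as arguments instead of being hard-coded (hypothetical change).
# PANDOC can be overridden, e.g. PANDOC=echo for a dry run.
PANDOC=${PANDOC:-pandoc}

build_module() {
    # Subshell, so `cd` does not leak into the caller.
    (
        cd "$1" || exit 1
        for filename in markdown/*.md; do
            # If nothing matches, the glob stays literal: skip it.
            [ -e "$filename" ] || continue
            file=${filename##*/}
            name=${file%.md}
            "$PANDOC" -s --mathjax -i -t revealjs "$filename" -o "$name.html" || exit 1
            "$PANDOC" "$filename" -o "$name.pdf" --pdf-engine=pdflatex || exit 1
        done
    )
}

for dir in "$@"; do
    build_module "$dir" || exit 1
done
```

The job would then run something like sh ci-build.sh /builds/Finnito/science/content/10sci/5-fire-and-fuels /builds/Finnito/science/content/11sci/4-mechanics, listing every module directory.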

Can't run shell

I'm trying to do some basic troubleshooting and I'm finding I can't run a shell. I haven't had that issue with any other Alpine image.

docker run pandoc/core /bin/sh is erroring with:

UTF-8 decoding error in /bin/sh at byte offset 25 (d7).
The input must be a UTF-8 encoded text.

As an aside, the issue I'm trying to troubleshoot is how to pull the LaTeX installation into a multi-stage build image. This works to pull in pandoc:

COPY --from=pandoc /usr/bin/pandoc* /usr/bin/

But this doesn't appear to pull in anything:

COPY --from=pandoc /opt/texlive/texdir* /usr/bin/
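Because the image's entrypoint is pandoc, docker run pandoc/core /bin/sh hands /bin/sh to pandoc as an input file, which would explain the UTF-8 decoding error. Overriding the entrypoint gives an interactive shell, which also makes it easy to inspect /opt/texlive before writing COPY lines. A small sketch; the dshell name and the DOCKER dry-run override are hypothetical:

```shell
#!/bin/sh
# Sketch: open a shell inside a pandoc image by replacing its entrypoint.
# Alpine-based images provide /bin/sh (busybox); bash is typically absent.
# $1 = image (default pandoc/core); remaining arguments go to /bin/sh.
dshell() {
    image=${1:-pandoc/core}
    [ "$#" -gt 0 ] && shift
    ${DOCKER:-docker} run --rm -it --entrypoint /bin/sh "$image" "$@"
}
```

For example, dshell pandoc/latex -c 'ls /opt/texlive' lists the TeX Live tree without starting pandoc.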

Add wkhtmltopdf

I'd like to convert a html file to pdf, using wkhtmltopdf.

What would be the correct way to add it?

Should it be in the latex image or rather in a completely new one?

libyaml no longer needed

Pandoc 2.8 and later use HsYAML for YAML processing, removing the dependency on libyaml as an external library. We currently install it for all pandoc versions, but should not do so for versions >= 2.8.
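The version check described here could look roughly like the following POSIX sh sketch. needs_libyaml is a hypothetical helper, and treating non-numeric versions such as edge as "2.8 or later" is an assumption:

```shell
#!/bin/sh
# Sketch: succeed (exit 0) only for pandoc versions older than 2.8,
# which still need libyaml. Non-numeric versions such as "edge" are
# assumed to be built from a recent pandoc and return 1.
needs_libyaml() {
    case "$1" in
        [0-9]*.[0-9]*) ;;
        *) return 1 ;;
    esac
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -lt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -lt 8 ]; }
}
```

A RUN step could then install the YAML library only when needs_libyaml "$pandoc_version" succeeds, where pandoc_version stands for a hypothetical build argument.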

Add CI deployment tagging strategy on GitHub Actions

  • Tagging to :latest if push event to master with release=* message.
    • Alpine: repo mirror strategy (alpine-core => core, etc).
  • Fix deadline: for 2.10 releases (on hold re: crossref, also avoid 2.9.2.1 conflicts with latex archive dilemma).
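A sketch of the tagging rule from the first bullet, as a shell function a workflow step could call. The function name, its arguments and the exact release=<version> message format are assumptions; only the rule "push to master with a release=* message gets :latest" comes from this issue:

```shell
#!/bin/sh
# Sketch: print the Docker tags to push for one CI event.
# $1 = event name (e.g. push), $2 = branch, $3 = commit message.
docker_tags() {
    event=$1
    branch=$2
    # Only the first line of the commit message is inspected.
    first_line=$(printf '%s\n' "$3" | head -n 1)
    [ "$event" = push ] && [ "$branch" = master ] || return 0
    case "$first_line" in
        release=*)
            version=${first_line#release=}
            version=${version%% *}     # keep the first word only
            [ -n "$version" ] || return 1
            printf '%s\n' "$version" latest
            ;;
    esac
}
```

Any other event, branch or message prints no tags, so a workflow step can simply loop over the output.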

Tectonic

Since Tectonic has been included as a pandoc PDF engine, it might be desirable to add a Tectonic variant of the pandoc Dockerfiles. Luckily, there is a fairly recent Alpine package.

I haven't tried Tectonic yet, but some of its design choices promise less pain down the road. Copied from its feature list:

  • automatically downloads support files
  • completely reproducible document compiles
  • doesn’t write TeX’s intermediate files (speed!!!)
  • modern OpenType fonts and is fully Unicode-enabled (xetex)
  • completely self-contained library
  • developed in the open in rust

Feature Request: Infrastructure for installing filters -> new bigger image?

Hi,
the dockerized pandoc is great. However, one powerful feature, filters (including Lua filters), is difficult to get running when dockerized, since filters have different requirements.
Especially when writing in an academic context, filters are necessary for

  • cross referencing
  • figure conversion
  • math stuff

The ideal solution would be to use only Lua filters, since pandoc has built-in support for running them. But it may not be possible to do everything in pure Lua, and existing filters would need to be rewritten, which seems like a waste of resources.

Having a look at this curated list https://github.com/jgm/pandoc/wiki/Pandoc-Filters some really useful filters require

So what would be an appropriate solution for this:

  1. Should it be solved in this repository, e.g. by providing a third image on top of the latex one?
  2. This is out of scope for this repository, and end users have to do this on their own.

Because I really need filters, especially for cross-referencing, I started to build a Dockerfile for case 2. I tried to get at least Python running, because that covers most filters. Installing Python in the LaTeX image is easy, but installing the filters is not. For example, installing one of the pandoc-*nos filters requires a compilation step:

pip install pandoc-fignos
  ...
  running install
  running build
  running build_py
  creating build
  ...
  creating build/temp.linux-x86_64-3.7/psutil
  gcc -Wno-unused-result -Wsign-compare -DNDEBUG
  ...
  unable to execute 'gcc': No such file or directory
  error: command 'gcc' failed with exit status 1

which fails. So gcc would be necessary and probably more build tools.

Any opinion on this?

JR

openBinaryFile: does not exist (No such file or directory)

There are missing files in the latest docker image, see

$ docker run -it pandoc/latex:2.10 sh
[WARNING] Could not deduce format from file extension 
  Defaulting to markdown
pandoc: sh: openBinaryFile: does not exist (No such file or directory)

pandoc/core:2.9.2.1 docker-entrypoint.sh not found

We use pandoc/core:2.9.2.1 and rely on docker-entrypoint.sh to convert a .md file to a .html file. Recently this command started failing with "command not found", and we noticed that the image was updated 3 days ago.
For now we are using pandoc/core:2.9.1.1 as it doesn't have this problem.

docker pull does not default to latest

docker pull pandoc/core

currently yields

Using default tag: latest
Error response from daemon: manifest for pandoc/core:latest not found

docker pull pandoc/core:2.6 works.

Ask: Is there an official Docker build for Pandoc?

My apologies if I'm asking a question in the wrong venue. I'm going to plead non-technical and ask is there an official Pandoc Docker build available in Docker Hub?

I've created a makefile for converting a nested directory of .md files into a PDF, ePUB and Mobi. I'm trying to get Continuous Integration going, but my technical skills are so far out of date that I am not grokking what needs to be done. I gather I need a Docker image for the executable and then CircleCI for cheap builds.

I see there may be an official-ish build either present or in the making. There are 566 Pandoc images in Docker Hub, but I don't know whether to trust or use any of them. I'd rather start with the official build if it's available.

TIA

docker-entrypoint.sh misbehaves when input files are under subdirectories

Hi,

Suppose my source markup is under subdirectory, like this:

/
-- doc-folder
   |--- images
   |--- docs
          |--- test.md

Running pandoc locally, I could of course convert it as follows:

% pwd
/doc-folder
% pandoc docs/test.md
<p>This is test.md</p>

But this doesn't work with pandoc/core container:

% docker run \
        --rm \
        --volume $(pwd):/data \
        --user $(id -u):$(id -g) \
        pandoc/core docs/test.md
/usr/local/bin/docker-entrypoint.sh: exec: line 11: docs/test.md: Permission denied

The problem seems to be related to this part of the docker-entrypoint.sh:

[ -z "$(command -v "${1}")" ]

For some weird reason, command -v in Alpine doesn't return empty output for subdirectories:

/data $ command -v test.md 

/data $ command -v docs/test.md 
docs/test.md

At least on macOS it doesn't behave like this...

Any idea why this happens, and how to work around it?
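One way around this is to stop relying on command -v for anything containing a slash: treat such an argument as a command only if it is an executable regular file, and otherwise hand it to pandoc. A POSIX sh sketch; is_command is a hypothetical helper, not the entrypoint script actually shipped in the images:

```shell
#!/bin/sh
# Sketch: decide whether the first container argument should be executed
# as a command (exit 0) or passed to pandoc (exit 1).
is_command() {
    case "$1" in
        -*)
            # A pandoc option such as --help or -o.
            return 1 ;;
        */*)
            # A path: run it only if it is an executable regular file,
            # so docs/test.md goes to pandoc instead of exec.
            [ -f "$1" ] && [ -x "$1" ]
            return ;;
        *)
            # A bare name: look it up on PATH.
            command -v "$1" >/dev/null 2>&1 ;;
    esac
}

# In an entrypoint this could be used as:
#   if is_command "$1"; then exec "$@"; else exec pandoc "$@"; fi
```

The design choice is to make the path case independent of how a particular shell implements command -v, since that is where the Alpine and macOS shells differ in the report above.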

Ask: support for Azure Pipelines, besides GitHub Actions

Hello. I was hoping to add pandoc to a pipeline that automatically transforms our documentation into other formats (PDF). The GitHub Actions implementation seemed really interesting, but I really want to get it working in the Azure DevOps/Azure Pipelines environment.

I read in the Microsoft docs on Alpine containers that there are some requirements for making this work, specifically Node.js, some tools, and a few references.

Can anyone tell me:

  1. Does the Docker image, as it currently stands, support this tooling?
  2. Is what I am asking possible in the short term? For example, what steps could I take to customize a container for this purpose?

If the answer is no to all of these, I would kindly request support for Azure Pipelines in the future.
If this feature is possible, I would like to suggest a quickstart section to the docs.

Thank you in advance for considering my request.

Option for more complete texlive?

I'm trying to use pandoc/latex:2.6 to convert Markdown to PDF. I wanted to use the eisvogel template, which requires a fairly complete LaTeX distribution (in the meantime I can use a more basic template, but this would be helpful). In particular, a missing footnotebackref.sty is the first error I get.
Is there a way to create an image that uses a more complete version of TeX Live? I looked through the Alpine LaTeX Dockerfile and couldn't find an easy way to switch from texlive to texlive-full (or something along those lines).

pandoc version number not being saved

$ docker inspect pandoc/core:2.6 --format='Pandoc Version: {{ index .Config.Labels "org.pandoc.version" }}'
Pandoc Version:

I think it's because this line

LABEL org.pandoc.version "$pandoc_commit"

comes after a FROM statement that starts a new build stage (so the ARG no longer applies, or maybe doesn't even exist?):

FROM alpine AS alpine-pandoc

A potentially half-baked but easy solution might be to just add ENV PANDOC_COMMIT=$pandoc_commit? I'm not sure that helps. I was trying to learn from the docs how to tell which commit you have in the :edge case (so that anyone who needs a newer commit can just clone this repo and build).

Bundle JS libraries (KaTeX, reveal.js, etc)?

What do you think of bundling the various libraries needed for --katex, --mathjax, --to=slideous, --to=revealjs etc. to be able to produce --self-contained HTML files without having to download anything extra?

I know this is debatable; one could also ask why not bundle them with the pandoc binary/installer releases...
But the pandoc/latex container already took the step of bundling a huge dependency, which is very convenient (thanks! 🙏). So in the same spirit, would you like PR(s) to add those?

This of course opens a can of worms of which version to bundle... Latest known compatible at time image was built?

  • need to track down licenses... Some of these like Slidy and S5 date to the early web and don't have modern code repositories, at best a .zip.

  • estimate total size to decide whether these are negligible, or justify a separate pandoc/html image to keep pandoc/core small?

framed latex package is missing

If I use --highlight-style=tango, I get the error below. Adding the "framed" package fixed the issue.

Error producing PDF.
! LaTeX Error: File `framed.sty' not found.

Type X to quit or <RETURN> to proceed,
or enter new name. (Default extension: sty)

Enter file name:
! Emergency stop.
<read *>

l.58 \definecolor

Reduce size of LaTeX image

The LaTeX image is very large, currently more than 2 GB. LaTeX requires a lot of space, but it should be possible to reduce the size to less than half that value.

File `zref-abspage.sty' not found

I am trying to use the LaTeX Docker container on a file containing Hebrew text. The following is the result:

/usr/bin/docker run --name pandoclatex2911_da18a6 --label 488dfb --workdir /github/workspace --rm -e INPUT_ARGS -e HOME -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e RUNNER_OS -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e GITHUB_ACTIONS=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/tidepool-hebrew-docs/tidepool-hebrew-docs":"/github/workspace" pandoc/latex:2.9.1.1 --standalone -o output.pdf --pdf-engine xelatex docs/test.md
Error producing PDF.
! LaTeX Error: File `zref-abspage.sty' not found.

Type X to quit or <RETURN> to proceed,
or enter new name. (Default extension: sty)

Enter file name:
! Emergency stop.
<read *>

l.89 \RequirePackage

##[error]Docker run failed with exit code 43

I am just getting started with Pandoc and unfamiliar with Docker. Am I doing something wrong, or is this a problem with the container?
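When a PDF build stops on a missing .sty file, the file name usually has to be mapped to a TeX Live package before tlmgr install can fetch it, and the two names do not always match (zref-abspage.sty, for instance, belongs to the zref bundle). A sketch that pulls the missing file names out of pandoc's error output; missing_tex_files is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: read LaTeX/pandoc error output on stdin and print each file
# reported as "! LaTeX Error: File `name.sty' not found."
missing_tex_files() {
    # Inside double quotes, \` is a literal backtick for sed.
    sed -n "s/.*LaTeX Error: File \`\([^']*\)' not found.*/\1/p"
}
```

Each printed name can then be looked up with tlmgr search --global --file "/zref-abspage.sty", and the package it reports installed with tlmgr install.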

Ubuntu-based images

Ubuntu is a very popular base for Docker images, and many people are familiar with it. It would therefore make sense to create alternatives to the current Alpine-based images.

Provides input files and scripts for testing

Testing the docker containers is difficult. As proposed by @svenevs, we should have at least some smoke-tests to check correctness of our builds.

Tasks that seem important enough to test:

  • docx

    • conversion to docx (#21)
    • conversion from docx
  • pdf creation via LaTeX

    • simple doc
    • doc's with tables, images, footnotes, custom margins, equations
    • non-english language docs
    • CJK?
    • Different reference backends
  • Lua filters which contain calls to a system-provided library
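A minimal sketch of such a smoke test: convert one input to several output formats and fail if any output is missing or empty. run_pandoc and smoke_test are hypothetical names; run_pandoc defaults to the docker run pattern used elsewhere in this thread and can be redefined to test a different image:

```shell
#!/bin/sh
# Hypothetical: how pandoc is invoked; redefine to test another image.
run_pandoc() {
    docker run --rm --volume "$PWD:/data" --user "$(id -u):$(id -g)" \
        pandoc/latex "$@"
}

# $1 = input file; remaining arguments = output files to produce.
# pandoc picks the output format from each file's extension.
smoke_test() {
    input=$1
    shift
    status=0
    for out in "$@"; do
        rm -f "$out"
        if run_pandoc "$input" -o "$out" && [ -s "$out" ]; then
            printf 'ok   %s\n' "$out"
        else
            printf 'FAIL %s\n' "$out"
            status=1
        fi
    done
    return "$status"
}
```

For example, smoke_test test/simple.md simple.docx simple.pdf covers the docx and LaTeX PDF cases from the list; the richer inputs (tables, equations, non-English text) would each be another input file.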

start tagging things?

Once #112 lands we will not be able to build anything prior to 2.10 regardless of PANDOC_VERSION (since alpine is getting upgraded to 3.12 and test suite updates).

Probably the easiest thing to do is start tagging so we have a track record.

  • 2.9.2.1: 9df2bb4 (#114 is a problem, we can always rewrite history to inject a commit before 2.10 (next commit))
  • 2.10: 6f636fb

To LaTeX Or Not To LaTeX

Continuation of discussion here

I very strongly feel that a base pandoc + minimal latex docker image should be officially maintained (and am happy to help maintain it!). I just don't know how to build out the layering scheme...

I've spent a lot more time on this than I originally intended, and have come to the conclusion that maintaining a pandoc + LaTeX image is worth neither the pain nor the effort. The real meal ticket here is that these Dockerfiles provide an easy way of getting pandoc. Here's why:

Adventures With Manual Install

The install-tl approach is much slower with minimal gain in my opinion. The basic scheme for latex is missing a lot that I'm pretty sure will affect pandoc output. I didn't test on simple documents, but for a local repo here with -t beamer I ended up needing to install about 20 additional packages before I could compile the slides. I did bring in some fancy stuff (like minted), but by and large this will be every user's experience.

Worst of all, I think I found a bug that I couldn't solve on Alpine: when you run tlmgr install biber, no biber executable is installed. It installs biblatex, bibtex and all the others, but reliably no biber, which I think will be a big hurdle for people using latexmk + pandoc-citeproc. At least it was for me.

Reference point: image took 6 minutes 38 seconds to build on top of tarleb/alpine-pandoc:2.5, producing an uncompressed image size of 622MB.

Just use apk add

While trying to solve the biber problem, I noticed that biber is available from apk. Since mixing install-tl with apk add biber didn't work (install-tl produced a biblatex too new for the packaged biber), I tested two things:

FROM tarleb/alpine-pandoc:2.5

# Version 1:
RUN apk add --no-cache texlive

# Version 2: basically texlive-full minus texlive-doc minus texlive-dev
# see: https://pkgs.alpinelinux.org/package/edge/community/x86_64/texlive-full
RUN apk add --no-cache texlive texlive-dvi texlive-luatex texlive-xetex xdvik

With docker system prune -a in between all of these tests:

  • Version 1 takes 2 minutes 38 seconds to build x 814MB uncompressed
  • Version 2 takes 3 minutes 9 seconds to build x 861 MB

Analysis

Alpine packages LaTeX very well, and you still get tlmgr. By contrast, the basic install-tl scheme is so stripped down that people will be forced into the world of LaTeX dependency resolution. Note that tlmgr install XXX may be only part of it; things like images typically need other libraries.

Either way (622MB or 814MB), these images are getting big enough that they will be cumbersome to download.

Solution: provide suggestions in the README for what people need to do to get texlive. The path of least resistance is to use system package managers. Sometimes system package managers add too much stuff, but it's going to be stable, and downloads will actually be faster than install-tl.

To be honest I'm pretty dissatisfied with these results, but the upshot is that the layering scheme becomes transparent again: only provide pandoc/distro images. pandoc/alpine is just alpine plus pandoc, nice and simple.

This issue can be closed or discussed. If people agree with abandoning the pandoc + LaTeX images, I can open a pull request that documents in the README a suggested approach for installing LaTeX in their own containers.
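A README entry along the lines proposed here might suggest something like the following. It is only a sketch: it reuses the tarleb/alpine-pandoc:2.5 base and the apk package names from the tests above, and adds Alpine's biber package, which the tests found to be available from apk. That the packaged biber matches the packaged biblatex is an assumption to verify.

```dockerfile
# Sketch: build your own LaTeX-enabled image on top of a plain pandoc image,
# using Alpine's packaged TeX Live instead of install-tl.
FROM tarleb/alpine-pandoc:2.5

# Packaged TeX Live plus the LuaTeX/XeTeX engines (a subset of "Version 2"),
# and biber from the same repositories, assumed to match the packaged biblatex.
RUN apk add --no-cache \
        texlive \
        texlive-luatex \
        texlive-xetex \
        biber
```

Keeping every TeX component on the same package manager avoids the install-tl/apk version mismatch described above, at the cost of following Alpine's TeX Live release schedule.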

Include pandoc-crossref?

It might be worth thinking about including pandoc-crossref as well as pandoc-citeproc.
It is very widely used, and provides some features that are essential for certain kinds of writing (e.g. academic). Unfortunately it's another fairly big binary. Note, however, that by using dynamic linking you could reduce binary code size duplication between pandoc, pandoc-citeproc, and pandoc-crossref, because they could all share the pandoc library.

https://github.com/lierdakil/pandoc-crossref

LaTeX Error: File `pdftexcmds.sty' not found

In my app, which uses the pandoc/latex image, I've noticed that new builds are failing tests in CI.

EXCEPTION: Error producing PDF.
! LaTeX Error: File `pdftexcmds.sty' not found.

Type X to quit or <RETURN> to proceed, or enter new name. (Default extension: sty)
Enter file name: 
! Emergency stop.
<read *> 
l.490   \RequirePackage{pdftexcmds}[2007/11/11]

Locally, tests are still passing, but I have a 4-month-old cached pandoc/latex image. I first thought recent builds were failing because the pdftexcmds package for Alpine was no longer available, but I think it may be because the way I'm bringing TeX Live into my image is no longer compatible with recent changes to pandoc/latex. Here's how I bring in pandoc and TeX Live:

FROM pandoc/latex AS pandoc

FROM ruby:2.6.3-alpine3.10

COPY --from=pandoc /usr/bin/pandoc* /usr/bin/
COPY --from=pandoc /opt/texlive/* /usr/bin/texlive/
RUN ln -s /usr/bin/texlive/bin/x86_64-linuxmusl/latex /usr/bin/pdflatex

...

Is there something obviously wrong with that?
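One thing that may be worth trying (a hedged sketch, not a confirmed fix): when COPY's source is a directory, Docker copies its contents rather than the directory itself, so /opt/texlive/* drops the top-level directory names under /opt/texlive and merges them into /usr/bin/texlive/. TeX Live locates its trees relative to its binaries, so a changed layout in newer pandoc/latex images could break this. The variant below copies the whole tree to the same path and puts its bin directory on PATH. The /opt/texlive/texdir/bin/x86_64-linuxmusl location is an assumption, based on the /opt/texlive/texdir path in the kpathsea error at the top of this page and the x86_64-linuxmusl directory used above.

```dockerfile
FROM pandoc/latex AS pandoc

FROM ruby:2.6.3-alpine3.10

# pandoc binaries, as before
COPY --from=pandoc /usr/bin/pandoc* /usr/bin/

# Copy the whole TeX Live installation to the same path, keeping its
# directory layout intact (copying /opt/texlive/* would flatten it).
COPY --from=pandoc /opt/texlive /opt/texlive

# Assumed binary directory. Using PATH instead of a hand-made symlink means
# pdflatex, xelatex, kpsewhich, etc. are all found under their real names.
ENV PATH="/opt/texlive/texdir/bin/x86_64-linuxmusl:${PATH}"
```

Running kpsewhich pdftexcmds.sty in the resulting container should print a path if the package is present in the copied tree. Note that anything pandoc/latex installed through apk (for example perl, which tlmgr needs) is not carried over by these COPY lines.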

Upgrade to Alpine 3.12

The latest Alpine release features GHC 8.8 and cabal 3.2. However, upgrading would prevent us from building older pandoc versions; "older" in this case means everything before 2.9.2.
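One way to keep older versions buildable after such an upgrade would be to make the Alpine release a build argument, so the default can move to 3.12 while pre-2.9.2 builds select an older base. This is a sketch: the argument name base_image_version and the apk build dependencies are illustrative assumptions, not taken from the existing Dockerfiles.

```dockerfile
# An ARG declared before FROM can parameterise the base image.
ARG base_image_version=3.12
FROM alpine:${base_image_version} AS builder

# GHC and cabal come from the Alpine repositories, so their versions follow
# the base image (GHC 8.8 / cabal 3.2 on 3.12). Build dependencies are assumed.
RUN apk add --no-cache ghc cabal musl-dev zlib-dev
```

For example, docker build --build-arg base_image_version=3.11 . would build against Alpine 3.11, assuming that release still ships a GHC the older pandoc versions accept.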

Documentation

The README should be updated with the following information:

  • description of the images we provide (#12)
  • using images to convert documents (#23)
  • building custom images from existing ones (#9)
  • links to other resources (pandoc manual, docker howto?) (#23, #31)
  • license (#12)

SVG conversion broken in pandoc/latex

SVG images in LaTeX-based output (e.g. beamer) do not render text properly. The culprit is rsvg-convert, which generates bad PNG/PDF output.

To reproduce the bug:

$ docker run --rm --volume "`pwd`:/data" --entrypoint "/bin/sh" -ti pandoc/latex
/data # rsvg-convert -f png -a -o font-test.png font-test.svg
/data # rsvg-convert -f pdf -a -o font-test.pdf font-test.svg

using the SVG file in font-test.svg.zip.

I suspect that there are some missing fonts in the image, but I don't know enough about Librsvg to trace down the real issue.

This is a rather serious bug: it prevents using pandoc to generate both web and PDF content unless vector images are converted to non-scalable PNGs in advance.
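If the suspicion about missing fonts is right, a derived image that adds fontconfig and a font package might be enough. This is an untested sketch: ttf-dejavu is just one Alpine font package chosen as an example, not necessarily the font the SVG asks for.

```dockerfile
FROM pandoc/latex

# Assumption: the broken text comes from missing fonts. Install fontconfig and
# a common font package so librsvg has something to render with, then rebuild
# the fontconfig cache.
RUN apk add --no-cache fontconfig ttf-dejavu \
 && fc-cache -f
```

Re-running the rsvg-convert commands above inside the new image would show whether the text now renders; fc-list lists the fonts fontconfig can see.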

Improve caching of alpine builder

The Dockerfile for Alpine does very little caching when building pandoc. Most notably, the Haskell dependencies should be cached so that they don't have to be rebuilt every time.
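A sketch of one way to do this, assuming BuildKit is enabled (e.g. DOCKER_BUILDKIT=1): a cache mount keeps cabal's package index, downloads and compiled store between builds, so dependencies that were already built are reused. The pandoc_version argument and the apk package list are illustrative assumptions. Because the cache mount is not part of the image, the executable is copied out with --install-method=copy into an ordinary directory.

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.12 AS builder

RUN apk add --no-cache ghc cabal musl-dev zlib-dev

ARG pandoc_version=2.9.2.1

# /root/.cabal (index, downloaded tarballs, compiled store) lives in a BuildKit
# cache mount that survives between builds but is not stored in the image.
RUN --mount=type=cache,target=/root/.cabal \
    cabal v2-update \
 && cabal v2-install \
        --install-method=copy \
        --installdir=/usr/local/bin \
        --constraint="pandoc == ${pandoc_version}" \
        pandoc
```

A plain multi-layer approach (installing dependencies in an earlier RUN than pandoc itself) would also work with the classic builder, but the cache mount has the advantage of surviving version bumps of the pandoc_version argument.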

2.7 tag should point to 2.7.3

Most Docker images are tagged so that pulling the 2.7 tag gives the latest compatible 2.7.x version; at the moment, that would be 2.7.3.

Add biber to latex image

Biber is an integral part of many LaTeX pipelines. Including it in the images seems reasonable.
