Repeatable PDF Generation with Docker, Latex and Pandoc

Posted on March 12, 2015 by ivotron

TL;DR: I describe a way to easily and repeatably generate a PDF from a latex or markdown file corresponding to an academic article using Docker and Pandoc.

"Publish or perish," the academic saying goes. For researchers, producing papers is at the core of the profession, and it is usually done collaboratively. A latex-based environment makes collaboration difficult, mainly because latex is a complex piece of software: ensuring that every collaborator has the same latex environment on their local machine is cumbersome and time-consuming. An alternative is to use Overleaf, but I personally prefer to work locally and rely on version control to manage conflicts. So, in my ideal workflow, collaboration should depend only on having access to the source files, and the problem of keeping a homogeneous environment should be deferred to tools.

In other words, my goal is to have a self-contained project, where going from source to PDF is a matter of cloning the project and running make:

git clone https://github.com/ivotron/paper-template.git
cd paper-template
make

Docker

This is where Docker comes into play [1]. Having a docker image that takes care of all the latex "dependency hell" makes it much easier for collaborators to focus on the content of the paper right away. The only thing you need to install is the docker engine, and that's becoming easier every day.

There are plenty of TexLive images in the docker registry. I prefer this one since it contains all the dependencies for the cls files I usually use, i.e. ACM, IEEE and USENIX (the others might also work, but I haven't tried them). If some obscure latex files are missing, they can be mounted into the container folder where TexLive expects them. For example, using a custom font can be accomplished with:

docker run \
  -v `pwd`/fontfolder:/root/texmf/tex/fonts \
  -v `pwd`/main.tex:/root/main.tex \
  ivotron/texlive pdflatex main.tex

The above mounts a folder of fonts that aren't included in the default Ubuntu packages into the texmf folder in root's $HOME, which is where latex expects to find them. The same can be done for any other missing latex file. In my case, I maintain a latex-file repo with conference styles and pass this folder to the container as shown above.
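When a mounted file still isn't picked up, it can help to ask TexLive directly whether it sees the file. The following is a minimal sketch, assuming the image accepts arbitrary commands the same way it accepts pdflatex above. The function name check_tex_file and the file name usenix.sty in the usage line are hypothetical; kpsewhich is the lookup tool that ships with TexLive.

```shell
# Ask kpsewhich inside the container whether a file is visible once the
# host folder is mounted where latex looks for user files.
# Usage: check_tex_file <absolute-host-folder> <file-name>
# (check_tex_file is an illustrative helper, not part of any image or repo.)
check_tex_file() {
  docker run --rm \
    -v "$1:/root/texmf/tex/latex" \
    ivotron/texlive kpsewhich "$2"
}
```

Calling check_tex_file "$PWD/latex-files" usenix.sty should print the path of the file inside the container if it is found, and print nothing and return a non-zero status otherwise.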

A Paper Template

To make this more concrete, I've created a template project on GitHub containing an example of a USENIX article. Generating a PDF is as easy as:

docker run \
  -v `pwd`/latex-files:/root/texmf/tex/latex \
  -v `pwd`/usenix_template.tex:/root/main.tex \
  -v `pwd`/out:/root/out \
  ivotron/texlive pdflatex -output-directory=/root/out /root/main.tex

The resulting PDF file is placed in the out/ folder.
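To avoid retyping the command, it could be wrapped in a small shell function, which is roughly what a make target would call. This is a minimal sketch and not the template's actual build script: the names build_pdf, IMAGE and DRY_RUN are assumptions for illustration, and only the docker arguments are taken from the command above.

```shell
# Image to use; override with IMAGE=... (the variable name is an assumption).
IMAGE="${IMAGE:-ivotron/texlive}"

# Build a PDF from the .tex file given as $1, relative to the current folder.
# With DRY_RUN=1 the docker command is only printed, not executed.
build_pdf() {
  src="$1"
  # Store the full command in the function's positional parameters.
  set -- docker run \
    -v "$PWD/latex-files:/root/texmf/tex/latex" \
    -v "$PWD/$src:/root/main.tex" \
    -v "$PWD/out:/root/out" \
    "$IMAGE" pdflatex -output-directory=/root/out /root/main.tex
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    # Create the output folder on the host before mounting it.
    mkdir -p out
    "$@"
  fi
}
```

Running DRY_RUN=1 build_pdf usenix_template.tex prints the command so it can be inspected first; without DRY_RUN the same command is executed.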

Pandoc

Not everybody likes to write latex. I personally dislike its verbosity. This is where Pandoc shines. If you're not familiar with Pandoc, it is a "swiss-army knife" of document generation with the goal of separating content from presentation. With it, you can write Markdown and generate anything from books, blogs and slides, to academic articles.

For a concrete example, take a look at the paper template project I mentioned above. It also contains a pandoc-flavored Markdown input file corresponding to a USENIX paper. To generate the PDF file:

docker run \
  -v `pwd`/latex-files:/root/texmf/tex/latex \
  -v `pwd`/usenix_template.md:/root/main.md \
  -v `pwd`/out:/root/out \
  ivotron/pandoc /root/main.md -o /root/out/main.pdf

The docker image referenced above is available in the docker registry. For an example of a more complicated paper with figures, cross-references and other things, take a look here.
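For readers who haven't seen pandoc-flavored Markdown, the sketch below writes a tiny input file to give an idea of the format. It is an illustration only and not the file from the template project: the title, author and section names are made up, and write_example_md is a hypothetical helper. The block between the two --- lines is a pandoc YAML metadata block.

```shell
# Write a tiny pandoc-flavored Markdown file to the path given as $1.
# All content below is placeholder text for illustration.
write_example_md() {
  cat > "$1" <<'EOF'
---
title: "An Example Paper"
author: "A. Researcher"
abstract: |
  One or two sentences summarizing the paper.
---

# Introduction

Plain Markdown text, with *emphasis* and inline math such as $O(n \log n)$.

# Evaluation

- first finding
- second finding
EOF
}
```

A file written with write_example_md main.md could then be mounted into the pandoc container in place of usenix_template.md.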


  1. Docker not only makes it easier to repeat the generation of a PDF file; the generation of figures and the complete experiments that back them can also be made repeatable. I discuss this in another post.