19 A note on security

  1. Although pandoc itself will not create or modify any files other than those you explicitly ask it create (with the exception of temporary files used in producing PDFs), a filter or custom writer could in principle do anything on your file system. Please audit filters and custom writers very carefully before using them.

  2. Several input formats (including HTML, Org, and RST) support include directives that allow the contents of a file to be included in the output. An untrusted attacker could use these to view the contents of files on the file system. (Using the --sandbox option can protect against this threat.)

  3. Several output formats (including RTF, FB2, HTML with --self-contained, EPUB, Docx, and ODT) will embed encoded or raw images into the output file. An untrusted attacker could exploit this to view the contents of non-image files on the file system. (Using the --sandbox option can protect against this threat, but will also prevent including images in these formats.)

  4. If your application uses pandoc as a Haskell library (rather than shelling out to the executable), it is possible to use it in a mode that fully isolates pandoc from your file system, by running the pandoc operations in the PandocPure monad. See the document Using the pandoc API for more details. (This corresponds to the use of the --sandbox option on the command line.)

  5. Pandoc’s parsers can exhibit pathological performance on some corner cases. It is wise to put any pandoc operations under a timeout, to avoid DOS attacks that exploit these issues. If you are using the pandoc executable, you can add the command line options +RTS -M512M -RTS (for example) to limit the heap size to 512MB. Note that the commonmark parser (including commonmark_x and gfm) is much less vulnerable to pathological performance than the markdown parser, so it is a better choice when processing untrusted input.

  6. The HTML generated by pandoc is not guaranteed to be safe. If raw_html is enabled for the Markdown input, users can inject arbitrary HTML. Even if raw_html is disabled, users can include dangerous content in URLs and attributes. To be safe, you should run all HTML generated from untrusted user input through an HTML sanitizer.