Creating Custom Pandoc Writers in Lua

Introduction

If you need to render a format not already handled by pandoc, or you want to change how pandoc renders a format, you can create a custom writer using the Lua language. Pandoc has a built-in Lua interpreter, so you needn’t install any additional software to do this.

A custom writer is a Lua file that defines how to render the document. Two styles of custom writers are supported. Classic custom writers must define a rendering function for each AST element. New style writers, available since pandoc 2.17.2, need to define only a single function, Writer, which receives the document and the writer options and performs all rendering itself.

Classic style

A writer using the classic style defines rendering functions for each element of the pandoc AST.

For example,

function Para(s)
  return "<paragraph>" .. s .. "</paragraph>"
end
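
To make the calling convention concrete, here is a minimal sketch that runs in a plain Lua interpreter. It assumes, as in the sample writer, that each function receives its already rendered children as strings. The manual calls at the end only imitate what pandoc does when it traverses the AST, and the tag names are illustrative.

```lua
-- Inline elements: each returns a string.
function Str(s)
  return s
end

function Space()
  return ' '
end

function Emph(s)
  return '<emphasis>' .. s .. '</emphasis>'
end

-- Block element: receives the concatenated output of its inlines.
function Para(s)
  return '<paragraph>' .. s .. '</paragraph>'
end

-- Imitate pandoc rendering the paragraph "Hello *world*":
-- children are rendered first, then passed to the parent.
local inlines = Str('Hello') .. Space() .. Emph(Str('world'))
print(Para(inlines))
```

In a real writer these manual calls are not needed: pandoc invokes the functions itself, innermost elements first.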

The best way to create a classic custom writer is to modify the example that comes with pandoc. To get the example, run

pandoc --print-default-data-file sample.lua > sample.lua

A custom HTML writer

sample.lua is a full-featured HTML writer, with explanatory comments. To use it, just use the path to the custom writer as the writer name:

pandoc -t sample.lua myfile.md

sample.lua defines all the functions needed by any custom writer, so you can design your own custom writer by modifying the functions in sample.lua according to your needs.

-- This is a sample custom writer for pandoc.  It produces output
-- that is very similar to that of pandoc's HTML writer.
-- There is one new feature: code blocks marked with class 'dot'
-- are piped through graphviz and images are included in the HTML
-- output using 'data:' URLs. The image format can be controlled
-- via the `image_format` metadata field.
--
-- Invoke with: pandoc -t sample.lua
--
-- Note:  you need not have lua installed on your system to use this
-- custom writer.  However, if you do have lua installed, you can
-- use it to test changes to the script.  'lua sample.lua' will
-- produce informative error messages if your code contains
-- syntax errors.

local pipe = pandoc.pipe
local stringify = (require 'pandoc.utils').stringify

-- The global variable PANDOC_DOCUMENT contains the full AST of
-- the document which is going to be written. It can be used to
-- configure the writer.
local meta = PANDOC_DOCUMENT.meta

-- Choose the image format based on the value of the
-- `image_format` meta value.
local image_format = meta.image_format
  and stringify(meta.image_format)
  or 'png'
local image_mime_type = ({
    jpeg = 'image/jpeg',
    jpg = 'image/jpeg',
    gif = 'image/gif',
    png = 'image/png',
    svg = 'image/svg+xml',
  })[image_format]
  or error('unsupported image format `' .. image_format .. '`')

-- Character escaping
local function escape(s, in_attribute)
  return s:gsub('[<>&"\']',
    function(x)
      if x == '<' then
        return '&lt;'
      elseif x == '>' then
        return '&gt;'
      elseif x == '&' then
        return '&amp;'
      elseif in_attribute and x == '"' then
        return '&quot;'
      elseif in_attribute and x == "'" then
        return '&#39;'
      else
        return x
      end
    end)
end

-- Helper function to convert an attributes table into
-- a string that can be put into HTML tags.
local function attributes(attr)
  local attr_table = {}
  for x,y in pairs(attr) do
    if y and y ~= '' then
      table.insert(attr_table, ' ' .. x .. '="' .. escape(y,true) .. '"')
    end
  end
  return table.concat(attr_table)
end

-- Table to store footnotes, so they can be included at the end.
local notes = {}

-- Blocksep is used to separate block elements.
function Blocksep()
  return '\n\n'
end

-- This function is called once for the whole document. Parameters:
-- body is a string, metadata is a table, variables is a table.
-- This gives you a fragment.  You could use the metadata table to
-- fill variables in a custom lua template.  Or, pass `--template=...`
-- to pandoc, and pandoc will do the template processing as usual.
function Doc(body, metadata, variables)
  local buffer = {}
  local function add(s)
    table.insert(buffer, s)
  end
  add(body)
  if #notes > 0 then
    add('<ol class="footnotes">')
    for _,note in pairs(notes) do
      add(note)
    end
    add('</ol>')
  end
  return table.concat(buffer,'\n') .. '\n'
end

-- The functions that follow render corresponding pandoc elements.
-- s is always a string, attr is always a table of attributes, and
-- items is always an array of strings (the items in a list).
-- Comments indicate the types of other variables.

function Str(s)
  return escape(s)
end

function Space()
  return ' '
end

function SoftBreak()
  return '\n'
end

function LineBreak()
  return '<br/>'
end

function Emph(s)
  return '<em>' .. s .. '</em>'
end

function Strong(s)
  return '<strong>' .. s .. '</strong>'
end

function Subscript(s)
  return '<sub>' .. s .. '</sub>'
end

function Superscript(s)
  return '<sup>' .. s .. '</sup>'
end

function SmallCaps(s)
  return '<span style="font-variant: small-caps;">' .. s .. '</span>'
end

function Strikeout(s)
  return '<del>' .. s .. '</del>'
end

function Link(s, tgt, tit, attr)
  return '<a href="' .. escape(tgt,true) .. '" title="' ..
         escape(tit,true) .. '"' .. attributes(attr) .. '>' .. s .. '</a>'
end

function Image(s, src, tit, attr)
  return '<img src="' .. escape(src,true) .. '" title="' ..
         escape(tit,true) .. '"/>'
end

function Code(s, attr)
  return '<code' .. attributes(attr) .. '>' .. escape(s) .. '</code>'
end

function InlineMath(s)
  return '\\(' .. escape(s) .. '\\)'
end

function DisplayMath(s)
  return '\\[' .. escape(s) .. '\\]'
end

function SingleQuoted(s)
  return '&lsquo;' .. s .. '&rsquo;'
end

function DoubleQuoted(s)
  return '&ldquo;' .. s .. '&rdquo;'
end

function Note(s)
  local num = #notes + 1
  -- insert the back reference right before the final closing tag.
  s = string.gsub(s,
          '(.*)</', '%1 <a href="#fnref' .. num ..  '">&#8617;</a></')
  -- add a list item with the note to the note table.
  table.insert(notes, '<li id="fn' .. num .. '">' .. s .. '</li>')
  -- return the footnote reference, linked to the note.
  return '<a id="fnref' .. num .. '" href="#fn' .. num ..
            '"><sup>' .. num .. '</sup></a>'
end

function Span(s, attr)
  return '<span' .. attributes(attr) .. '>' .. s .. '</span>'
end

function RawInline(format, str)
  if format == 'html' then
    return str
  else
    return ''
  end
end

function Cite(s, cs)
  local ids = {}
  for _,cit in ipairs(cs) do
    table.insert(ids, cit.citationId)
  end
  return '<span class="cite" data-citation-ids="' .. table.concat(ids, ',') ..
    '">' .. s .. '</span>'
end

function Plain(s)
  return s
end

function Para(s)
  return '<p>' .. s .. '</p>'
end

-- lev is an integer, the header level.
function Header(lev, s, attr)
  return '<h' .. lev .. attributes(attr) ..  '>' .. s .. '</h' .. lev .. '>'
end

function BlockQuote(s)
  return '<blockquote>\n' .. s .. '\n</blockquote>'
end

function HorizontalRule()
  return "<hr/>"
end

function LineBlock(ls)
  return '<div style="white-space: pre-line;">' .. table.concat(ls, '\n') ..
         '</div>'
end

function CodeBlock(s, attr)
  -- If code block has class 'dot', pipe the contents through dot
  -- and base64, and include the base64-encoded image (in the format
  -- chosen via `image_format`) as a data: URL.
  if attr.class and string.match(' ' .. attr.class .. ' ',' dot ') then
    local img = pipe('base64', {}, pipe('dot', {'-T' .. image_format}, s))
    return '<img src="data:' .. image_mime_type .. ';base64,' .. img .. '"/>'
  -- otherwise treat as code (one could pipe through a highlighter)
  else
    return '<pre><code' .. attributes(attr) .. '>' .. escape(s) ..
           '</code></pre>'
  end
end

function BulletList(items)
  local buffer = {}
  for _, item in pairs(items) do
    table.insert(buffer, '<li>' .. item .. '</li>')
  end
  return '<ul>\n' .. table.concat(buffer, '\n') .. '\n</ul>'
end

function OrderedList(items)
  local buffer = {}
  for _, item in pairs(items) do
    table.insert(buffer, '<li>' .. item .. '</li>')
  end
  return '<ol>\n' .. table.concat(buffer, '\n') .. '\n</ol>'
end

function DefinitionList(items)
  local buffer = {}
  for _,item in pairs(items) do
    local k, v = next(item)
    table.insert(buffer, '<dt>' .. k .. '</dt>\n<dd>' ..
                   table.concat(v, '</dd>\n<dd>') .. '</dd>')
  end
  return '<dl>\n' .. table.concat(buffer, '\n') .. '\n</dl>'
end

-- Convert pandoc alignment to something HTML can use.
-- align is AlignLeft, AlignRight, AlignCenter, or AlignDefault.
local function html_align(align)
  if align == 'AlignLeft' then
    return 'left'
  elseif align == 'AlignRight' then
    return 'right'
  elseif align == 'AlignCenter' then
    return 'center'
  else
    return 'left'
  end
end

function CaptionedImage(src, tit, caption, attr)
  if #caption == 0 then
    return '<p><img src="' .. escape(src,true) .. '" id="' .. attr.id ..
      '"/></p>'
  else
    local ecaption = escape(caption)
    return '<figure>\n<img src="' .. escape(src,true) ..
        '" id="' .. attr.id .. '" alt="' .. ecaption  .. '"/>' ..
        '<figcaption>' .. ecaption .. '</figcaption>\n</figure>'
  end
end

-- Caption is a string, aligns is an array of strings,
-- widths is an array of floats, headers is an array of
-- strings, rows is an array of arrays of strings.
function Table(caption, aligns, widths, headers, rows)
  local buffer = {}
  local function add(s)
    table.insert(buffer, s)
  end
  add('<table>')
  if caption ~= '' then
    add('<caption>' .. escape(caption) .. '</caption>')
  end
  if widths and widths[1] ~= 0 then
    for _, w in pairs(widths) do
      add('<col width="' .. string.format('%.0f%%', w * 100) .. '" />')
    end
  end
  local header_row = {}
  local empty_header = true
  for i, h in pairs(headers) do
    local align = html_align(aligns[i])
    table.insert(header_row,'<th align="' .. align .. '">' .. h .. '</th>')
    empty_header = empty_header and h == ''
  end
  if not empty_header then
    add('<tr class="header">')
    for _,h in pairs(header_row) do
      add(h)
    end
    add('</tr>')
  end
  local class = 'even'
  for _, row in pairs(rows) do
    class = (class == 'even' and 'odd') or 'even'
    add('<tr class="' .. class .. '">')
    for i,c in pairs(row) do
      add('<td align="' .. html_align(aligns[i]) .. '">' .. c .. '</td>')
    end
    add('</tr>')
  end
  add('</table>')
  return table.concat(buffer,'\n')
end

function RawBlock(format, str)
  if format == 'html' then
    return str
  else
    return ''
  end
end

function Div(s, attr)
  return '<div' .. attributes(attr) .. '>\n' .. s .. '</div>'
end

-- The following code will produce runtime warnings when you haven't defined
-- all of the functions you need for the custom writer, so it's useful
-- to include when you're working on a writer.
local meta = {}
meta.__index =
  function(_, key)
    io.stderr:write(string.format("WARNING: Undefined function '%s'\n",key))
    return function() return '' end
  end
setmetatable(_G, meta)
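
The fallback at the end of the sample can be tried in isolation. The following is a sketch only: with_fallback is a hypothetical helper that is not part of sample.lua, and it attaches the same kind of __index function to an ordinary table instead of _G, so that it can be run safely in a plain Lua interpreter.

```lua
-- Hypothetical helper: attach the warning fallback to any table.
-- sample.lua applies the same kind of __index function to _G.
function with_fallback(t)
  return setmetatable(t, {
    __index = function(_, key)
      io.stderr:write(string.format("WARNING: Undefined function '%s'\n", key))
      -- Return a renderer that produces no output.
      return function() return '' end
    end
  })
end

-- A writer under development that so far defines only Para.
local writer = with_fallback({
  Para = function(s) return '<p>' .. s .. '</p>' end
})

print(writer.Para('text'))  -- defined, so rendered normally
print(writer.Emph('text'))  -- undefined: warning on stderr, empty string
```

Because __index is consulted only for missing keys, functions that are already defined are unaffected by the fallback.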

Template variables

New template variables can be added, or existing ones modified, by returning a second value from function Doc.

For example, the following will add the current date in variable date, unless date is already defined as either a metadata value or a variable:

function Doc (body, meta, vars)
  vars.date = vars.date or meta.date or os.date '%B %e, %Y'
  return body, vars
end
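
The same mechanism can set several variables at once. The sketch below is an illustration only: the variable names generator and wordcount are hypothetical, the count is a rough one taken over the rendered output, and the values have an effect only if the template in use refers to these variables.

```lua
function Doc(body, meta, vars)
  -- Keep an existing value for the variable, if one was set.
  vars.generator = vars.generator or 'my custom writer'
  -- Count whitespace-separated chunks of the rendered body.
  local count = 0
  for _ in body:gmatch('%S+') do
    count = count + 1
  end
  vars.wordcount = tostring(count)
  -- The second return value carries the updated variables.
  return body, vars
end
```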

New style

Custom writers using the new style must contain a global function named Writer. Pandoc calls this function with the document and writer options as arguments, and expects the function to return a string.

function Writer (doc, opts)
  -- ...
end
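
As a minimal sketch of a complete writer in this style, the function below ignores the options and returns an outline of the document's top-level structure. It assumes only that doc.blocks is the list of top-level blocks and that each element exposes its type name in the field t; the output format is purely illustrative.

```lua
function Writer(doc, opts)
  local lines = {}
  -- Walk the top-level blocks and record each element's type name.
  for i, block in ipairs(doc.blocks) do
    lines[#lines + 1] = string.format('%d: %s', i, block.t)
  end
  -- The return value must be a string: the rendered document.
  return table.concat(lines, '\n') .. '\n'
end
```

Saved under a hypothetical name such as outline.lua, it would be invoked like any other custom writer, by passing the path to the -t option.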

Example: modified Markdown writer

Writers have access to all modules described in the Lua filters documentation. This includes pandoc.write, which can be used to render a document in a format already supported by pandoc. The document can be modified before this conversion, as demonstrated in the following short example. It renders a document as GitHub Flavored Markdown, but always uses fenced code blocks, never indented code.

function Writer (doc, opts)
  local filter = {
    CodeBlock = function (cb)
      -- only modify if code block has no attributes
      if cb.attr == pandoc.Attr() then
        local delimited = '```\n' .. cb.text .. '\n```'
        return pandoc.RawBlock('markdown', delimited)
      end
    end
  }
  return pandoc.write(doc:walk(filter), 'gfm', opts)
end