Creating Custom Pandoc Writers in Lua
Introduction
If you need to render a format not already handled by pandoc, or you want to change how pandoc renders a format, you can create a custom writer using the Lua language. Pandoc has a built-in Lua interpreter, so you needn’t install any additional software to do this.
A custom writer is a Lua file that defines how to render the document. Writers must define just a single function, named either Writer or ByteStringWriter, which gets passed the document and writer options, and then handles the conversion of the document, rendering it into a string. This interface was introduced in pandoc 2.17.2, with ByteString writers becoming available in pandoc 3.0.
Pandoc also supports “classic” custom writers, where a Lua function must be defined for each AST element type. Classic style writers are deprecated and should be replaced with new-style writers if possible.
Writers
Custom writers using the new style must contain a global function named Writer or ByteStringWriter. Pandoc calls this function with the document and writer options as arguments, and expects the function to return a UTF-8 encoded string.
function Writer (doc, opts)
  -- ...
end
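As a minimal sketch of this interface, the writer below emits one line per top-level block, giving the block's type name followed by its plain-text content. The output format is purely illustrative; the sketch assumes that each block exposes its type name in the field t and that pandoc.utils.stringify accepts block elements.

```lua
-- Illustrative writer: lists each top-level block as "<Type>: <text>".
function Writer (doc, opts)
  local lines = {}
  for _, block in ipairs(doc.blocks) do
    -- block.t holds the element's type name, e.g. "Para" or "Header"
    lines[#lines + 1] = block.t .. ': ' .. pandoc.utils.stringify(block)
  end
  -- the result is returned to pandoc as a single string
  return table.concat(lines, '\n') .. '\n'
end
```

Saved under a hypothetical name such as blocks.lua, this could be tried with pandoc -t blocks.lua input.md.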
Writers that do not return text but binary data should define a function with name ByteStringWriter instead. The function must still return a string, but it does not have to be UTF-8 encoded and can contain arbitrary binary data.
If both Writer and ByteStringWriter functions are defined, then only the Writer function will be used.
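A hedged sketch of a binary writer: it renders the document as plain text via pandoc.write and wraps the result in a made-up container consisting of a marker byte 0xFF followed by the text with a 4-byte big-endian length prefix. The container format is hypothetical and exists only to show that the returned string may hold bytes that are not valid UTF-8. It assumes Lua 5.3 or later for string.pack.

```lua
-- Illustrative binary writer; the container layout is an assumption,
-- not a real file format.
function ByteStringWriter (doc, opts)
  local text = pandoc.write(doc, 'plain', opts)
  -- '>s4' packs the string preceded by its length as a
  -- 4-byte big-endian unsigned integer
  return '\xFF' .. string.pack('>s4', text)
end
```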
Format extensions
Writers can be customized through format extensions, such as smart, citations, or hard_line_breaks. The global Extensions table indicates supported extensions with a key. Extensions enabled by default are assigned a true value, while those that are supported but disabled are assigned a false value.
Example: A writer with the following global table supports the extensions smart, citations, and foobar, with smart enabled and the others disabled by default:
Extensions = {
  smart = true,
  citations = false,
  foobar = false
}
Users control extensions as usual, e.g., pandoc -t my-writer.lua+citations. The extensions are accessible through the writer options’ extensions field, e.g.:
function Writer (doc, opts)
  print(
    'The citations extension is',
    opts.extensions:includes 'citations' and 'enabled' or 'disabled'
  )
  -- ...
end
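Beyond reporting its state, an extension would typically change the output. The following sketch declares a made-up extension named shout (an assumption for illustration only, not a pandoc extension) and upper-cases the rendered plain text when it is enabled.

```lua
-- "shout" is a hypothetical extension used only for this illustration.
Extensions = {
  shout = false,  -- supported, but disabled by default
}

function Writer (doc, opts)
  local text = pandoc.write(doc, 'plain')
  if opts.extensions:includes 'shout' then
    -- string.upper only affects ASCII letters in the default C locale
    text = text:upper()
  end
  return text
end
```

With the file saved under a hypothetical name such as shout.lua, the extension would be switched on with pandoc -t shout.lua+shout.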
Default template
The default template of a custom writer is defined by the return value of the global function Template. Pandoc uses the default template for rendering when the user has not specified a template but has invoked pandoc with the -s/--standalone flag.
The Template global can be left undefined, in which case pandoc will throw an error when it would otherwise use the default template.
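A minimal sketch of a writer that ships its own default template. It assumes that Template may return the template source as a string in pandoc's template syntax, and that passing opts on to pandoc.write lets the built-in plain writer apply that template; the template text itself is only an illustration.

```lua
function Writer (doc, opts)
  -- opts carries the template to use, if any, on to the plain writer
  return pandoc.write(doc, 'plain', opts)
end

-- Default template: the title (if set), a blank line, then the body.
function Template ()
  return [[
$if(title)$
$title$

$endif$
$body$
]]
end
```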
Example: modified Markdown writer
Writers have access to all modules described in the Lua filters documentation. This includes pandoc.write, which can be used to render a document in a format already supported by pandoc. The document can be modified before this conversion, as demonstrated in the following short example. It renders a document as GitHub Flavored Markdown, but always uses fenced code blocks, never indented code.
function Writer (doc, opts)
  local filter = {
    CodeBlock = function (cb)
      -- only modify if code block has no attributes
      if cb.attr == pandoc.Attr() then
        local delimited = '```\n' .. cb.text .. '\n```'
        return pandoc.RawBlock('markdown', delimited)
      end
    end
  }
  return pandoc.write(doc:walk(filter), 'gfm', opts)
end
-- Template is defined as a function, matching the description above
function Template ()
  return pandoc.template.default 'gfm'
end
Reducing boilerplate with pandoc.scaffolding.Writer
The pandoc.scaffolding.Writer structure is a custom writer scaffold that avoids common boilerplate code when defining a custom writer. The object can be used as a function and takes care of details like metadata and template handling, requiring only the render functions for each AST element type.
The value of pandoc.scaffolding.Writer is a function that should usually be assigned to the global Writer:
Writer = pandoc.scaffolding.Writer
The render functions for Block and Inline values can then be added to Writer.Block and Writer.Inline, respectively. The functions are passed the element and the WriterOptions.
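-- shorthands for the layout elements used below
local cr, space = pandoc.layout.cr, pandoc.layout.space
Writer.Inline.Str = function (str)
  return str.text
end
Writer.Inline.SoftBreak = function (_, opts)
  -- keep the source's line breaks only when wrapping is set to preserve
  return opts.wrap_text == "wrap-preserve"
    and cr
    or space
end
Writer.Inline.LineBreak = cr
Writer.Block.Para = function (para)
  return {Writer.Inlines(para.content), pandoc.layout.blankline}
end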
The render functions must return a string, a pandoc.layout Doc element, or a list of such elements. In the latter case, the values are concatenated as if they were passed to pandoc.layout.concat. If the value does not depend on the input, a constant can be used as well.
The tables Writer.Block and Writer.Inline can be used as functions; they apply the right render function for an element of the respective type. E.g., Writer.Block(pandoc.Para 'x') will delegate to the Writer.Block.Para render function and will return the result of that call.
Similarly, the functions Writer.Blocks and Writer.Inlines can be used to render lists of elements, and Writer.Pandoc renders the document’s blocks. The function Writer.Blocks can take a separator as an optional second argument, e.g., Writer.Blocks(blks, pandoc.layout.cr); the default block separator is pandoc.layout.blankline.
All predefined functions can be overwritten when needed.
The resulting Writer uses the render functions to handle metadata values and converts them to template variables. The template is applied automatically if one is given.
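To tie these pieces together, here is a hedged sketch of a small scaffold-based writer producing Markdown-like output. It relies only on the interface described in this section plus pandoc.layout.space and pandoc.layout.prefixed from the pandoc.layout module. The selection of elements and the output syntax are illustrative assumptions, and element types without a render function are not handled. It needs to run inside pandoc, which provides the pandoc global.

```lua
Writer = pandoc.scaffolding.Writer
local layout = pandoc.layout

-- Inline elements
Writer.Inline.Str = function (str)
  return str.text
end
Writer.Inline.Space = layout.space      -- constant: independent of input
Writer.Inline.SoftBreak = layout.space
Writer.Inline.Emph = function (emph)
  -- a list of values is concatenated, as with pandoc.layout.concat
  return {'*', Writer.Inlines(emph.content), '*'}
end

-- Block elements
Writer.Block.Para = function (para)
  return Writer.Inlines(para.content)
end
Writer.Block.Header = function (header)
  return {string.rep('#', header.level), layout.space,
          Writer.Inlines(header.content)}
end
Writer.Block.BlockQuote = function (bq)
  -- render the nested blocks with the default separator,
  -- then prefix every line with "> "
  return layout.prefixed(Writer.Blocks(bq.content), '> ')
end
```

Saved under a hypothetical name such as sketch.lua, it could be tried with pandoc -t sketch.lua input.md.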
Classic style
A writer using the classic style defines rendering functions for each element of the pandoc AST. Note that this style is deprecated and may be removed in later versions.
For example,
function Para(s)
  return "<paragraph>" .. s .. "</paragraph>"
end
Template variables
New template variables can be added, or existing ones modified, by returning a second value from function Doc.
For example, the following will add the current date in variable date, unless date is already defined as either a metadata value or a variable:
function Doc (body, meta, vars)
  -- prefer an existing variable, then the metadata value, then today's date
  vars.date = vars.date or meta.date or os.date '%B %e, %Y'
  return body, vars
end
Changes in pandoc 3.0
Custom writers were reworked in pandoc 3.0. For technical reasons, the global variables PANDOC_DOCUMENT and PANDOC_WRITER_OPTIONS are set to the empty document and default values, respectively. The old behavior can be restored by adding the following snippet, which turns a classic writer into a new-style writer.
function Writer (doc, opts)
  PANDOC_DOCUMENT = doc
  PANDOC_WRITER_OPTIONS = opts
  loadfile(PANDOC_SCRIPT_FILE)()
  return pandoc.write_classic(doc, opts)
end