MDOC(3) Library Functions Manual MDOC(3)
NAME
mdoc, mdoc_alloc, mdoc_endparse, mdoc_free, mdoc_meta, mdoc_node, mdoc_parseln, mdoc_reset - mdoc macro compiler library
SYNOPSIS
#include <mandoc.h>
#include <mdoc.h>
extern const char * const * mdoc_macronames;
extern const char * const * mdoc_argnames;
struct mdoc *
mdoc_alloc(struct regset *regs, void *data, int pflags, mandocmsg msgs);
int
mdoc_endparse(struct mdoc *mdoc);
void
mdoc_free(struct mdoc *mdoc);
const struct mdoc_meta *
mdoc_meta(const struct mdoc *mdoc);
const struct mdoc_node *
mdoc_node(const struct mdoc *mdoc);
int
mdoc_parseln(struct mdoc *mdoc, int line, char *buf);
int
mdoc_reset(struct mdoc *mdoc);
DESCRIPTION
The mdoc library parses lines of mdoc(7) input into an abstract syntax tree (AST).
 
In general, applications initiate a parsing sequence with mdoc_alloc(), parse each line in a document with mdoc_parseln(), close the parsing session with mdoc_endparse(), operate over the syntax tree returned by mdoc_node() and mdoc_meta(), then free all allocated memory with mdoc_free(). The mdoc_reset() function may be used in order to reset the parser for another input sequence. See the EXAMPLES section for a simple example.
 
This section further defines the Types, Functions and Variables available to programmers. Following that, the Abstract Syntax Tree section documents the output tree.
Types
Both functions (see Functions) and variables (see Variables) may use the following types:
struct mdoc
An opaque type defined in mdoc.c. Its values are only used privately within the library.
struct mdoc_node
A parsed node. Defined in mdoc.h. See Abstract Syntax Tree for details.
mandocmsg
A function callback type defined in mandoc.h.
Functions
Function descriptions follow:
mdoc_alloc()
Allocates a parsing structure. The data pointer is passed to msgs. The pflags arguments are defined in mdoc.h. Returns NULL on failure. If non-NULL, the pointer must be freed with mdoc_free().
mdoc_reset()
Reset the parser for another parse routine. After its use, mdoc_parseln() behaves as if invoked for the first time. If it returns 0, memory could not be allocated.
mdoc_free()
Free all resources of a parser. The pointer is no longer valid after invocation.
mdoc_parseln()
Parse a NUL-terminated line of input. The line must not contain the trailing newline. Returns 0 on failure, 1 on success. The input buffer buf is modified by this function.
mdoc_endparse()
Signals that the parse is complete. Note that if mdoc_node() is called before mdoc_endparse(), the tree it returns is incomplete. Returns 0 on failure, 1 on success.
mdoc_node()
Returns the first node of the parse. Note that if mdoc_parseln() or mdoc_endparse() returns 0, the tree will be incomplete.
mdoc_meta()
Returns the document's parsed meta-data. If this information has not yet been supplied, or if mdoc_parseln() or mdoc_endparse() returns 0, the data will be incomplete.
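As noted for mdoc_parseln(), each line must be handed over without its trailing newline. The following is a minimal sketch of a helper that prepares a buffer read by a line reader such as getline(3). The name strip_newline() is hypothetical and not part of the mdoc library.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the mdoc library): remove one
 * trailing newline from a buffer holding len bytes of input.
 * Returns the length of the line after stripping.
 */
size_t
strip_newline(char *buf, size_t len)
{

	assert(buf != NULL);
	if (len > 0 && buf[len - 1] == '\n')
		buf[--len] = '\0';
	return len;
}
```

Since mdoc_parseln() modifies its input buffer, a caller that needs the original line afterwards would have to copy it before parsing.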
Variables
The following variables are also defined:
mdoc_macronames
An array of string-ified token names.
mdoc_argnames
An array of string-ified token argument names.
Abstract Syntax Tree
The mdoc functions produce an abstract syntax tree (AST) describing input in a regular form. It may be reviewed at any time with mdoc_node(); however, if called before mdoc_endparse(), or after mdoc_endparse() or mdoc_parseln() fail, it may be incomplete.
 
This AST is governed by the ontological rules dictated in mdoc(7) and derives its terminology accordingly. “In-line” elements described in mdoc(7) are described simply as “elements”.
 
The AST is composed of struct mdoc_node nodes with block, head, body, element, root and text types as declared by the type field. Each node also provides its parse point (the line, sec, and pos fields), its position in the tree (the parent, child, nchild, next and prev fields) and some type-specific data, in particular, for nodes generated from macros, the generating macro in the tok field.
 
The tree itself is arranged according to the following normal form, where capitalised non-terminals represent nodes.
 
    ROOT     ← mnode+
    mnode    ← BLOCK | ELEMENT | TEXT
    BLOCK    ← HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
    ELEMENT  ← TEXT*
    HEAD     ← mnode*
    BODY     ← mnode* [ENDBODY mnode*]
    TAIL     ← mnode*
    TEXT     ← [[:printable:],0x1e]*
 
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of the BLOCK production: these refer to punctuation marks. Furthermore, although a TEXT node will generally have a non-zero-length string, in the specific case of ‘.Bd -literal', an empty line will produce a zero-length string. Multiple body parts are only found in invocations of ‘Bl -column', where a new body introduces a new phrase.
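The tree links described above (parent, child, next) lend themselves to a simple recursive traversal. The following sketch uses a simplified stand-in for struct mdoc_node that carries only a type and the links; the names struct node, enum ntype, node_append() and node_count() are assumptions made for illustration, not definitions from mdoc.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in node types; the real enumeration is declared in mdoc.h. */
enum ntype { N_ROOT, N_BLOCK, N_HEAD, N_BODY, N_TAIL, N_ELEM, N_TEXT };

/* Simplified stand-in for struct mdoc_node: type and tree links only. */
struct node {
	enum ntype	 type;
	struct node	*parent;
	struct node	*child;		/* first child */
	struct node	*next;		/* next sibling */
};

/* Append c as the last child of p, keeping sibling order. */
void
node_append(struct node *p, struct node *c)
{
	struct node *n;

	assert(p != NULL && c != NULL);
	c->parent = p;
	c->next = NULL;
	if (p->child == NULL) {
		p->child = c;
		return;
	}
	for (n = p->child; n->next != NULL; n = n->next)
		continue;
	n->next = c;
}

/* Count the nodes of the given type in the subtree rooted at n. */
int
node_count(const struct node *n, enum ntype type)
{
	const struct node *c;
	int count;

	if (n == NULL)
		return 0;
	count = (n->type == type);
	for (c = n->child; c != NULL; c = c->next)
		count += node_count(c, type);
	return count;
}
```

With the real library, the same kind of traversal would start from the node returned by mdoc_node() and follow its child and next fields.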
Badly-nested Blocks
The ENDBODY node is available to end the formatting associated with a given block before the physical end of that block. It has a non-null end field, is of the BODY type, has the same tok as the BLOCK it is ending, and has a pending field pointing to that BLOCK's BODY node. It is an indirect child of that BODY node and has no children of its own.
 
An ENDBODY node is generated when a block ends while one of its child blocks is still open, like in the following example:
    .Ao ao
    .Bo bo ac
    .Ac bc
    .Bc end
 
This example results in the following block structure:
    BLOCK Ao
        HEAD Ao
        BODY Ao
            TEXT ao
            BLOCK Bo, pending -> Ao
                HEAD Bo
                BODY Bo
                    TEXT bo
                    TEXT ac
                    ENDBODY Ao, pending -> Ao
                    TEXT bc
    TEXT end
 
Here, the formatting of the ‘Ao' block extends from TEXT ao to TEXT ac, while the formatting of the ‘Bo' block extends from TEXT bo to TEXT bc. It renders as follows in -Tascii mode:
 
<ao [bo ac> bc] end
 
Support for badly-nested blocks is only provided for backward compatibility with some older mdoc(7) implementations. Using badly-nested blocks is strongly discouraged: the -Thtml and -Txhtml front-ends are unable to render them in any meaningful way. Furthermore, behaviour when encountering badly-nested blocks is not consistent across troff implementations, especially when using multiple levels of badly-nested blocks.
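To illustrate how a front-end might honour an ENDBODY node, the sketch below renders a hand-built copy of the tree from this section. It is not the library's own renderer: struct node is a simplified stand-in for struct mdoc_node, and the open, close, text and ended fields, as well as struct out, emit(), render() and render_example(), are assumptions of this sketch. Only the end and pending fields correspond to fields named above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum ntype { N_BLOCK, N_HEAD, N_BODY, N_TEXT };

/* Simplified stand-in for struct mdoc_node (see the assumptions above). */
struct node {
	enum ntype	 type;
	const char	*open;		/* opening delimiter (BODY only) */
	const char	*close;		/* closing delimiter (BODY only) */
	const char	*text;		/* string (TEXT only) */
	int		 end;		/* non-zero marks an ENDBODY node */
	int		 ended;		/* body was closed by an ENDBODY */
	struct node	*pending;	/* ENDBODY: the BODY it closes */
	struct node	*child;
	struct node	*next;
};

struct out {
	char	 buf[128];
	int	 nospace;	/* suppress the space before the next word */
};

/* Append s to the output; glue != 0 attaches it without a space. */
static void
emit(struct out *o, const char *s, int glue)
{
	if (!glue && !o->nospace && o->buf[0] != '\0')
		strncat(o->buf, " ", sizeof(o->buf) - strlen(o->buf) - 1);
	strncat(o->buf, s, sizeof(o->buf) - strlen(o->buf) - 1);
	o->nospace = 0;
}

/* Render n and its following siblings, depth first. */
void
render(struct out *o, struct node *n)
{
	for (; n != NULL; n = n->next) {
		if (n->type == N_TEXT) {
			emit(o, n->text, 0);
		} else if (n->type == N_BODY && n->end) {
			/* ENDBODY: close the pending body early. */
			assert(n->pending != NULL);
			emit(o, n->pending->close, 1);
			n->pending->ended = 1;
		} else if (n->type == N_BODY) {
			emit(o, n->open, 0);
			o->nospace = 1;
			render(o, n->child);
			if (!n->ended)	/* not already closed by ENDBODY */
				emit(o, n->close, 1);
		} else {
			render(o, n->child);
		}
	}
}

/* Build the example tree by hand and render it into o. */
void
render_example(struct out *o)
{
	struct node t_end = { .type = N_TEXT, .text = "end" };
	struct node t_bc = { .type = N_TEXT, .text = "bc" };
	struct node body_ao = { .type = N_BODY, .open = "<", .close = ">" };
	struct node endbody = { .type = N_BODY, .end = 1,
	    .pending = &body_ao, .next = &t_bc };
	struct node t_ac = { .type = N_TEXT, .text = "ac", .next = &endbody };
	struct node t_bo = { .type = N_TEXT, .text = "bo", .next = &t_ac };
	struct node body_bo = { .type = N_BODY, .open = "[", .close = "]",
	    .child = &t_bo };
	struct node head_bo = { .type = N_HEAD, .next = &body_bo };
	struct node blk_bo = { .type = N_BLOCK, .child = &head_bo };
	struct node t_ao = { .type = N_TEXT, .text = "ao", .next = &blk_bo };
	struct node head_ao = { .type = N_HEAD, .next = &body_ao };
	struct node blk_ao = { .type = N_BLOCK, .child = &head_ao,
	    .next = &t_end };

	body_ao.child = &t_ao;
	o->buf[0] = '\0';
	o->nospace = 0;
	render(o, &blk_ao);
}
```

Under these assumptions, render_example() fills the buffer with the same string as the -Tascii rendering shown above. The key design choice is that the ENDBODY node emits the closing delimiter of its pending body and marks that body as ended, so the body does not close a second time at its physical end.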
EXAMPLES
The following example reads lines from stdin and parses them, operating on the finished parse tree with parsed(). This example does not check the return value of mdoc_alloc() and does not free memory upon failure.
    struct regset regs;
    struct mdoc *mdoc;
    const struct mdoc_node *node;
    char *buf;
    size_t alloc_len;
    ssize_t len;
    int line;

    bzero(&regs, sizeof(struct regset));
    line = 1;
    mdoc = mdoc_alloc(&regs, NULL, 0, NULL);
    buf = NULL;
    alloc_len = 0;

    /* getline() returns -1 at end of input, so len must be signed. */
    while ((len = getline(&buf, &alloc_len, stdin)) >= 0) {
        /* Strip the trailing newline before parsing. */
        if (len && buf[len - 1] == '\n')
            buf[len - 1] = '\0';
        if ( ! mdoc_parseln(mdoc, line, buf))
            errx(1, "mdoc_parseln");
        line++;
    }

    if ( ! mdoc_endparse(mdoc))
        errx(1, "mdoc_endparse");
    if (NULL == (node = mdoc_node(mdoc)))
        errx(1, "mdoc_node");

    parsed(mdoc, node);
    mdoc_free(mdoc);
 
Please see main.c in the source archive for a rigorous reference.
SEE ALSO
AUTHORS
The mdoc library was written by Kristaps Dzonsons <kristaps@bsd.lv>.