# markdown-it design principles
## Data flow
Input data is piped through nested chains of rules. There are 3 nested chains -
`core`, `block` & `inline`:
- `core`
  - ... (none yet, you can patch input string here)
  - `block`
  - ... (references, abbreviations, footnotes)
  - `inline` (applied to each block token with "inline" type)
  - ... (typographer, linkifier)
Mutable data are:
- array of tokens
- `env` sandbox
Tokens are the "main" data, but some rules can be split across several chains
and need a sandbox to exchange data. Also, `env` can be used to inject per-render
variables for your custom parse and render rules.
Each chain (core / block / inline) has an independent `state` object, to isolate
data and protect code from clutter.
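The nesting can be sketched in a few lines of plain JavaScript. This is a toy model, not markdown-it's actual code: the `runChain` helper, the rule names (`normalize`, `block`, `inline`, `paragraph`, `text`) and the `env.paragraphCount` variable are all hypothetical, chosen only to show how each chain gets its own `state` while the tokens and `env` are shared.

```javascript
// Toy model of nested rule chains. All names here are illustrative
// assumptions, not markdown-it internals.

// Run every rule of a chain against that chain's own state object.
function runChain(rules, state) {
  for (const rule of rules) rule(state);
}

// Block chain: a single rule that turns blank-line-separated chunks
// into paragraph_open / inline / paragraph_close tokens.
const blockRules = [
  function paragraph(state) {
    for (const chunk of state.src.split(/\n{2,}/)) {
      if (!chunk.trim()) continue;
      state.tokens.push({ type: 'paragraph_open', level: 0 });
      state.tokens.push({ type: 'inline', level: 1, content: chunk.trim(), children: [] });
      state.tokens.push({ type: 'paragraph_close', level: 0 });
      // The env sandbox is shared, so other chains can read this later.
      state.env.paragraphCount = (state.env.paragraphCount || 0) + 1;
    }
  }
];

// Inline chain: a single rule that emits the raw content as one text token.
const inlineRules = [
  function text(state) {
    state.tokens.push({ type: 'text', level: 0, content: state.src });
  }
];

// Core chain: drives the two nested chains, each with a fresh state.
const coreRules = [
  function normalize(state) {
    state.src = state.src.replace(/\r\n?/g, '\n');
  },
  function block(state) {
    runChain(blockRules, { src: state.src, tokens: state.tokens, env: state.env });
  },
  function inline(state) {
    for (const token of state.tokens) {
      if (token.type !== 'inline') continue;
      // Each inline container is parsed into its own children array.
      runChain(inlineRules, { src: token.content, tokens: token.children, env: state.env });
    }
  }
];

function parse(src, env) {
  const state = { src, tokens: [], env };
  runChain(coreRules, state);
  return state.tokens;
}

const env = {};
const tokens = parse('first\r\n\r\nsecond', env);
```

In this sketch the only things that outlive a chain run are the token arrays and `env`; the per-chain `state` objects are thrown away.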
## Token stream
Instead of a traditional AST we use a more low-level data representation - tokens.
The difference is simple:
- Tokens are a sequence (Array).
- Opening and closing tags are separate tokens.
- There are special token objects, "inline containers", that hold nested token
  sequences with inline markup (bold, italic, text, ...).
Each token has common fields:
- __type__ - token name.
- __level__ - nesting level, useful for finding the matching closing token.
- __lines__ - [begin, end], for block tokens only. Range of input lines,
compiled to this token.
An inline container (`type === "inline"`) has additional properties:
- __content__ - raw text, unparsed inline content.
- __children__ - token stream for parsed content.
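As an illustration, here is roughly what a token stream for the paragraph `Hello *world*` could look like, using only the fields listed above. The token type names (`paragraph_open`, `em_open`, ...), the exact `level` values and the `findClose` helper are assumptions made for this sketch, not output copied from the parser.

```javascript
// Hypothetical token stream for the input "Hello *world*".
// Assumption: tokens nested inside a pair sit one level deeper than the pair.
const tokens = [
  { type: 'paragraph_open', level: 0, lines: [0, 1] },
  {
    type: 'inline', level: 1, lines: [0, 1],
    content: 'Hello *world*',            // raw, unparsed inline text
    children: [                          // parsed inline token stream
      { type: 'text', level: 0, content: 'Hello ' },
      { type: 'em_open', level: 0 },
      { type: 'text', level: 1, content: 'world' },
      { type: 'em_close', level: 0 }
    ]
  },
  { type: 'paragraph_close', level: 0 }
];

// Find the closing token paired with the opening token at `openIdx`:
// the first later token with the same level and the matching `_close` type.
// Returns -1 when no pair is found.
function findClose(stream, openIdx) {
  const open = stream[openIdx];
  const closeType = open.type.replace(/_open$/, '_close');
  for (let i = openIdx + 1; i < stream.length; i++) {
    if (stream[i].level === open.level && stream[i].type === closeType) return i;
  }
  return -1;
}
```

Here `findClose(tokens, 0)` locates the `paragraph_close` token, and the same helper works on `tokens[1].children` to pair `em_open` with `em_close`.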
In total, a token stream is:
- On the top level - an array of paired or single "block" tokens:
  - open/close for headers, lists, blockquotes, paragraphs, ...
  - code blocks, fenced blocks, horizontal rules, html blocks, inline containers
- Each inline container has a `.children` property with a token stream for inline content:
  - open/close for strong, em, link, code, ...
  - text, line breaks
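A flat stream with separate open/close tokens can be turned into HTML in one pass over the array, recursing only into `.children`. The following is a minimal sketch under stated assumptions, not markdown-it's renderer: the `TAGS` table, the `_open` / `_close` naming convention and the `escapeHtml` helper are hypothetical.

```javascript
// Toy renderer for a flat token stream. The type-to-tag table and the
// `<name>_open` / `<name>_close` convention are assumptions of this sketch.
const TAGS = { paragraph: 'p', blockquote: 'blockquote', em: 'em', strong: 'strong' };

function escapeHtml(str) {
  return str.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function render(tokens) {
  let out = '';
  for (const token of tokens) {
    // Split e.g. "em_open" into name "em" and kind "open".
    const [, name, kind] = /^(.*?)_(open|close)$/.exec(token.type) || [];
    if (kind === 'open') out += '<' + (TAGS[name] || name) + '>';
    else if (kind === 'close') out += '</' + (TAGS[name] || name) + '>';
    else if (token.type === 'inline') out += render(token.children); // nested stream
    else if (token.type === 'text') out += escapeHtml(token.content);
    // Other token types are ignored in this sketch.
  }
  return out;
}

const html = render([
  { type: 'paragraph_open', level: 0 },
  {
    type: 'inline', level: 1, content: '1 < 2 *really*',
    children: [
      { type: 'text', level: 0, content: '1 < 2 ' },
      { type: 'em_open', level: 0 },
      { type: 'text', level: 1, content: 'really' },
      { type: 'em_close', level: 0 }
    ]
  },
  { type: 'paragraph_close', level: 0 }
]);
// html === '<p>1 &lt; 2 <em>really</em></p>'
```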
Why not an AST? Because it's not needed for our tasks. We follow the KISS principle.
If you wish, you can call the parser without the renderer and convert the token stream
to an AST.
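Such a conversion could look like the sketch below. It assumes paired tokens are named `<name>_open` / `<name>_close` and that inline containers carry a `.children` array; the `tokensToAst` function and the shape of the resulting nodes are hypothetical, not part of the library.

```javascript
// Convert a flat token stream into a nested tree using a stack of open nodes.
// Assumption: paired tokens are named `<name>_open` / `<name>_close`.
function tokensToAst(tokens) {
  const root = { type: 'root', children: [] };
  const stack = [root];
  for (const token of tokens) {
    const parent = stack[stack.length - 1];
    if (token.type.endsWith('_open')) {
      // Opening token: create a node and descend into it.
      const node = { type: token.type.slice(0, -'_open'.length), children: [] };
      parent.children.push(node);
      stack.push(node);
    } else if (token.type.endsWith('_close')) {
      // Closing token: go back up to the enclosing node.
      stack.pop();
    } else if (token.type === 'inline') {
      // Inline container: convert its children and attach them to the parent.
      parent.children.push(...tokensToAst(token.children || []).children);
    } else {
      // Single token (text, line break, ...): becomes a leaf.
      const leaf = { type: token.type };
      if (token.content !== undefined) leaf.content = token.content;
      parent.children.push(leaf);
    }
  }
  return root;
}

const ast = tokensToAst([
  { type: 'paragraph_open', level: 0 },
  {
    type: 'inline', level: 1, content: 'Hello *world*',
    children: [
      { type: 'text', level: 0, content: 'Hello ' },
      { type: 'em_open', level: 0 },
      { type: 'text', level: 1, content: 'world' },
      { type: 'em_close', level: 0 }
    ]
  },
  { type: 'paragraph_close', level: 0 }
]);
```

The result is a `root` node holding one `paragraph` node, whose children are a `text` leaf and an `em` node wrapping the second `text` leaf.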
Where to look for more details about tokens:
- [Renderer source](
- [Live demo]( - type your text and click the `debug` tab.
## Parse process
This was mentioned in [Data flow](#data-flow), but let's repeat the sequence:
1. Blocks are parsed, and the top level of the token stream is filled with block tokens.
2. The content of inline containers is parsed, filling their `.children` properties.
3. Rendering happens.
And somewhere in between you can apply additional transformations :). The full content
of each chain can be seen at the top of
[parser_block.js]( and