The result of the parsing is a *list of tokens* that will be passed to the `renderer` to generate the html content.
These tokens can themselves be parsed again to generate more tokens (e.g. a `list` token can be divided into multiple `inline` tokens).
An `env` sandbox can be used alongside tokens to inject external variables for your parsers and renderers.
Each chain (core / block / inline) uses an independent `state` object when parsing data, so that each parsing operation is independent and can be disabled on the fly.
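For example, `md.render` accepts an `env` object that travels through all chains, and a core rule added via `md.core.ruler.push` can read and write it. The `collect_title` rule below is a hypothetical sketch, not part of markdown-it itself:

```js
const MarkdownIt = require('markdown-it');
const md = new MarkdownIt();

// Hypothetical core rule: stash the first heading's text into `env`.
md.core.ruler.push('collect_title', function (state) {
  const idx = state.tokens.findIndex(t => t.type === 'heading_open');
  if (idx !== -1) state.env.title = state.tokens[idx + 1].content;
});

const env = {};
md.render('# Hello\n\nSome text.', env);
console.log(env.title); // => 'Hello'
```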
## Token stream
Instead of a traditional AST we use a more low-level data representation - *tokens*.
The difference is simple:
- Tokens are a simple sequence (Array).
- Opening and closing tags are separate tokens.
- There are special token objects, "inline containers", having nested token
sequences with inline markup (bold, italic, text, ...).
Each token has some common fields:
- __type__ - token name.
- __level__ - nesting level, useful to seek closing pair.
- __lines__ - [begin, end], for block tokens only. The range of input lines
compiled into this token.
Inline tokens (`type === "inline"`) have additional properties:
- __content__ - raw text, unparsed inline content.
- __children__ - nested token stream for the parsed content.
In total, a token stream is:
- On the top level - an array of paired or single "block" tokens:
- open/close for headers, lists, blockquotes, paragraphs, ...
- codes, fenced blocks, horizontal rules, html blocks, inline containers
- Each inline token has a `.children` property with a nested token stream for inline content:
- open/close for strong, em, link, code, ...
- text, line breaks
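You can inspect this structure by calling `md.parse`, which takes a source string and an `env` object:

```js
const md = require('markdown-it')();

const tokens = md.parse('**Hello**, world!', {});
console.log(tokens.map(t => t.type));
// => [ 'paragraph_open', 'inline', 'paragraph_close' ]

// The inline container carries the parsed inline markup in `.children`:
const inline = tokens.find(t => t.type === 'inline');
console.log(inline.children.map(t => t.type));
// => [ 'strong_open', 'text', 'strong_close', 'text' ]
```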
Why not AST? Because it's not needed for our tasks. We follow the KISS principle.
If you wish, you can call the parser without the renderer and convert the token
stream to an AST.
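A minimal sketch of such a conversion, assuming each token carries a `nesting` field (+1 for opening, -1 for closing, 0 for self-contained tokens), as in current markdown-it versions:

```js
function tokensToAST (tokens) {
  const root = { token: null, children: [] };
  const stack = [root];
  for (const token of tokens) {
    if (token.nesting === 1) {            // opening tag: descend
      const node = { token, children: [] };
      stack[stack.length - 1].children.push(node);
      stack.push(node);
    } else if (token.nesting === -1) {    // closing tag: ascend
      stack.pop();
    } else {                              // self-contained token
      stack[stack.length - 1].children.push({ token, children: [] });
    }
  }
  return root;
}
```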
Where to search for more details about tokens:

- [Live demo](https://markdown-it.github.io/) - type your text and click the `debug` tab.
## Rules
Rules are functions doing "magic" with parser `state` objects. A rule is associated with one or more *chains* and has a unique name. For instance, a `blockquote` token is associated with the `blockquote`, `paragraph`, `heading` and `list` chains.
Rules are managed by name via [Ruler](https://markdown-it.github.io/markdown-it/#Ruler) instances and can be enabled / disabled with the `enable` / `disable` methods of [MarkdownIt](https://markdown-it.github.io/markdown-it/#MarkdownIt).
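For example (the rule names below are actual markdown-it rule names):

```js
const md = require('markdown-it')();

// Disable inline link and image parsing, then turn links back on.
md.disable(['link', 'image']);
md.enable('link');

// The same is possible per chain, directly on a Ruler instance:
md.block.ruler.disable('table');
```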
Note that some rules have a "validation mode" - in this mode a rule does not
modify the token stream, and only looks ahead for the end of a token. It's one
important design principle - the token stream is "write only" on the block & inline
parse stages.
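Here is a rough sketch of a custom block rule that honors validation mode via the `silent` flag; the rule name `shout_block` and its `@` marker are invented for illustration:

```js
const md = require('markdown-it')();

// Block rules receive (state, startLine, endLine, silent).
function shoutBlock (state, startLine, endLine, silent) {
  const pos = state.bMarks[startLine] + state.tShift[startLine];
  const max = state.eMarks[startLine];

  // Not our marker - let other rules try this line.
  if (state.src.charCodeAt(pos) !== 0x40 /* @ */) return false;

  // Validation mode: report a match, but do NOT touch the token stream.
  if (silent) return true;

  // Normal mode: emit a token and consume the line.
  // (A matching renderer rule would be needed to output the content.)
  const token = state.push('shout_block', 'p', 0);
  token.content = state.src.slice(pos + 1, max).trim();
  token.map = [startLine, startLine + 1];
  state.line = startLine + 1;
  return true;
}

md.block.ruler.before('paragraph', 'shout_block', shoutBlock);
```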
Parsers are designed to keep rules independent of each other. You can safely
enable/disable them, or add new ones. There are no universal recipes for how to
create new rules - the design of distributed state machines with good data
isolation is a tricky business. But you
can investigate existing rules & plugins to see possible approaches.
Also, in complex cases you can try to ask for help in the tracker. The condition is
simple - it should be clear from your ticket that you studied the docs and sources,
and tried to do something yourself. We never refuse to help real developers.
## Renderer
After token stream is generated, it's passed to a [renderer](https://github.com/markdown-it/markdown-it/blob/master/lib/renderer.js).
It then plays all the tokens, passing each to a rule with the same name as the token type.
Renderer rules are located in `md.renderer.rules[name]` and are simple functions.
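For example, you can wrap the rule for `strong_open` to add an attribute to the opening tag (the `shout` class name is just for illustration):

```js
const md = require('markdown-it')();

// Default handling falls back to renderToken when no rule is installed.
const proxy = md.renderer.rules.strong_open ||
  ((tokens, idx, options, env, self) => self.renderToken(tokens, idx, options));

md.renderer.rules.strong_open = function (tokens, idx, options, env, self) {
  tokens[idx].attrJoin('class', 'shout'); // decorate the opening tag
  return proxy(tokens, idx, options, env, self);
};

md.render('**loud**');
// => '<p><strong class="shout">loud</strong></p>\n'
```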