# markdown-it design principles

## Data flow

Input data is piped through nested chains of rules. There are 3 nested chains -
`core`, `block` & `inline`:

```
core
    core.rule1
    ... (normalize)

    block
        block.rule1
        ...
        block.ruleX

    core.ruleXX
    ... (nothing yet)

    inline (applied to each block token with "inline" type)
        inline.rule1
        ...
        inline.ruleX

    core.ruleYY
    ... (abbreviation, footnote, typographer, linkifier)

```

Mutable data are:

- array of tokens
- `env` sandbox

Tokens are the "main" data, but some rules are split across several chains
and need a sandbox to exchange data. `env` can also be used to inject per-render
variables for your custom parse and render rules.

Each chain (core / block / inline) has an independent `state` object, to isolate
data and keep the code free of clutter.
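
The nesting above can be pictured with a small self-contained model. This is
only a sketch and not the library's code: `runChain`, `blockRule`, `inlineRule`
and `parse` are hypothetical names, and a single `state` object is shared here
for brevity, while the real chains each have their own.

```js
// Simplified model of nested rule chains; NOT markdown-it's actual code.
// All function names here are hypothetical.

// Run every rule of a chain against the given state object.
function runChain(rules, state) {
  rules.forEach(function (rule) { rule(state); });
}

// "block" step: split the source into paragraphs, each holding one
// inline container with the raw, still unparsed text.
function blockRule(state) {
  state.src.split(/\n{2,}/).forEach(function (text) {
    state.tokens.push({ type: 'paragraph_open', level: 0 });
    state.tokens.push({ type: 'inline', level: 1, content: text, children: [] });
    state.tokens.push({ type: 'paragraph_close', level: 0 });
  });
}

// "inline" step: applied to each block token with "inline" type.
function inlineRule(state) {
  state.tokens.forEach(function (token) {
    if (token.type !== 'inline') { return; }
    token.children.push({ type: 'text', level: 0, content: token.content });
    // `env` is the shared sandbox; here it only counts processed containers.
    state.env.inlineCount = (state.env.inlineCount || 0) + 1;
  });
}

// The "core" chain drives the other two steps in order.
function parse(src, env) {
  var state = { src: src, env: env, tokens: [] };
  runChain([blockRule, inlineRule], state);
  return state.tokens;
}

var env = {};
var tokens = parse('first\n\nsecond', env);
console.log(tokens.length, env.inlineCount);
```

The two mutable pieces of data are visible here: the token array grows as rules
run, and `env` carries side information between steps.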


## Token stream

Instead of a traditional AST we use a more low-level data representation - tokens.
The difference is simple:

- Tokens are a flat sequence (Array).
- Opening and closing tags are separate tokens.
- There are special token objects, "inline containers", which hold nested token
  sequences with inline markup (bold, italic, text, ...).

Each token has common fields:

- __type__ - token name.
- __level__ - nesting level, useful for finding the closing pair.
- __lines__ - [begin, end], for block tokens only. Range of input lines
  compiled to this token.
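
To illustrate how `level` helps to find a closing pair, here is a minimal
sketch. `findClose` is a hypothetical helper, and it assumes that an opening
token and its closing token share the same `level`, while everything between
them is nested deeper. The token names in the sample are only illustrative.

```js
// Hypothetical helper: index of the token that closes the one at `openIdx`,
// or -1 if there is none. Relies on the assumption described above.
function findClose(tokens, openIdx) {
  var level = tokens[openIdx].level;
  for (var i = openIdx + 1; i < tokens.length; i++) {
    if (tokens[i].level === level) { return i; }
  }
  return -1;
}

// Hand-written sample: a blockquote wrapping one paragraph.
var sample = [
  { type: 'blockquote_open',  level: 0 },
  { type: 'paragraph_open',   level: 1 },
  { type: 'inline',           level: 2 },
  { type: 'paragraph_close',  level: 1 },
  { type: 'blockquote_close', level: 0 }
];

var quoteEnd = findClose(sample, 0);
var paragraphEnd = findClose(sample, 1);
console.log(quoteEnd, paragraphEnd);
```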

An inline container (`type === "inline"`) has additional properties:

- __content__ - raw text, unparsed inline content.
- __children__ - token stream for parsed content.

In total, the token stream is:

- On the top level - an array of paired or single "block" tokens:
  - open/close for headers, lists, blockquotes, paragraphs, ...
  - code blocks, fenced blocks, horizontal rules, html blocks, inline containers
- Each inline container has a `.children` property with the token stream for its inline content:
  - open/close for strong, em, link, code, ...
  - text, line breaks

Why not an AST? Because it's not needed for our tasks. We follow the KISS principle.
If you wish, you can call the parser without the renderer and convert the token
stream to an AST.
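
As a rough idea of what such a conversion could look like, here is a sketch
that folds a flat stream into a tree. `tokensToTree` is a hypothetical helper,
the sample stream is written by hand, and the code assumes that paired tokens
are named `<name>_open` / `<name>_close`.

```js
// Hypothetical helper: fold a flat token stream into a tree of nodes.
// Assumes paired tokens are named "<name>_open" / "<name>_close".
function tokensToTree(tokens) {
  var root = { type: 'root', children: [] };
  var stack = [root];

  tokens.forEach(function (token) {
    var parent = stack[stack.length - 1];

    if (/_open$/.test(token.type)) {
      // Opening token: start a new node and descend into it.
      var node = { type: token.type.replace(/_open$/, ''), children: [] };
      parent.children.push(node);
      stack.push(node);
    } else if (/_close$/.test(token.type)) {
      // Closing token: go back up to the parent node.
      stack.pop();
    } else if (token.type === 'inline') {
      // Inline containers carry their own nested stream in `.children`.
      parent.children.push({
        type: 'inline',
        children: tokensToTree(token.children).children
      });
    } else {
      parent.children.push({ type: token.type, content: token.content });
    }
  });

  return root;
}

// Hand-written stream for a paragraph that contains emphasized "hi".
var stream = [
  { type: 'paragraph_open', level: 0 },
  { type: 'inline', level: 1, content: '*hi*', children: [
    { type: 'em_open', level: 0 },
    { type: 'text', level: 1, content: 'hi' },
    { type: 'em_close', level: 0 }
  ] },
  { type: 'paragraph_close', level: 0 }
];

var tree = tokensToTree(stream);
console.log(JSON.stringify(tree, null, 2));
```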

Where to find more details about tokens:

- [Renderer source](https://github.com/markdown-it/markdown-it/blob/master/lib/renderer.js)
- [Live demo](https://markdown-it.github.io/) - type your text and click the `debug` tab.


## Rules

Rules are functions that do the "magic" with parser `state` objects. Each rule is
registered in one of the chains under a unique name.

Rules are managed by names via [Ruler](https://markdown-it.github.io/markdown-it/#Ruler) instances and `enable` / `disable` methods in [MarkdownIt](https://markdown-it.github.io/markdown-it/#MarkdownIt).

Note that some rules have a "validation mode": in this mode a rule does not
modify the token stream and only looks ahead for the end of a token. This is an
important design principle - the token stream is "write only" during the block & inline parse stages.
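
A validation mode can be sketched as a flag passed to the rule. The example
below is an illustration under assumptions, not a real markdown-it rule:
`mentionRule` and the `mention` token type are made up, the flag is called
`silent` here, and `state` is a hand-made stand-in that exposes only the
fields the sketch needs.

```js
// Sketch of an inline-style rule that recognizes "@name" mentions.
// Returns true when the rule matched at the current position.
function mentionRule(state, silent) {
  var match = /^@(\w+)/.exec(state.src.slice(state.pos));
  if (!match) { return false; }

  // Validation mode: report the match and skip ahead, but emit nothing.
  if (!silent) {
    state.tokens.push({ type: 'mention', level: state.level, content: match[1] });
  }

  state.pos += match[0].length;
  return true;
}

// Validation run: tokens stay empty, only the position advances.
var probe = { src: '@alice hi', pos: 0, level: 0, tokens: [] };
var matchedSilently = mentionRule(probe, true);

// Normal run: one "mention" token is appended to the stream.
var real = { src: '@alice hi', pos: 0, level: 0, tokens: [] };
mentionRule(real, false);

console.log(matchedSilently, probe.tokens.length, real.tokens.length);
```

In both modes the rule only appends to the stream or leaves it alone; it never
rewrites tokens that are already there.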

The parser is designed to keep rules independent. You can safely disable any
rule or add a new one. There is no universal recipe for creating new rules -
designing distributed state machines with good data isolation is a tricky
business. But you can study existing rules & plugins to see possible approaches.

In complex cases you can also ask for help in the tracker. The condition is very
simple - it should be clear from your ticket that you studied the docs and
sources and tried to do something yourself. We never refuse to help real developers.


## Renderer

After the token stream is generated, it's passed to the [renderer](https://github.com/markdown-it/markdown-it/blob/master/lib/renderer.js).
It plays back all tokens, passing each to the rule with the same name as the token type.

Renderer rules are located in `md.renderer.rules[name]` and are simple functions
with the same signature:

```js
function (tokens, idx, options, env, renderer) {
  //...
  return htmlResult;
}
```
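
The dispatch by token type can be pictured with a tiny stand-in renderer. This
is a simplified sketch, not the library's code: the `renderer` object below is
hand-made, it knows only three rules, and it does no HTML escaping.

```js
// Simplified stand-in for the render loop; not markdown-it's actual code.
var renderer = {
  rules: {
    paragraph_open:  function () { return '<p>'; },
    paragraph_close: function () { return '</p>\n'; },
    // A real rule would also escape HTML in the content.
    text: function (tokens, idx) { return tokens[idx].content; }
  },

  // Play all tokens, passing each to the rule named after its type.
  render: function (tokens, options, env) {
    var out = '';
    for (var idx = 0; idx < tokens.length; idx++) {
      var rule = this.rules[tokens[idx].type];
      if (typeof rule !== 'function') {
        throw new Error('No rule for token type: ' + tokens[idx].type);
      }
      out += rule(tokens, idx, options, env, this);
    }
    return out;
  }
};

var html = renderer.render([
  { type: 'paragraph_open' },
  { type: 'text', content: 'hello' },
  { type: 'paragraph_close' }
], {}, {});
console.log(html);
```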

In many cases this makes it easy to change the output without touching the parser.
For example, let's replace images whose URLs are vimeo links with the player's iframe:

```js
var md = require('markdown-it')();

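var defaultRender = md.renderer.rules.image,
    vimeoRE       = /^https?:\/\/(www\.)?vimeo\.com\/(\d+)($|\/)/;

md.renderer.rules.image = function (tokens, idx, options, env, self) {
  // In current markdown-it versions the image URL is stored in the token's
  // `src` attribute; adjust this line if your version exposes it differently.
  var src   = tokens[idx].attrGet('src') || '',
      match = src.match(vimeoRE);

  if (match) {
    // Embed the player instead of rendering an <img> tag.
    return '<div class="embed-responsive embed-responsive-16by9">\n' +
           '  <iframe class="embed-responsive-item" src="//player.vimeo.com/video/' + match[2] + '"></iframe>\n' +
           '</div>\n';
  }

  // Not a vimeo link: fall back to the default image rule.
  return defaultRender(tokens, idx, options, env, self);
};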
```

You can also write your own renderer, for example to generate an AST.


## Summary

This was mentioned in [Data flow](#data-flow), but let's repeat the sequence again:

1. Blocks are parsed, and the top level of the token stream is filled with block tokens.
2. The content of inline containers is parsed, filling their `.children` properties.
3. Rendering happens.

And somewhere in between you can apply additional transformations :) . The full
content of each chain can be seen at the top of the
[parser_core.js](https://github.com/markdown-it/markdown-it/blob/master/lib/parser_core.js),
[parser_block.js](https://github.com/markdown-it/markdown-it/blob/master/lib/parser_block.js) and
[parser_inline.js](https://github.com/markdown-it/markdown-it/blob/master/lib/parser_inline.js)
files.

You can also change the output directly in the [renderer](https://github.com/markdown-it/markdown-it/blob/master/lib/renderer.js) for many simple cases.