Browse Source

Update architecture.md

Just tried to rewrite some of the contents, fixing mistakes along the way.
pull/50/head
Elior Boukhobza 10 years ago
parent
commit
e996fe7015
  1. 83
      docs/architecture.md

83
docs/architecture.md

@ -2,81 +2,79 @@
## Data flow ## Data flow
Input data is piped via nestesd chains of rules. There are 3 nested chains - Input data is parsed via nested chains of rules. There are 3 nested chains -
`core`, `block` & `inline`: `core`, `block` & `inline`:
``` ```
core core
core.rule1 core.rule1 (normalize)
... (normalize) ...
core.ruleX
block block
block.rule1 block.rule1 (blockquote)
... ...
block.ruleX block.ruleX
core.ruleX1 (intermediate rule that applies on block tokens, nothing yet)
...
core.ruleXX core.ruleXX
... (nothing yet)
inline (applyed to each block token with "inline" type) inline (applied to each block token with "inline" type)
inline.rule1 inline.rule1 (text)
... ...
inline.ruleX inline.ruleX
core.ruleYY core.ruleYY (applies to all tokens)
... (abbreviation, footnote, typographer, linkifier) ... (abbreviation, footnote, typographer, linkifier)
``` ```
Mutable data are: The result of the parsing is a *list of tokens*, that will be passed to the `renderer` to generate the html content.
- array of tokens These tokens can be themselves parsed again to generate more tokens (ex: a `list token` can be divided into multiple `inline tokens`).
- `env` sandbox
Tokens are the "main" data, but some rules can be "splitted" to several chains, An `env` sandbox can be used alongside tokens to inject external variables for your parsers and renderers.
and need sandbox for exchange. Also, `env` can be used to inject per-render
variables for your custom parse and render rules.
Each chain (core / block / inline) has independent `state` object, to isolate Each chain (core / block / inline) uses an independent `state` object when parsing data, so that each parsing operation is independent and can be disabled on the fly.
data and protect code from clutter.
## Token stream ## Token stream
Instead of traditional AST we use more low-level data representation - tokens. Instead of traditional AST we use more low-level data representation - *tokens*.
Difference is simple: The difference is simple:
- Tokens are sequence (Array). - Tokens are a simple sequence (Array).
- Opening and closing tags are separate tokens. - Opening and closing tags are separate tokens.
- There are special token objects, "inline containers", having nested token - There are special token objects, "inline containers", having nested tokens.
sequences with inline markup (bold, italic, text, ...). sequences with inline markup (bold, italic, text, ...).
Each token has common fields: Each token has some common fields:
- __type__ - token name. - __type__ - token name.
- __level__ - nesting level, useful to seek closeing pair. - __level__ - nesting level, useful to seek closing pair.
- __lines__ - [begin, end], for block tokens only. Range of input lines, - __lines__ - [begin, end], for block tokens only. Range of input lines,
compiled to this token. compiled to this token.
Inline container (`type === "inline"`) has additional properties: Inline tokens (`type === "inline"`) have additional properties:
- __content__ - raw text, unparsed inline content. - __content__ - raw text, unparsed inline content.
- __children__ - token stream for parsed content. - __children__ - nested token stream.
In total, token stream is: In total, a token stream is:
- On the top level - array of paired or single "block" tokens: - On the top level - array of paired or single "block" tokens:
- open/close for headers, lists, blockquotes, paragraphs, ... - open/close for headers, lists, blockquotes, paragraphs, ...
- codes, fenced blocks, horisontal rules, html blocks, inlines containers - codes, fenced blocks, horizontal rules, html blocks, inlines containers
- Each inline containers have `.children` property with token stream for inline content: - Each inline token have a `.children` property with a nested token stream for inline content:
- open/close for strong, em, link, code, ... - open/close for strong, em, link, code, ...
- text, line breaks - text, line breaks
Why not AST? Because it's not needed for our tasks. We follow KISS principle. Why not AST? Because it's not needed for our tasks. We follow KISS principle.
If you whish - you can call parser without renderer and convert token stream If you wish - you can call a parser without a renderer and convert the token stream
to AST. to an AST.
Where to search more details about tokens: More details about tokens:
- [Renderer source](https://github.com/markdown-it/markdown-it/blob/master/lib/renderer.js) - [Renderer source](https://github.com/markdown-it/markdown-it/blob/master/lib/renderer.js)
- [Live demo](https://markdown-it.github.io/) - type your text ant click `debug` tab. - [Live demo](https://markdown-it.github.io/) - type your text ant click `debug` tab.
@ -84,18 +82,17 @@ Where to search more details about tokens:
## Rules ## Rules
Rules are functions, doing "magick" with parser `state` objects. Each rule is Rules are functions, doing "magic" with parser `state` objects. A rule is associated with one or more *chains* and is unique. For instance, a `blockquote` token is associated with `blockquote`, `paragraph`, `heading` and `list` chains.
registered in one of chain with unique name.
Rules are managed by names via [Ruler](https://markdown-it.github.io/markdown-it/#Ruler) instances and `enable` / `disable` methods in [MarkdownIt](https://markdown-it.github.io/markdown-it/#MarkdownIt). Rules are managed by names via [Ruler](https://markdown-it.github.io/markdown-it/#Ruler) instances and can be `enabled` / `disabled` from the [MarkdownIt](https://markdown-it.github.io/markdown-it/#MarkdownIt) methods.
You can note, that some rules have "validation mode" - in this mode rule does not You can note, that some rules have a `validation mode` - in this mode rules do not
modify token stream, and only look ahead for the end of token. It's one of modify the token stream, and only look ahead for the end of a token. It's one
important design principle - token stream is "write only" on block & inline parse stages. important design principle - a token stream is "write only" on block & inline parse stages.
Parser is designed to keep rules independent. You can safely disable any, or Parsers are designed to keep rules independent of each other. You can safely enable/disable them, or
add new one. There are no universal recipes how to create new rules - design of add new ones. There are no universal recipes for how to create new rules - design of
distributed state machines with good data isolation is tricky business. But you distributed state machines with good data isolation is a tricky business. But you
can investigate existing rules & plugins to see possible approaches. can investigate existing rules & plugins to see possible approaches.
Also, in complex cases you can try to ask for help in tracker. Condition is very Also, in complex cases you can try to ask for help in tracker. Condition is very
@ -105,8 +102,8 @@ and tried to do something yourself. We never reject with help to real developper
## Renderer ## Renderer
After token stream is generated, it's passed to [renderer](https://github.com/markdown-it/markdown-it/blob/master/lib/renderer.js). After token stream is generated, it's passed to a [renderer](https://github.com/markdown-it/markdown-it/blob/master/lib/renderer.js).
It just plays all tokens, passing each to rule with the same name as token type. It then plays all the tokens, passing each to a rule with the same name as token type.
Renderer rules are located in `md.renderer.rules[name]` and are simple functions Renderer rules are located in `md.renderer.rules[name]` and are simple functions
with the same signature: with the same signature:
@ -143,7 +140,7 @@ md.renderer.rules.image = function (tokens, idx, options, env, self) {
}); });
``` ```
You also can write your own renderer to generate AST for example. You also can write your own renderer to generate other formats than HTML, such as JSON/XML... You can even use it to generate AST.
## Summary ## Summary
@ -154,7 +151,7 @@ This was mentioned in [Data flow](#data-flow), but let's repeat sequence again:
2. Content on inline containers is parsed, filling `.children` properties. 2. Content on inline containers is parsed, filling `.children` properties.
3. Rendering happens. 3. Rendering happens.
And somewhere between you can apply addtional transformations :) . Full content And somewhere between you can apply additional transformations :) . Full content
of each chain can be seen on the top of of each chain can be seen on the top of
[parser_core.js](https://github.com/markdown-it/markdown-it/blob/master/lib/parser_core.js), [parser_core.js](https://github.com/markdown-it/markdown-it/blob/master/lib/parser_core.js),
[parser_block.js](https://github.com/markdown-it/markdown-it/blob/master/lib/parser_block.js) and [parser_block.js](https://github.com/markdown-it/markdown-it/blob/master/lib/parser_block.js) and

Loading…
Cancel
Save