
Update architecture.md

Just tried to rewrite some of the contents, fixing mistakes along the way.
## Data flow
Input data is parsed via nested chains of rules. There are 3 nested chains -
`core`, `block` & `inline`:
```
core
    core.rule1 (normalize)
    ...
    core.ruleX

    block
        block.rule1 (blockquote)
        ...
        block.ruleX

    core.ruleX1 (intermediate rule that applies on block tokens, nothing yet)
    ...
    core.ruleXX

    inline (applied to each block token with "inline" type)
        inline.rule1 (text)
        ...
        inline.ruleX

    core.ruleYY (applies to all tokens)
    ... (abbreviation, footnote, typographer, linkifier)
```
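Each chain lives on its own sub-parser of the `MarkdownIt` instance (these property names are the real markdown-it API; the sketch itself is mine):

```js
var md = require('markdown-it')();

// Each chain has its own rule manager (a Ruler instance):
md.core.ruler;    // rules of the `core` chain
md.block.ruler;   // rules of the `block` chain
md.inline.ruler;  // rules of the `inline` chain
```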
The result of the parsing is a *list of tokens*, which will be passed to the `renderer` to generate the html content.

These tokens can themselves be parsed again to generate more tokens (e.g. a `list token` can be divided into multiple `inline tokens`).

An `env` sandbox can be used alongside tokens to inject external variables for your parsers and renderers.

Each chain (core / block / inline) uses an independent `state` object when parsing data, so that each parsing operation is independent and can be disabled on the fly.
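For instance, the parser can be driven separately from the renderer, with one `env` shared by both passes (a minimal sketch; the input string is arbitrary):

```js
var md = require('markdown-it')();

var env = {};  // sandbox shared by all rules during this pass

// Run the core/block/inline chains; the result is a flat token array.
var tokens = md.parse('# Hello\n\nSome *emphasis*.', env);

// Feed the same tokens (and env) to the renderer to produce html.
var html = md.renderer.render(tokens, md.options, env);
```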
## Token stream
Instead of traditional AST we use more low-level data representation - *tokens*.
The difference is simple:
- Tokens are a simple sequence (Array).
- Opening and closing tags are separate tokens.
- There are special token objects, "inline containers", having nested token
  sequences with inline markup (bold, italic, text, ...).

Each token has some common fields:
- __type__ - token name.
- __level__ - nesting level, useful to seek closing pair.
- __lines__ - [begin, end], for block tokens only. Range of input lines,
  compiled to this token.

Inline tokens (`type === "inline"`) have additional properties:
- __content__ - raw text, unparsed inline content.
- __children__ - nested token stream.

In total, a token stream is:
- On the top level - array of paired or single "block" tokens:
  - open/close for headers, lists, blockquotes, paragraphs, ...
  - codes, fenced blocks, horizontal rules, html blocks, inline containers
- Each inline token has a `.children` property with a nested token stream for inline content:
  - open/close for strong, em, link, code, ...
  - text, line breaks
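As a small illustration (the printed output is approximate, not verbatim):

```js
var md = require('markdown-it')();

var tokens = md.parse('Hello *world*', {});

tokens.forEach(function (t) {
  console.log(t.type, t.level);
});
// Roughly:
//   paragraph_open  0
//   inline          1   <- carries .content ('Hello *world*') and .children
//   paragraph_close 0
```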
Why not AST? Because it's not needed for our tasks. We follow KISS principle.
If you wish - you can call the parser without the renderer and convert the token stream to an AST.

More details about tokens:
- [Renderer source](https://github.com/markdown-it/markdown-it/blob/master/lib/renderer.js)
- [Live demo](https://markdown-it.github.io/) - type your text and click `debug` tab.
## Rules
Rules are functions, doing "magic" with parser `state` objects. Each rule is registered under a unique name in one or more *chains*. For instance, the `blockquote` rule is associated with the `blockquote`, `paragraph`, `heading` and `list` chains.

Rules are managed by names via [Ruler](https://markdown-it.github.io/markdown-it/#Ruler) instances and can be `enabled` / `disabled` from the [MarkdownIt](https://markdown-it.github.io/markdown-it/#MarkdownIt) methods.
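For example (`link` and `image` are real inline rule names):

```js
var md = require('markdown-it')();

// Disable two built-in inline rules by name...
md.disable(['link', 'image']);

// ...and re-enable one of them later.
md.enable('link');
```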
Note that some rules have a `validation mode` - in this mode a rule does not modify the token stream, but only looks ahead for the end of a token. It's an important design principle - the token stream is "write only" on the block & inline parse stages.

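A hedged sketch of the shape of a block rule; the `%` marker and the rule body are hypothetical, but the `(state, startLine, endLine, silent)` signature and the `silent` flag are how block rules implement this validation mode:

```js
function myBlockRule(state, startLine, endLine, silent) {
  var pos = state.bMarks[startLine] + state.tShift[startLine];

  // Hypothetical construct: a line starting with '%'.
  if (state.src.charCodeAt(pos) !== 0x25 /* % */) { return false; }

  // Validation mode: report the match without writing any tokens.
  if (silent) { return true; }

  // Normal mode: push tokens onto state.tokens and advance state.line.
  // ...
  return true;
}
```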
Parsers are designed to keep rules independent of each other. You can safely enable/disable them, or
add new ones. There are no universal recipes for how to create new rules - design of
distributed state machines with good data isolation is a tricky business. But you
can investigate existing rules & plugins to see possible approaches.
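One common starting point is a plugin that registers a single core rule (a skeleton; the names are mine):

```js
function myPlugin(md) {
  // Register a core rule under a unique name; it sees the full token stream.
  md.core.ruler.push('my_rule', function (state) {
    // Inspect or transform state.tokens here.
  });
}

var md = require('markdown-it')().use(myPlugin);
```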
Also, in complex cases you can try to ask for help in the tracker. The condition is very simple - it should be clear from your ticket that you studied the docs, prepared data, and tried to do something yourself. We never refuse to help real developers.
## Renderer
After the token stream is generated, it's passed to a [renderer](https://github.com/markdown-it/markdown-it/blob/master/lib/renderer.js). It then plays all the tokens, passing each one to the rule with the same name as its token type.

Renderer rules are located in `md.renderer.rules[name]` and are simple functions
with the same signature:
```js
md.renderer.rules.image = function (tokens, idx, options, env, self) {
  // ... (rule body elided in this view) ...
};
```
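The usual override pattern (a hedged sketch, not from the original doc) keeps a reference to the default rule and delegates to it, so customizations made by other plugins stay intact:

```js
var md = require('markdown-it')();

// Keep the default rule so we can fall back to it.
var defaultImage = md.renderer.rules.image;

md.renderer.rules.image = function (tokens, idx, options, env, self) {
  // Inspect or tweak tokens[idx] here, then delegate.
  return defaultImage(tokens, idx, options, env, self);
};
```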
You can also write your own renderer to generate formats other than HTML, such as JSON or XML... You can even use it to generate an AST.
## Summary
This was mentioned in [Data flow](#data-flow), but let's repeat the sequence again:

1. Blocks are parsed, and the top level of the token stream is filled with block tokens.
2. Content on inline containers is parsed, filling `.children` properties.
3. Rendering happens.
And somewhere between you can apply additional transformations :) . Full content
of each chain can be seen on the top of
[parser_core.js](https://github.com/markdown-it/markdown-it/blob/master/lib/parser_core.js),
[parser_block.js](https://github.com/markdown-it/markdown-it/blob/master/lib/parser_block.js) and
[parser_inline.js](https://github.com/markdown-it/markdown-it/blob/master/lib/parser_inline.js)
files.