
# Document-Wide Post Processing

An overview of how to tweak and augment the token stream just before rendering.

## Goal

The output document will be surrounded by `<section>` tags. Each second-level heading (`h2`) will also trigger a section break (i.e. `</section><section>`) immediately before the heading.

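As a concrete sketch of the intended result, the usage below assumes the finished plugin is exported from a hypothetical `./sectionize` module; the rendered output shown in the comments is approximate.

```typescript
import MarkdownIt from "markdown-it"
import sectionize_plugin from "./sectionize" // hypothetical local module path

const md = new MarkdownIt().use(sectionize_plugin)

const html = md.render("Intro paragraph\n\n## First\n\nSome text\n\n## Second\n\nMore text")

// Expected shape of `html` (whitespace simplified):
// <section><p>Intro paragraph</p></section>
// <section><h2>First</h2><p>Some text</p></section>
// <section><h2>Second</h2><p>More text</p></section>
```
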
## Core Rules

The top-level rule pipeline that turns raw Markdown into a token array consists of **core rules**.
The *block* and *inline* rule pipelines are each run within a single "wrapper" rule in the core pipeline.
The wrapper rules appear relatively early in the [core pipeline](https://github.com/markdown-it/markdown-it/blob/0fe7ccb4b7f30236fb05f623be6924961d296d3d/lib/parser_core.mjs#L19).

```javascript
const _rules = [
  ['normalize', r_normalize],
  ['block', r_block],
  ['inline', r_inline],
  ['linkify', r_linkify],
  ['replacements', r_replacements],
  ['smartquotes', r_smartquotes],
  ['text_join', r_text_join]
]
```

Core rules typically do *not* scan through the source text or interpret Markdown syntax.
Rather, they usually modify or augment the token stream after an initial pass over the Markdown is complete.

> [!NOTE]
> The `normalize` rule is an exception.
> It modifies the raw markdown (`state.src`),
> *normalizing* (as the name implies) idiosyncrasies like platform-specific newlines and null characters.

Core rules can do much more,
but "post-processing" tasks are the most common use case.

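To make "the token stream" concrete, here is a small, hedged sketch that dumps the tokens markdown-it produces for a tiny document; the commented output lists only a few fields and is approximate.

```typescript
import MarkdownIt from "markdown-it"

const md = new MarkdownIt()
const tokens = md.parse("## Hello\n\nSome *text*.", {})

for (const t of tokens) {
  console.log(t.type, t.tag, t.nesting)
}
// heading_open    h2   1
// inline               0
// heading_close   h2  -1
// paragraph_open  p    1
// inline               0
// paragraph_close p   -1
```
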
## Entry Point

The new rule will be called `sectionize`.
The plugin entry point will look like the following:

```typescript
import type MarkdownIt from "markdown-it"
// The exact path for the StateCore type may vary between markdown-it versions.
import type StateCore from "markdown-it/lib/rules_core/state_core.mjs"

export default function sectionize_plugin(md: MarkdownIt) {
  md.core.ruler.push("sectionize", sectionize)
}

function sectionize(state: StateCore) {
  return
}
```

The new rule is pushed to the very end of the core pipeline.
While there are valid reasons to insert plugin rules elsewhere in the pipeline,
pushing to the end is a good default choice.

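For reference, a hedged sketch of what "elsewhere in the pipeline" could look like: markdown-it's `Ruler` can anchor a rule relative to an existing one instead of appending it (this article does not use these variants).

```typescript
import type MarkdownIt from "markdown-it"
// The exact path for the StateCore type may vary between markdown-it versions.
import type StateCore from "markdown-it/lib/rules_core/state_core.mjs"

function sectionize(state: StateCore) {
  // ...rule body developed later in this article
}

export default function sectionize_plugin(md: MarkdownIt) {
  // Run immediately after the inline wrapper rule...
  md.core.ruler.after("inline", "sectionize", sectionize)
  // ...or, alternatively, immediately before linkify:
  // md.core.ruler.before("linkify", "sectionize", sectionize)
}
```
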
> [!IMPORTANT]
> When in doubt, always put plugin rules at the end of the pipeline.
> This strategy minimizes the potential of breaking other rules' assumptions about state.

In this case specifically, surrounding the document with `<section>` tags will **increase the nesting level** of every other token in the document.
Certain rules might iterate over the token stream and keep a running total of the nesting level,
making assumptions about (for example) nesting level zero.
Placing the new rule at the very end keeps it from affecting those other rules.

## Section Insertion Logic

Because we will be inserting tokens into the token array,
we will iterate *backwards* over the existing array so that our index pointer isn't affected by the insertions.

```typescript
function sectionize(state: StateCore) {
  // Iterate backwards since we're splicing elements into the array
  for (let i = state.tokens.length - 1; i >= 0; i--) {
    const token = state.tokens[i]

    if (token.type === "heading_open" && token.tag === "h2") {
      // Insert `</section><section>` immediately before the heading
      const { open, close } = getSectionPair(state)
      state.tokens.splice(i, 0, close, open)
    }
  }

  // ...The plugin isn't quite done yet
}

function getSectionPair(state: StateCore) {
  // nesting 1 opens the tag, -1 closes it
  const open = new state.Token("section_open", "section", 1)
  open.block = true
  const close = new state.Token("section_close", "section", -1)
  close.block = true

  return { open, close }
}
```

At this point, the token array has a `</section><section>` pair immediately preceding each `<h2>`.
However, the document itself is not yet wrapped in an overarching section.

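For illustration, here is the rough shape of the token stream at this point for a document that begins with an `h2` (an assumed example, token types only):

```typescript
// section_close, section_open,           <- spliced in front of the first h2
// heading_open (h2), inline, heading_close,
// paragraph_open, inline, paragraph_close,
// section_close, section_open,           <- spliced in front of the second h2
// heading_open (h2), inline, heading_close,
// ...
```
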
There are two cases to consider:

- The document originally started with an `h2`, so it now starts with `</section>`
- The document did not start with an `h2`

Both cases are addressed with just a few lines of code:

```typescript
function sectionize(state: StateCore) {
  // ...iteration logic from above

  if (state.tokens[0].type === "section_close") {
    // The document started with an h2: move the stray leading `</section>`
    // to the very end, where it closes the final section.
    state.tokens.push(state.tokens.shift()!)
  } else {
    // Otherwise, wrap the entire document in one more section pair.
    const { open, close } = getSectionPair(state)
    state.tokens.unshift(open)
    state.tokens.push(close)
  }
}
```

## Conclusion

Simple augmentation tasks like sectionization are straightforward to implement with core rule plugins.
No traversal of `state.src` is required,
because this rule runs *after* all of the block and inline rule sets.

With careful rule positioning (defaulting to the end of the pipeline when in doubt),
post-processing rules are some of the simplest to write.