
Update CommonMark spec to 0.20

pull/124/head
Alex Kocharin 10 years ago
commit 7b961ee1ef
  1. 1198
      test/fixtures/commonmark/good.txt
  2. 193
      test/fixtures/commonmark/spec.txt

1198
test/fixtures/commonmark/good.txt

File diff suppressed because it is too large

193
test/fixtures/commonmark/spec.txt

@@ -1,8 +1,8 @@
 ---
 title: CommonMark Spec
 author: John MacFarlane
-version: 0.19
-date: 2015-04-27
+version: 0.20
+date: 2015-06-08
 license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
 ...
@@ -212,12 +212,8 @@ to a certain encoding.
 A [line](@line) is a sequence of zero or more [character]s
 followed by a [line ending] or by the end of file.
-A [line ending](@line-ending) is, depending on the platform, a
-newline (`U+000A`), carriage return (`U+000D`), or
-carriage return + newline.
-For security reasons, a conforming parser must strip or replace the
-Unicode character `U+0000`.
+A [line ending](@line-ending) is a newline (`U+000A`), carriage return
+(`U+000D`), or carriage return + newline.
 A line containing no characters, or a line containing only spaces
 (`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line).
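The hunk above defines a [line ending] and a [blank line] in the 0.20 wording. The minimal Python sketch below illustrates both definitions. The function names `split_lines` and `is_blank_line` are hypothetical. The sketch also assumes that an empty input contains no lines, which is its own choice and not something the spec text here states.

```python
import re

# A line ending is a newline (U+000A), a carriage return (U+000D), or
# CR + LF. "\r\n" is listed first so it is consumed as a single ending.
LINE_ENDING = re.compile(r"\r\n|\r|\n")


def split_lines(text: str) -> list:
    """Split text into lines, dropping the line endings (hypothetical helper)."""
    lines = LINE_ENDING.split(text)
    # A line ends at a line ending *or* at end of file, so a trailing line
    # ending does not open an extra empty line. Assumption: an empty input
    # has no lines at all.
    if lines[-1] == "":
        lines.pop()
    return lines


def is_blank_line(line: str) -> bool:
    """True if the line is empty or holds only spaces (U+0020) and tabs (U+0009)."""
    return all(ch in " \t" for ch in line)


print(split_lines("foo\r\nbar\rbaz\n"))  # ['foo', 'bar', 'baz']
print(is_blank_line(" \t "), is_blank_line(" x "))  # True False
```

The sketch drops the old "depending on the platform" qualifier, so all three endings are accepted on every input.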
@@ -239,7 +235,10 @@ carriage return (`U+000D`), newline (`U+000A`), or form feed
 [Unicode whitespace](@unicode-whitespace) is a sequence of one
 or more [unicode whitespace character]s.
-A [non-space character](@non-space-character) is anything but `U+0020`.
+A [space](@space) is `U+0020`.
+A [non-space character](@non-space-character) is any character
+that is not a [whitespace character].
 An [ASCII punctuation character](@ascii-punctuation-character)
 is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
@@ -250,9 +249,10 @@ A [punctuation character](@punctuation-character) is an [ASCII
 punctuation character] or anything in
 the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
-## Tab expansion
+## Preprocessing
-Tabs in lines are expanded to spaces, with a tab stop of 4 characters:
+Tabs in lines are immediately expanded to [spaces][space], with a tab
+stop of 4 characters:
 .
 →foo→baz→→bim
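The two hunks above define several character classes: space, non-space, ASCII punctuation and punctuation. Each can be written as a small predicate. This sketch rests on two assumptions, because both sets are defined or listed outside this excerpt. The [whitespace character] set is taken to be space, tab, newline, line tabulation, form feed and carriage return. The complete ASCII punctuation list is assumed to match Python's `string.punctuation`.

```python
import string
import unicodedata

SPACE = "\u0020"
# Assumption: the spec's [whitespace character] set (defined outside this
# excerpt): space, tab, newline, line tabulation, form feed, carriage return.
WHITESPACE_CHARS = frozenset(" \t\n\u000b\u000c\r")
# Assumption: the full ASCII punctuation list equals string.punctuation
# (only the first few characters are visible in this diff).
ASCII_PUNCTUATION = frozenset(string.punctuation)
PUNCTUATION_CATEGORIES = frozenset({"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"})


def is_space(ch: str) -> bool:
    return ch == SPACE


def is_non_space(ch: str) -> bool:
    # 0.20 wording: any character that is not a whitespace character.
    # (0.19 excluded only U+0020, so a tab counted as "non-space".)
    return ch not in WHITESPACE_CHARS


def is_punctuation(ch: str) -> bool:
    # ASCII punctuation, or any character in the listed Unicode P* classes.
    return ch in ASCII_PUNCTUATION or unicodedata.category(ch) in PUNCTUATION_CATEGORIES


print(is_non_space("\t"), is_punctuation("$"), is_punctuation("\u2014"))  # False True True
```

Note that `$` counts as punctuation only through the ASCII list, since its Unicode category is `Sc`. This is why the two tests are combined with "or".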
@@ -270,14 +270,19 @@ Tabs in lines are expanded to spaces, with a tab stop of 4 characters:
 </code></pre>
 .
+## Insecure characters
+For security reasons, the Unicode character `U+0000` must be replaced
+with the replacement character (`U+FFFD`).
 # Blocks and inlines
 We can think of a document as a sequence of
-[blocks](@block)---structural
-elements like paragraphs, block quotations,
-lists, headers, rules, and code blocks. Blocks can contain other
-blocks, or they can contain [inline](@inline) content:
-words, spaces, links, emphasized text, images, and inline code.
+[blocks](@block)---structural elements like paragraphs, block
+quotations, lists, headers, rules, and code blocks. Some blocks (like
+block quotes and list items) contain other blocks; others (like
+headers and paragraphs) contain [inline](@inline) content---text,
+links, emphasized text, images, code, and so on.
 ## Precedence
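The hunks above describe two preprocessing steps. Tabs are expanded to the next multiple of 4 columns, and `U+0000` is replaced by `U+FFFD`. The minimal sketch below implements both steps. The helper names are hypothetical. Counting columns by code points, without special handling for wide or combining characters, is an assumption of this sketch.

```python
TAB_STOP = 4


def expand_tabs(line: str, tab_stop: int = TAB_STOP) -> str:
    """Replace each tab with enough spaces to reach the next tab stop."""
    out = []
    column = 0  # counted in code points (assumption)
    for ch in line:
        if ch == "\t":
            width = tab_stop - column % tab_stop
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1
    return "".join(out)


def replace_insecure(text: str) -> str:
    # For security reasons U+0000 becomes the replacement character U+FFFD.
    return text.replace("\u0000", "\ufffd")


def preprocess_line(line: str) -> str:
    return expand_tabs(replace_insecure(line))


print(repr(preprocess_line("\tfoo\tbaz\t\tbim")))  # '    foo baz     bim'
```

For a single line without line endings, Python's built-in `str.expandtabs(4)` gives the same result. The explicit loop is there only to show the tab-stop arithmetic.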
@@ -528,12 +533,12 @@ consists of a string of characters, parsed as inline content, between an
 opening sequence of 1--6 unescaped `#` characters and an optional
 closing sequence of any number of `#` characters. The opening sequence
 of `#` characters cannot be followed directly by a
-[non-space character].
-The optional closing sequence of `#`s must be preceded by a space and may be
-followed by spaces only. The opening `#` character may be indented 0-3
-spaces. The raw contents of the header are stripped of leading and
-trailing spaces before being parsed as inline content. The header level
-is equal to the number of `#` characters in the opening sequence.
+[non-space character]. The optional closing sequence of `#`s must be
+preceded by a [space] and may be followed by spaces only. The opening
+`#` character may be indented 0-3 spaces. The raw contents of the
+header are stripped of leading and trailing spaces before being parsed
+as inline content. The header level is equal to the number of `#`
+characters in the opening sequence.
 Simple headers:
@@ -561,16 +566,21 @@ More than six `#` characters is not a header:
 <p>####### foo</p>
 .
-A space is required between the `#` characters and the header's
-contents. Note that many implementations currently do not require
-the space. However, the space was required by the [original ATX
-implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps
-prevent things like the following from being parsed as headers:
+At least one space is required between the `#` characters and the
+header's contents, unless the header is empty. Note that many
+implementations currently do not require the space. However, the
+space was required by the
+[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py),
+and it helps prevent things like the following from being parsed as
+headers:
 .
 #5 bolt
+
+#foobar
 .
 <p>#5 bolt</p>
+<p>#foobar</p>
 .
 This is not a header, because the first `#` is escaped:
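The revised ATX header rules in the two hunks above can be captured in a short regex-based recognizer. Under these rules, 1-6 opening `#`s must be followed by a space or the end of the line. An optional closing sequence must be preceded by a space, up to 3 spaces of indentation are allowed, and the contents are stripped. This Python sketch is hypothetical. It assumes tabs have already been expanded to spaces, and it returns only the level and the raw contents, leaving inline parsing aside.

```python
import re
from typing import Optional, Tuple

# 0-3 spaces of indentation, then 1-6 '#' that must be followed by a space
# or the end of the line (so "#5 bolt" and "#foobar" are rejected).
ATX_OPENING = re.compile(r" {0,3}(#{1,6})(?= |$)")
# Optional closing sequence: a space, one or more '#', then spaces only.
ATX_CLOSING = re.compile(r" #+ *$")


def parse_atx_header(line: str) -> Optional[Tuple[int, str]]:
    """Return (level, raw contents) if line is an ATX header, else None."""
    match = ATX_OPENING.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    rest = line[match.end():]
    # rest is empty or starts with a space, so a closing sequence that
    # directly follows the opening one (an empty header like "### ###")
    # is removed as well.
    rest = ATX_CLOSING.sub("", rest, count=1)
    return level, rest.strip(" ")


for text in ["# foo", "### foo ###  ", "#5 bolt", "## foo#", "#"]:
    print(repr(text), parse_atx_header(text))
```

A `#` that directly follows the contents (as in `## foo#`) is not a closing sequence, because it is not preceded by a space, so it stays part of the contents.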
@@ -1024,7 +1034,41 @@ paragraph.)
 </code></pre>
 .
-The contents are literal text, and do not get parsed as Markdown:
+If there is any ambiguity between an interpretation of indentation
+as a code block and as indicating that material belongs to a [list
+item][list items], the list item interpretation takes precedence:
+.
+  - foo
+
+    bar
+.
+<ul>
+<li>
+<p>foo</p>
+<p>bar</p>
+</li>
+</ul>
+.
+.
+1.  foo
+
+    - bar
+.
+<ol>
+<li>
+<p>foo</p>
+<ul>
+<li>bar</li>
+</ul>
+</li>
+</ol>
+.
+The contents of a code block are literal text, and do not get parsed
+as Markdown:
 .
 <a/>
@@ -2325,9 +2369,16 @@ foo</p>
 </blockquote>
 .
-Laziness only applies to lines that are continuations of
-paragraphs. Lines containing characters or indentation that indicate
-block structure cannot be lazy.
+Laziness only applies to lines that would have been continuations of
+paragraphs had they been prepended with `>`. For example, the
+`>` cannot be omitted in the second line of
+``` markdown
+> foo
+> ---
+```
+without changing the meaning:
 .
 > foo
@@ -2339,6 +2390,15 @@ block structure cannot be lazy.
 <hr />
 .
+Similarly, if we omit the `>` in the second line of
+``` markdown
+> - foo
+> - bar
+```
+then the block quote ends after the first line:
 .
 > - foo
 - bar
@@ -2353,6 +2413,9 @@ block structure cannot be lazy.
 </ul>
 .
+For the same reason, we can't omit the `>` in front of
+subsequent lines of an indented or fenced code block:
 .
 > foo
 bar
@@ -3837,7 +3900,9 @@ item:
 - d
 - e
 - f
 - g
+- h
+- i
 .
 <ul>
 <li>a</li>
@@ -3847,9 +3912,31 @@ item:
 <li>e</li>
 <li>f</li>
 <li>g</li>
+<li>h</li>
+<li>i</li>
 </ul>
 .
+.
+1. a
+
+2. b
+
+3. c
+.
+<ol>
+<li>
+<p>a</p>
+</li>
+<li>
+<p>b</p>
+</li>
+<li>
+<p>c</p>
+</li>
+</ol>
+.
 This is a loose list, because there is a blank line between
 two of the list items:
@@ -4277,13 +4364,14 @@ corresponding codepoints.
 [Decimal entities](@decimal-entities)
 consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
 entities need to be recognised and transformed into their corresponding
-unicode codepoints. Invalid unicode codepoints will be written as the
-"unknown codepoint" character (`0xFFFD`)
+unicode codepoints. Invalid unicode codepoints will be replaced by
+the "unknown codepoint" character (`U+FFFD`). For security reasons,
+the codepoint `U+0000` will also be replaced by `U+FFFD`.
 .
-&#35; &#1234; &#992; &#98765432;
+&#35; &#1234; &#992; &#98765432; &#0;
 .
 <p># Ӓ Ϡ �</p>
 .
 [Hexadecimal entities](@hexadecimal-entities)
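The decimal-entity rule in the hunk above can be sketched as a regex substitution. This hypothetical sketch makes three assumptions. First, "arabic digits" means ASCII `0`-`9`, so the pattern uses `[0-9]` rather than `\d`, which in Python also matches other Unicode digits. Second, "invalid" code points are those above `U+10FFFF` plus the surrogate range `U+D800`-`U+DFFF`. Third, named and hexadecimal entities are out of scope.

```python
import re

# `&#` + 1-8 ASCII digits + `;`
DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")
REPLACEMENT = "\ufffd"  # the "unknown codepoint" character


def decode_codepoint(cp: int) -> str:
    # U+0000 is replaced for security reasons; values past U+10FFFF and
    # surrogates are treated as invalid (the surrogate check is an assumption).
    if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return REPLACEMENT
    return chr(cp)


def decode_decimal_entities(text: str) -> str:
    return DECIMAL_ENTITY.sub(lambda m: decode_codepoint(int(m.group(1))), text)


print(decode_decimal_entities("&#35; &#1234; &#992; &#98765432; &#0;"))
```

A run of nine or more digits does not match the pattern at all, so such text is left untouched rather than replaced.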
@@ -5063,9 +5151,9 @@ both left- and right-flanking, because it is preceded by
 punctuation:
 .
-foo-_(bar)_
+foo-__(bar)__
 .
-<p>foo-<em>(bar)</em></p>
+<p>foo-<strong>(bar)</strong></p>
 .
@@ -5177,9 +5265,9 @@ both left- and right-flanking, because it is followed by
 punctuation:
 .
-_(bar)_.
+__(bar)__.
 .
-<p><em>(bar)</em>.</p>
+<p><strong>(bar)</strong>.</p>
 .
 Rule 9:
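Both changed examples above depend on a delimiter run being left-flanking and right-flanking at the same time. The sketch below restates those definitions, together with the `_`/`__` opening and closing rules, as predicates. The definitions themselves sit outside this excerpt, so treat this as an assumption-laden paraphrase, not the spec's own wording. The parameters `before` and `after` are the single characters adjacent to the delimiter run. The empty string `""` stands for the start or end of the line, a hypothetical convention; line boundaries count as whitespace.

```python
import string
import unicodedata
from typing import Tuple

ASCII_PUNCTUATION = frozenset(string.punctuation)  # assumed ASCII list
PUNCTUATION_CATEGORIES = frozenset({"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"})


def is_unicode_whitespace(ch: str) -> bool:
    # "" marks a line boundary, which counts as whitespace.
    if ch == "":
        return True
    return ch in "\t\r\n\f" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch: str) -> bool:
    if ch == "":
        return False
    return ch in ASCII_PUNCTUATION or unicodedata.category(ch) in PUNCTUATION_CATEGORIES


def flanking(before: str, after: str) -> Tuple[bool, bool]:
    """Return (left_flanking, right_flanking) for a delimiter run."""
    left = not is_unicode_whitespace(after) and (
        not is_punctuation(after)
        or is_unicode_whitespace(before)
        or is_punctuation(before)
    )
    right = not is_unicode_whitespace(before) and (
        not is_punctuation(before)
        or is_unicode_whitespace(after)
        or is_punctuation(after)
    )
    return left, right


def underscore_can_open(before: str, after: str) -> bool:
    left, right = flanking(before, after)
    return left and (not right or is_punctuation(before))


def underscore_can_close(before: str, after: str) -> bool:
    left, right = flanking(before, after)
    return right and (not left or is_punctuation(after))


# foo-__(bar)__ : the opening run sits between "-" and "(".
print(flanking("-", "("), underscore_can_open("-", "("))  # (True, True) True
```

With these rules, an intraword run such as the `_` in `foo_bar` is both left-flanking and right-flanking. It is neither preceded nor followed by punctuation, so it can neither open nor close.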
@@ -6086,6 +6174,7 @@ that [matches] a [link reference definition] elsewhere in the document.
 A [link label](@link-label) begins with a left bracket (`[`) and ends
 with the first right bracket (`]`) that is not backslash-escaped.
+Between these brackets there must be at least one non-[whitespace character].
 Unescaped square bracket characters are not allowed in
 [link label]s. A link label can have at most 999
 characters inside the square brackets.
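A hand-written scanner shows how the link label constraints above combine, including the new requirement of at least one non-whitespace character. This hypothetical Python sketch makes three assumptions. It uses the whitespace set space, tab, newline, line tabulation, form feed and carriage return. It treats a backslash plus the following character as one escaped unit. It applies the 999-character limit to the raw text between the brackets.

```python
from typing import Optional

MAX_LABEL_CHARS = 999
# Assumption: the spec's whitespace character set.
WHITESPACE_CHARS = frozenset(" \t\n\u000b\u000c\r")


def scan_link_label(text: str, start: int = 0) -> Optional[int]:
    """Return the index just past a link label starting at text[start], or None."""
    if start >= len(text) or text[start] != "[":
        return None
    has_non_whitespace = False
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Skip the escaped character, so "\]" and "\[" neither end nor
            # invalidate the label; the backslash itself is non-whitespace.
            has_non_whitespace = True
            i += 2
            continue
        if ch == "[":
            return None  # unescaped '[' is not allowed inside a label
        if ch == "]":
            inner_length = i - start - 1
            if inner_length > MAX_LABEL_CHARS or not has_non_whitespace:
                return None
            return i + 1
        if ch not in WHITESPACE_CHARS:
            has_non_whitespace = True
        i += 1
    return None  # no closing bracket


print(scan_link_label("[foo\\]bar]"), scan_link_label("[]"), scan_link_label("[\n]"))
```

The empty and whitespace-only cases return `None`, which mirrors the new examples that render `[]` and a bracketed line break as literal text.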
@@ -6332,6 +6421,30 @@ backslash-escaped:
 <p><a href="/uri">foo</a></p>
 .
+A [link label] must contain at least one non-[whitespace character]:
+.
+[]
+
+[]: /uri
+.
+<p>[]</p>
+<p>[]: /uri</p>
+.
+.
+[
+]
+
+[
+]: /uri
+.
+<p>[
+]</p>
+<p>[
+]: /uri</p>
+.
 A [collapsed reference link](@collapsed-reference-link)
 consists of a [link label] that [matches] a
 [link reference definition] elsewhere in the
