
Update CommonMark spec to 0.19

pull/104/head
Alex Kocharin 10 years ago
parent
commit
7cd639ed39
  1. 1041
      test/fixtures/commonmark/good.txt
  2. 140
      test/fixtures/commonmark/spec.txt

1041
test/fixtures/commonmark/good.txt

File diff suppressed because it is too large

140
test/fixtures/commonmark/spec.txt

@@ -1,8 +1,8 @@
--- ---
title: CommonMark Spec title: CommonMark Spec
author: John MacFarlane author: John MacFarlane
version: 0.18 version: 0.19
date: 2015-03-03 date: 2015-04-27
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
... ...
@@ -192,8 +192,8 @@ an implementation without writing an abstract syntax tree renderer.
This document is generated from a text file, `spec.txt`, written This document is generated from a text file, `spec.txt`, written
in Markdown with a small extension for the side-by-side tests. in Markdown with a small extension for the side-by-side tests.
The script `spec2md.pl` can be used to turn `spec.txt` into pandoc The script `tools/makespec.py` can be used to convert `spec.txt` into
Markdown, which can then be converted into other formats. HTML or CommonMark (which can then be converted into other formats).
In the examples, the `→` character is used to represent tabs. In the examples, the `→` character is used to represent tabs.
@@ -724,13 +724,14 @@ ATX headers can be empty:
## Setext headers ## Setext headers
A [setext header](@setext-header) A [setext header](@setext-header)
consists of a line of text, containing at least one consists of a line of text, containing at least one [non-space character],
[non-space character],
with no more than 3 spaces indentation, followed by a [setext header with no more than 3 spaces indentation, followed by a [setext header
underline]. The line of text must be underline]. The line of text must be
one that, were it not followed by the setext header underline, one that, were it not followed by the setext header underline,
would be interpreted as part of a paragraph: it cannot be a code would be interpreted as part of a paragraph: it cannot be
block, header, blockquote, horizontal rule, or list. interpretable as a [code fence], [ATX header][ATX headers],
[block quote][block quotes], [horizontal rule][horizontal rules],
[list item][list items], or [HTML block][HTML blocks].
A [setext header underline](@setext-header-underline) is a sequence of A [setext header underline](@setext-header-underline) is a sequence of
`=` characters or a sequence of `-` characters, with no more than 3 `=` characters or a sequence of `-` characters, with no more than 3
@@ -1811,7 +1812,7 @@ title], which if it is present must be separated
from the [link destination] by [whitespace]. from the [link destination] by [whitespace].
No further [non-space character]s may occur on the line. No further [non-space character]s may occur on the line.
A [link reference-definition] A [link reference definition]
does not correspond to a structural element of a document. Instead, it does not correspond to a structural element of a document. Instead, it
defines a label which can be used in [reference link]s defines a label which can be used in [reference link]s
and reference-style [images] elsewhere in the document. [Link and reference-style [images] elsewhere in the document. [Link
@@ -2587,7 +2588,7 @@ The following rules define [list items]:
1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of 1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of
blocks *Bs* starting with a [non-space character] and not separated blocks *Bs* starting with a [non-space character] and not separated
from each other by more than one blank line, and *M* is a list from each other by more than one blank line, and *M* is a list
marker *M* of width *W* followed by 0 < *N* < 5 spaces, then the result marker of width *W* followed by 0 < *N* < 5 spaces, then the result
of prepending *M* and the following spaces to the first line of of prepending *M* and the following spaces to the first line of
*Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
list item with *Bs* as its contents. The type of the list item list item with *Bs* as its contents. The type of the list item
@@ -2726,7 +2727,7 @@ this example:
Here `two` occurs in the same column as the list marker `1.`, Here `two` occurs in the same column as the list marker `1.`,
but is actually contained in the list item, because there is but is actually contained in the list item, because there is
sufficent indentation after the last containing blockquote marker. sufficient indentation after the last containing blockquote marker.
The converse is also possible. In the following example, the word `two` The converse is also possible. In the following example, the word `two`
occurs far to the right of the initial text of the list item, `one`, but occurs far to the right of the initial text of the list item, `one`, but
@@ -2852,7 +2853,7 @@ A list item may contain any kind of block:
2. **Item starting with indented code.** If a sequence of lines *Ls* 2. **Item starting with indented code.** If a sequence of lines *Ls*
constitute a sequence of blocks *Bs* starting with an indented code constitute a sequence of blocks *Bs* starting with an indented code
block and not separated from each other by more than one blank line, block and not separated from each other by more than one blank line,
and *M* is a list marker *M* of width *W* followed by and *M* is a list marker of width *W* followed by
one space, then the result of prepending *M* and the following one space, then the result of prepending *M* and the following
space to the first line of *Ls*, and indenting subsequent lines of space to the first line of *Ls*, and indenting subsequent lines of
*Ls* by *W + 1* spaces, is a list item with *Bs* as its contents. *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents.
@@ -3001,7 +3002,7 @@ the above case:
3. **Item starting with a blank line.** If a sequence of lines *Ls* 3. **Item starting with a blank line.** If a sequence of lines *Ls*
starting with a single [blank line] constitute a (possibly empty) starting with a single [blank line] constitute a (possibly empty)
sequence of blocks *Bs*, not separated from each other by more than sequence of blocks *Bs*, not separated from each other by more than
one blank line, and *M* is a list marker *M* of width *W*, one blank line, and *M* is a list marker of width *W*,
then the result of prepending *M* to the first line of *Ls*, and then the result of prepending *M* to the first line of *Ls*, and
indenting subsequent lines of *Ls* by *W + 1* spaces, is a list indenting subsequent lines of *Ls* by *W + 1* spaces, is a list
item with *Bs* as its contents. item with *Bs* as its contents.
@@ -3090,7 +3091,7 @@ A list may start or end with an empty list item:
4. **Indentation.** If a sequence of lines *Ls* constitutes a list item 4. **Indentation.** If a sequence of lines *Ls* constitutes a list item
according to rule #1, #2, or #3, then the result of indenting each line according to rule #1, #2, or #3, then the result of indenting each line
of *L* by 1-3 spaces (the same for each line) also constitutes a of *Ls* by 1-3 spaces (the same for each line) also constitutes a
list item with the same contents and attributes. If a line is list item with the same contents and attributes. If a line is
empty, then it need not be indented. empty, then it need not be indented.
@@ -4275,8 +4276,8 @@ corresponding codepoints.
[Decimal entities](@decimal-entities) [Decimal entities](@decimal-entities)
consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
entities need to be recognised and tranformed into their corresponding entities need to be recognised and transformed into their corresponding
UTF8 codepoints. Invalid Unicode codepoints will be written as the unicode codepoints. Invalid unicode codepoints will be written as the
"unknown codepoint" character (`0xFFFD`) "unknown codepoint" character (`0xFFFD`)
. .
@@ -4287,7 +4288,8 @@ UTF8 codepoints. Invalid Unicode codepoints will be written as the
[Hexadecimal entities](@hexadecimal-entities) [Hexadecimal entities](@hexadecimal-entities)
consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits
+ `;`. They will also be parsed and turned into their corresponding UTF8 values in the AST. + `;`. They will also be parsed and turned into the corresponding
unicode codepoints in the AST.
. .
&#X22; &#XD06; &#xcab; &#X22; &#XD06; &#xcab;
@@ -4581,14 +4583,16 @@ characters that is not preceded or followed by a `_` character.
A [left-flanking delimiter run](@left-flanking-delimiter-run) is A [left-flanking delimiter run](@left-flanking-delimiter-run) is
a [delimiter run] that is (a) not followed by [unicode whitespace], a [delimiter run] that is (a) not followed by [unicode whitespace],
and (b) either not followed by a [punctuation character], or and (b) either not followed by a [punctuation character], or
preceded by [unicode whitespace] or a [punctuation character] or preceded by [unicode whitespace] or a [punctuation character].
the beginning of a line. For purposes of this definition, the beginning and the end of
the line count as unicode whitespace.
A [right-flanking delimiter run](@right-flanking-delimiter-run) is A [right-flanking delimiter run](@right-flanking-delimiter-run) is
a [delimiter run] that is (a) not preceded by [unicode whitespace], a [delimiter run] that is (a) not preceded by [unicode whitespace],
and (b) either not preceded by a [punctuation character], or and (b) either not preceded by a [punctuation character], or
followed by [unicode whitespace] or a [punctuation character] or followed by [unicode whitespace] or a [punctuation character].
the end of a line. For purposes of this definition, the beginning and the end of
the line count as unicode whitespace.
Here are some examples of delimiter runs. Here are some examples of delimiter runs.
@@ -4604,20 +4608,20 @@ Here are some examples of delimiter runs.
- right-flanking but not left-flanking: - right-flanking but not left-flanking:
``` ```
abc*** abc***
abc_ abc_
"abc"** "abc"**
_"abc" "abc"_
``` ```
- Both right and right-flanking: - Both left and right-flanking:
``` ```
abc***def abc***def
"abc"_"def" "abc"_"def"
``` ```
- Neither right nor right-flanking: - Neither left nor right-flanking:
``` ```
abc *** def abc *** def
@@ -4635,32 +4639,40 @@ are a bit more complex than the ones given here.)
The following rules define emphasis and strong emphasis: The following rules define emphasis and strong emphasis:
1. A single `*` character [can open emphasis](@can-open-emphasis) 1. A single `*` character [can open emphasis](@can-open-emphasis)
iff it is part of a [left-flanking delimiter run]. iff (if and only if) it is part of a [left-flanking delimiter run].
2. A single `_` character [can open emphasis] iff 2. A single `_` character [can open emphasis] iff
it is part of a [left-flanking delimiter run] it is part of a [left-flanking delimiter run]
and not part of a [right-flanking delimiter run]. and either (a) not part of a [right-flanking delimiter run]
or (b) part of a [right-flanking delimeter run]
preceded by punctuation.
3. A single `*` character [can close emphasis](@can-close-emphasis) 3. A single `*` character [can close emphasis](@can-close-emphasis)
iff it is part of a [right-flanking delimiter run]. iff it is part of a [right-flanking delimiter run].
4. A single `_` character [can close emphasis] 4. A single `_` character [can close emphasis] iff
iff it is part of a [right-flanking delimiter run] it is part of a [right-flanking delimiter run]
and not part of a [left-flanking delimiter run]. and either (a) not part of a [left-flanking delimiter run]
or (b) part of a [left-flanking delimeter run]
followed by punctuation.
5. A double `**` [can open strong emphasis](@can-open-strong-emphasis) 5. A double `**` [can open strong emphasis](@can-open-strong-emphasis)
iff it is part of a [left-flanking delimiter run]. iff it is part of a [left-flanking delimiter run].
6. A double `__` [can open strong emphasis] 6. A double `__` [can open strong emphasis] iff
iff it is part of a [left-flanking delimiter run] it is part of a [left-flanking delimiter run]
and not part of a [right-flanking delimiter run]. and either (a) not part of a [right-flanking delimiter run]
or (b) part of a [right-flanking delimeter run]
preceded by punctuation.
7. A double `**` [can close strong emphasis](@can-close-strong-emphasis) 7. A double `**` [can close strong emphasis](@can-close-strong-emphasis)
iff it is part of a [right-flanking delimiter run]. iff it is part of a [right-flanking delimiter run].
8. A double `__` [can close strong emphasis] 8. A double `__` [can close strong emphasis]
iff it is part of a [right-flanking delimiter run] it is part of a [right-flanking delimiter run]
and not part of a [left-flanking delimiter run]. and either (a) not part of a [left-flanking delimiter run]
or (b) part of a [left-flanking delimeter run]
followed by punctuation.
9. Emphasis begins with a delimiter that [can open emphasis] and ends 9. Emphasis begins with a delimiter that [can open emphasis] and ends
with a delimiter that [can close emphasis], and that uses the same with a delimiter that [can close emphasis], and that uses the same
@@ -4822,13 +4834,14 @@ aa_"bb"_cc
<p>aa_&quot;bb&quot;_cc</p> <p>aa_&quot;bb&quot;_cc</p>
. .
Here there is no emphasis, because the delimiter runs are This is emphasis, even though the opening delimiter is
both left- and right-flanking: both left- and right-flanking, because it is preceded by
punctuation:
. .
"aa"_"bb"_"cc" foo-_(bar)_
. .
<p>&quot;aa&quot;_&quot;bb&quot;_&quot;cc&quot;</p> <p>foo-<em>(bar)</em></p>
. .
Rule 3: Rule 3:
@@ -4939,6 +4952,16 @@ _foo_bar_baz_
<p><em>foo_bar_baz</em></p> <p><em>foo_bar_baz</em></p>
. .
This is emphasis, even though the closing delimiter is
both left- and right-flanking, because it is followed by
punctuation:
.
_(bar)_.
.
<p><em>(bar)</em>.</p>
.
Rule 5: Rule 5:
. .
@@ -5035,6 +5058,17 @@ __foo, __bar__, baz__
<p><strong>foo, <strong>bar</strong>, baz</strong></p> <p><strong>foo, <strong>bar</strong>, baz</strong></p>
. .
This is strong emphasis, even though the opening delimiter is
both left- and right-flanking, because it is preceded by
punctuation:
.
foo-_(bar)_
.
<p>foo-<em>(bar)</em></p>
.
Rule 7: Rule 7:
This is not strong emphasis, because the closing delimiter is preceded This is not strong emphasis, because the closing delimiter is preceded
@@ -5138,6 +5172,16 @@ __foo__bar__baz__
<p><strong>foo__bar__baz</strong></p> <p><strong>foo__bar__baz</strong></p>
. .
This is strong emphasis, even though the closing delimiter is
both left- and right-flanking, because it is followed by
punctuation:
.
_(bar)_.
.
<p><em>(bar)</em>.</p>
.
Rule 9: Rule 9:
Any nonempty sequence of inline elements can be the contents of an Any nonempty sequence of inline elements can be the contents of an
@@ -5706,7 +5750,7 @@ A [link destination](@link-destination) consists of either
ASCII space or control characters, and includes parentheses ASCII space or control characters, and includes parentheses
only if (a) they are backslash-escaped or (b) they are part of only if (a) they are backslash-escaped or (b) they are part of
a balanced pair of unescaped parentheses that is not itself a balanced pair of unescaped parentheses that is not itself
inside a balanced pair of unescaped paretheses. inside a balanced pair of unescaped parentheses.
A [link title](@link-title) consists of either A [link title](@link-title) consists of either
@@ -5839,8 +5883,8 @@ in Markdown:
URL-escaping should be left alone inside the destination, as all URL-escaping should be left alone inside the destination, as all
URL-escaped characters are also valid URL characters. HTML entities in URL-escaped characters are also valid URL characters. HTML entities in
the destination will be parsed into their UTF-8 codepoints, as usual, and the destination will be parsed into the corresponding unicode
optionally URL-escaped when written as HTML. codepoints, as usual, and optionally URL-escaped when written as HTML.
. .
[link](foo%20b&auml;) [link](foo%20b&auml;)
@@ -7215,10 +7259,10 @@ foo
## Soft line breaks ## Soft line breaks
A regular line break (not in a code span or HTML tag) that is not A regular line break (not in a code span or HTML tag) that is not
preceded by two or more spaces is parsed as a softbreak. (A preceded by two or more spaces or a backslash is parsed as a
softbreak may be rendered in HTML either as a softbreak. (A softbreak may be rendered in HTML either as a
[line ending] or as a space. The result will be the same [line ending] or as a space. The result will be the same in
in browsers. In the examples here, a [line ending] will be used.) browsers. In the examples here, a [line ending] will be used.)
. .
foo foo
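
Review note: the most consequential change in this spec update is to the delimiter-run definitions and to the `_` rules (2, 4, 6 and 8). The sketch below is one possible reading of the 0.19 wording, not the code of any parser or of the fixture runner. The helper names (`is_whitespace`, `is_punctuation`, `flanking`, `can_open_close`) are hypothetical, `str.isspace()` is assumed to be a close enough stand-in for [unicode whitespace], and a [punctuation character] is approximated as ASCII punctuation plus any Unicode category beginning with `P`.

```python
import string
import unicodedata


def is_whitespace(ch):
    # None stands for the beginning or end of the line, which the 0.19
    # wording treats as unicode whitespace. str.isspace() is an assumption.
    return ch is None or ch.isspace()


def is_punctuation(ch):
    # Approximation: ASCII punctuation or any Unicode "P*" category.
    if ch is None:
        return False
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")


def neighbours(line, start, end):
    """Characters just before and just after the run line[start:end]."""
    before = line[start - 1] if start > 0 else None
    after = line[end] if end < len(line) else None
    return before, after


def flanking(line, start, end):
    """Return (left_flanking, right_flanking) for the run line[start:end]."""
    before, after = neighbours(line, start, end)
    left = not is_whitespace(after) and (
        not is_punctuation(after)
        or is_whitespace(before)
        or is_punctuation(before)
    )
    right = not is_whitespace(before) and (
        not is_punctuation(before)
        or is_whitespace(after)
        or is_punctuation(after)
    )
    return left, right


def can_open_close(line, start, end):
    """Return (can_open, can_close) for the run, following rules 1-8."""
    left, right = flanking(line, start, end)
    if line[start] == "*":
        # Rules 1, 3, 5, 7: `*` only needs the matching flanking property.
        return left, right
    before, after = neighbours(line, start, end)
    # Rules 2, 4, 6, 8: `_` that flanks on both sides needs adjacent punctuation.
    can_open = left and (not right or is_punctuation(before))
    can_close = right and (not left or is_punctuation(after))
    return can_open, can_close
```

Under these assumptions the first `_` in `foo-_(bar)_` can open emphasis because it is preceded by `-`, while the `_` in `foo_bar` can neither open nor close, which matches the new examples added for rules 2 and 4.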
