
Update CommonMark spec to 0.19

pull/104/head
Alex Kocharin 10 years ago
commit 7cd639ed39
1. 1041 test/fixtures/commonmark/good.txt
2. 140 test/fixtures/commonmark/spec.txt

1041 test/fixtures/commonmark/good.txt

File diff suppressed because it is too large

140 test/fixtures/commonmark/spec.txt

@ -1,8 +1,8 @@
---
title: CommonMark Spec
author: John MacFarlane
version: 0.18
date: 2015-03-03
version: 0.19
date: 2015-04-27
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
...
@ -192,8 +192,8 @@ an implementation without writing an abstract syntax tree renderer.
This document is generated from a text file, `spec.txt`, written
in Markdown with a small extension for the side-by-side tests.
The script `spec2md.pl` can be used to turn `spec.txt` into pandoc
Markdown, which can then be converted into other formats.
The script `tools/makespec.py` can be used to convert `spec.txt` into
HTML or CommonMark (which can then be converted into other formats).
In the examples, the `→` character is used to represent tabs.
@ -724,13 +724,14 @@ ATX headers can be empty:
## Setext headers
A [setext header](@setext-header)
consists of a line of text, containing at least one
[non-space character],
consists of a line of text, containing at least one [non-space character],
with no more than 3 spaces indentation, followed by a [setext header
underline]. The line of text must be
one that, were it not followed by the setext header underline,
would be interpreted as part of a paragraph: it cannot be a code
block, header, blockquote, horizontal rule, or list.
would be interpreted as part of a paragraph: it cannot be
interpretable as a [code fence], [ATX header][ATX headers],
[block quote][block quotes], [horizontal rule][horizontal rules],
[list item][list items], or [HTML block][HTML blocks].
A [setext header underline](@setext-header-underline) is a sequence of
`=` characters or a sequence of `-` characters, with no more than 3
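The underline test in the setext rule fits in one regular expression. This is a minimal Python sketch, not the implementation in this repository. `setext_level` is a hypothetical helper, and it assumes the "any number of trailing spaces" clause that the hunk above cuts off. It checks only the underline line. It does not check the condition on the preceding line of text, which must be a paragraph line.

```python
import re

# Up to 3 spaces of indentation, then only '=' or only '-'.
# Assumption: any number of trailing spaces is allowed (truncated in the hunk).
_UNDERLINE = re.compile(r"^ {0,3}(=+|-+) *$")

def setext_level(line: str):
    """Return 1 for an '=' underline, 2 for a '-' underline, None otherwise."""
    m = _UNDERLINE.match(line)
    if m is None:
        return None
    return 1 if m.group(1)[0] == "=" else 2
```

A line such as `---` could also be a horizontal rule. The setext reading applies only when a paragraph line precedes it.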
@ -1811,7 +1812,7 @@ title], which if it is present must be separated
from the [link destination] by [whitespace].
No further [non-space character]s may occur on the line.
A [link reference-definition]
A [link reference definition]
does not correspond to a structural element of a document. Instead, it
defines a label which can be used in [reference link]s
and reference-style [images] elsewhere in the document. [Link
@ -2587,7 +2588,7 @@ The following rules define [list items]:
1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of
blocks *Bs* starting with a [non-space character] and not separated
from each other by more than one blank line, and *M* is a list
marker *M* of width *W* followed by 0 < *N* < 5 spaces, then the result
marker of width *W* followed by 0 < *N* < 5 spaces, then the result
of prepending *M* and the following spaces to the first line of
*Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
list item with *Bs* as its contents. The type of the list item
@ -2726,7 +2727,7 @@ this example:
Here `two` occurs in the same column as the list marker `1.`,
but is actually contained in the list item, because there is
sufficent indentation after the last containing blockquote marker.
sufficient indentation after the last containing blockquote marker.
The converse is also possible. In the following example, the word `two`
occurs far to the right of the initial text of the list item, `one`, but
@ -2852,7 +2853,7 @@ A list item may contain any kind of block:
2. **Item starting with indented code.** If a sequence of lines *Ls*
constitute a sequence of blocks *Bs* starting with an indented code
block and not separated from each other by more than one blank line,
and *M* is a list marker *M* of width *W* followed by
and *M* is a list marker of width *W* followed by
one space, then the result of prepending *M* and the following
space to the first line of *Ls*, and indenting subsequent lines of
*Ls* by *W + 1* spaces, is a list item with *Bs* as its contents.
@ -3001,7 +3002,7 @@ the above case:
3. **Item starting with a blank line.** If a sequence of lines *Ls*
starting with a single [blank line] constitute a (possibly empty)
sequence of blocks *Bs*, not separated from each other by more than
one blank line, and *M* is a list marker *M* of width *W*,
one blank line, and *M* is a list marker of width *W*,
then the result of prepending *M* to the first line of *Ls*, and
indenting subsequent lines of *Ls* by *W + 1* spaces, is a list
item with *Bs* as its contents.
@ -3090,7 +3091,7 @@ A list may start or end with an empty list item:
4. **Indentation.** If a sequence of lines *Ls* constitutes a list item
according to rule #1, #2, or #3, then the result of indenting each line
of *L* by 1-3 spaces (the same for each line) also constitutes a
of *Ls* by 1-3 spaces (the same for each line) also constitutes a
list item with the same contents and attributes. If a line is
empty, then it need not be indented.
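Rules 1-4 can be combined into one computation: the column at which a list item's contents begin. This is a minimal Python sketch. `content_column` is a hypothetical helper, and the sketch makes three assumptions:

- It handles spaces only, not tabs.
- The marker syntax (bullets `-`, `+`, `*`; digits followed by `.` or `)`) comes from elsewhere in the spec.
- Rule 2's "starting with indented code" is approximated as five or more spaces after the marker.

```python
import re

# Assumption: bullets are '-', '+' or '*'; ordered markers are digits + '.' or ')'.
# Only spaces are handled (no tabs).
_LIST_LINE = re.compile(r"^( {0,3})([-+*]|[0-9]+[.)])( *)(.*)$")

def content_column(line: str):
    """Column where the first line's contents start, or None if no list item."""
    m = _LIST_LINE.match(line)
    if m is None:
        return None  # no marker, or 4+ spaces of indentation (indented code)
    indent, marker, spaces, rest = m.groups()
    base = len(indent) + len(marker)  # rule 4 indentation plus marker width W
    if rest == "":
        return base + 1               # rule 3: item starts with a blank line
    n = len(spaces)
    if n == 0:
        return None                   # marker must be followed by a space
    if n >= 5:
        return base + 1               # rule 2: contents start with indented code
    return base + n                   # rule 1: 0 < N < 5
```

Subsequent lines of the item must then be indented to this column, as the rules above require.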
@ -4275,8 +4276,8 @@ corresponding codepoints.
[Decimal entities](@decimal-entities)
consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
entities need to be recognised and tranformed into their corresponding
UTF8 codepoints. Invalid Unicode codepoints will be written as the
entities need to be recognised and transformed into their corresponding
unicode codepoints. Invalid unicode codepoints will be written as the
"unknown codepoint" character (`0xFFFD`)
.
@ -4287,7 +4288,8 @@ UTF8 codepoints. Invalid Unicode codepoints will be written as the
[Hexadecimal entities](@hexadecimal-entities)
consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits
+ `;`. They will also be parsed and turned into their corresponding UTF8 values in the AST.
+ `;`. They will also be parsed and turned into the corresponding
unicode codepoints in the AST.
.
&#X22; &#XD06; &#xcab;
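One regular expression can decode both numeric entity forms described above. This is a minimal Python sketch. `decode_numeric_entities` is a hypothetical helper. The hunk does not define "invalid", so the set of codepoints mapped to U+FFFD here (NUL, UTF-16 surrogates, anything above U+10FFFF) is an assumption.

```python
import re

# Decimal: "&#" + 1-8 digits + ";"; hexadecimal: "&#" + "x"/"X" + 1-8 hex digits + ";"
_NUMERIC_ENTITY = re.compile(r"&#(?:([0-9]{1,8})|[xX]([0-9a-fA-F]{1,8}));")

def _codepoint_to_str(cp: int) -> str:
    # Assumption: these are the "invalid" codepoints that become U+FFFD.
    if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        return "\uFFFD"
    return chr(cp)

def decode_numeric_entities(text: str) -> str:
    """Replace decimal and hexadecimal entities with the characters they name."""
    def replace(m):
        dec, hexa = m.group(1), m.group(2)
        cp = int(dec) if dec is not None else int(hexa, 16)
        return _codepoint_to_str(cp)
    return _NUMERIC_ENTITY.sub(replace, text)
```

A run of nine digits does not match the pattern, so it is left as literal text.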
@ -4581,14 +4583,16 @@ characters that is not preceded or followed by a `_` character.
A [left-flanking delimiter run](@left-flanking-delimiter-run) is
a [delimiter run] that is (a) not followed by [unicode whitespace],
and (b) either not followed by a [punctuation character], or
preceded by [unicode whitespace] or a [punctuation character] or
the beginning of a line.
preceded by [unicode whitespace] or a [punctuation character].
For purposes of this definition, the beginning and the end of
the line count as unicode whitespace.
A [right-flanking delimiter run](@right-flanking-delimiter-run) is
a [delimiter run] that is (a) not preceded by [unicode whitespace],
and (b) either not preceded by a [punctuation character], or
followed by [unicode whitespace] or a [punctuation character] or
the end of a line.
followed by [unicode whitespace] or a [punctuation character].
For purposes of this definition, the beginning and the end of
the line count as unicode whitespace.
Here are some examples of delimiter runs.
@ -4604,20 +4608,20 @@ Here are some examples of delimiter runs.
- right-flanking but not left-flanking:
```
abc***
abc_
abc***
abc_
"abc"**
_"abc"
"abc"_
```
- Both right and right-flanking:
- Both left and right-flanking:
```
abc***def
abc***def
"abc"_"def"
```
- Neither right nor right-flanking:
- Neither left nor right-flanking:
```
abc *** def
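Both flanking definitions depend only on the characters immediately before and after a delimiter run, with the line's start and end counting as whitespace. That makes them easy to check directly. This is a minimal Python sketch. `flanking` is hypothetical, and `str.isspace()` plus the punctuation test only approximate the spec's [unicode whitespace] and [punctuation character], which are defined elsewhere in the spec.

```python
import string
import unicodedata

def _is_whitespace(ch: str) -> bool:
    # "" stands for the beginning or end of the line, which counts as whitespace.
    # str.isspace() is an approximation of [unicode whitespace].
    return ch == "" or ch.isspace()

def _is_punctuation(ch: str) -> bool:
    # Approximation: ASCII punctuation or a Unicode P* general category.
    return ch != "" and (ch in string.punctuation
                         or unicodedata.category(ch).startswith("P"))

def flanking(text: str, start: int, end: int) -> tuple:
    """Return (left_flanking, right_flanking) for the delimiter run text[start:end]."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    left = not _is_whitespace(after) and (
        not _is_punctuation(after)
        or _is_whitespace(before) or _is_punctuation(before))
    right = not _is_whitespace(before) and (
        not _is_punctuation(before)
        or _is_whitespace(after) or _is_punctuation(after))
    return left, right
```

For example, the run in `"abc"_"def"` sits between two quotation marks. It is therefore both left- and right-flanking, as the revised example list states.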
@ -4635,32 +4639,40 @@ are a bit more complex than the ones given here.)
The following rules define emphasis and strong emphasis:
1. A single `*` character [can open emphasis](@can-open-emphasis)
iff it is part of a [left-flanking delimiter run].
iff (if and only if) it is part of a [left-flanking delimiter run].
2. A single `_` character [can open emphasis] iff
it is part of a [left-flanking delimiter run]
and not part of a [right-flanking delimiter run].
and either (a) not part of a [right-flanking delimiter run]
or (b) part of a [right-flanking delimeter run]
preceded by punctuation.
3. A single `*` character [can close emphasis](@can-close-emphasis)
iff it is part of a [right-flanking delimiter run].
4. A single `_` character [can close emphasis]
iff it is part of a [right-flanking delimiter run]
and not part of a [left-flanking delimiter run].
4. A single `_` character [can close emphasis] iff
it is part of a [right-flanking delimiter run]
and either (a) not part of a [left-flanking delimiter run]
or (b) part of a [left-flanking delimeter run]
followed by punctuation.
5. A double `**` [can open strong emphasis](@can-open-strong-emphasis)
iff it is part of a [left-flanking delimiter run].
6. A double `__` [can open strong emphasis]
iff it is part of a [left-flanking delimiter run]
and not part of a [right-flanking delimiter run].
6. A double `__` [can open strong emphasis] iff
it is part of a [left-flanking delimiter run]
and either (a) not part of a [right-flanking delimiter run]
or (b) part of a [right-flanking delimeter run]
preceded by punctuation.
7. A double `**` [can close strong emphasis](@can-close-strong-emphasis)
iff it is part of a [right-flanking delimiter run].
8. A double `__` [can close strong emphasis]
iff it is part of a [right-flanking delimiter run]
and not part of a [left-flanking delimiter run].
it is part of a [right-flanking delimiter run]
and either (a) not part of a [left-flanking delimiter run]
or (b) part of a [left-flanking delimeter run]
followed by punctuation.
9. Emphasis begins with a delimiter that [can open emphasis] and ends
with a delimiter that [can close emphasis], and that uses the same
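Rules 1-8 reduce to two predicates once the flanking flags of a run are known. This minimal Python sketch takes those flags as inputs, assumed to have been computed from the flanking definitions. `can_open` and `can_close` are hypothetical names, not an API of this repository. Rule 8 is read as parallel to rule 4.

```python
def can_open(delim: str, left: bool, right: bool, preceded_by_punct: bool) -> bool:
    """Rules 1, 2, 5 and 6: can a '*', '**', '_' or '__' run open (strong) emphasis?"""
    if delim in ("*", "**"):
        return left
    if delim in ("_", "__"):
        # (a) not right-flanking, or (b) right-flanking but preceded by punctuation
        return left and (not right or preceded_by_punct)
    raise ValueError("unsupported delimiter: " + delim)

def can_close(delim: str, left: bool, right: bool, followed_by_punct: bool) -> bool:
    """Rules 3, 4, 7 and 8: can the run close (strong) emphasis?"""
    if delim in ("*", "**"):
        return right
    if delim in ("_", "__"):
        # (a) not left-flanking, or (b) left-flanking but followed by punctuation
        return right and (not left or followed_by_punct)
    raise ValueError("unsupported delimiter: " + delim)
```

Consider the opening `_` in `foo-_(bar)_`. It is both left- and right-flanking and is preceded by `-`, so `can_open("_", True, True, True)` is true. In `foo_bar_` the same flags with `preceded_by_punct=False` give false, which keeps intraword underscores literal.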
@ -4822,13 +4834,14 @@ aa_"bb"_cc
<p>aa_&quot;bb&quot;_cc</p>
.
Here there is no emphasis, because the delimiter runs are
both left- and right-flanking:
This is emphasis, even though the opening delimiter is
both left- and right-flanking, because it is preceded by
punctuation:
.
"aa"_"bb"_"cc"
foo-_(bar)_
.
<p>&quot;aa&quot;_&quot;bb&quot;_&quot;cc&quot;</p>
<p>foo-<em>(bar)</em></p>
.
Rule 3:
@ -4939,6 +4952,16 @@ _foo_bar_baz_
<p><em>foo_bar_baz</em></p>
.
This is emphasis, even though the closing delimiter is
both left- and right-flanking, because it is followed by
punctuation:
.
_(bar)_.
.
<p><em>(bar)</em>.</p>
.
Rule 5:
.
@ -5035,6 +5058,17 @@ __foo, __bar__, baz__
<p><strong>foo, <strong>bar</strong>, baz</strong></p>
.
This is strong emphasis, even though the opening delimiter is
both left- and right-flanking, because it is preceded by
punctuation:
.
foo-_(bar)_
.
<p>foo-<em>(bar)</em></p>
.
Rule 7:
This is not strong emphasis, because the closing delimiter is preceded
@ -5138,6 +5172,16 @@ __foo__bar__baz__
<p><strong>foo__bar__baz</strong></p>
.
This is strong emphasis, even though the closing delimiter is
both left- and right-flanking, because it is followed by
punctuation:
.
_(bar)_.
.
<p><em>(bar)</em>.</p>
.
Rule 9:
Any nonempty sequence of inline elements can be the contents of an
@ -5706,7 +5750,7 @@ A [link destination](@link-destination) consists of either
ASCII space or control characters, and includes parentheses
only if (a) they are backslash-escaped or (b) they are part of
a balanced pair of unescaped parentheses that is not itself
inside a balanced pair of unescaped paretheses.
inside a balanced pair of unescaped parentheses.
A [link title](@link-title) consists of either
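The parenthesis rule for link destinations amounts to a scanner that allows at most one level of unescaped, balanced parentheses. This Python sketch covers only the form without angle brackets; the other alternative falls outside this hunk. `scan_destination` is hypothetical. Reporting a nested pair as "no valid destination" is an assumption about how a parser would signal it.

```python
def scan_destination(s: str, pos: int = 0) -> int:
    """Return the index just past a bare link destination starting at s[pos],
    or -1 if no valid destination starts there."""
    i, depth = pos, 0
    while i < len(s):
        ch = s[i]
        if ch == "\\" and i + 1 < len(s) and s[i + 1] in "()":
            i += 2          # (a) backslash-escaped parenthesis
            continue
        if ord(ch) <= 0x20 or ord(ch) == 0x7F:
            break           # ASCII space or control character ends it
        if ch == "(":
            if depth == 1:
                return -1   # (b) would nest one balanced pair inside another
            depth = 1
        elif ch == ")":
            if depth == 0:
                break       # unmatched ')', e.g. the one closing the inline link
            depth = 0
        i += 1
    if depth != 0 or i == pos:
        return -1           # unbalanced '(' or empty destination
    return i
```

Applied to the text after `[link](`, the input `foo(and))` yields the destination `foo(and)`. The input `foo(and(bar)))` yields no destination.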
@ -5839,8 +5883,8 @@ in Markdown:
URL-escaping should be left alone inside the destination, as all
URL-escaped characters are also valid URL characters. HTML entities in
the destination will be parsed into their UTF-8 codepoints, as usual, and
optionally URL-escaped when written as HTML.
the destination will be parsed into the corresponding unicode
codepoints, as usual, and optionally URL-escaped when written as HTML.
.
[link](foo%20b&auml;)
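The destination handling described above takes two steps: parse entities into codepoints, then optionally percent-encode for HTML while leaving existing URL-escapes alone. A sketch with Python's standard library might look like the following. `normalize_destination` and its `safe` character set are assumptions for illustration. Note that `html.unescape` is more lenient than CommonMark, since it also accepts some named entities without the trailing `;`.

```python
import html
from urllib.parse import quote

def normalize_destination(dest: str) -> str:
    # Step 1: parse HTML entities into codepoints ("&auml;" -> "ä").
    decoded = html.unescape(dest)
    # Step 2: percent-encode as UTF-8 for HTML output. '%' is kept safe so that
    # existing escapes such as "%20" are left alone (assumed safe set).
    return quote(decoded, safe="%/:?#[]@!$&'()*+,;=")
```

With this sketch, `foo%20b&auml;` becomes `foo%20b%C3%A4`. The existing `%20` survives untouched and only the decoded `ä` is encoded.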
@ -7215,10 +7259,10 @@ foo
## Soft line breaks
A regular line break (not in a code span or HTML tag) that is not
preceded by two or more spaces is parsed as a softbreak. (A
softbreak may be rendered in HTML either as a
[line ending] or as a space. The result will be the same
in browsers. In the examples here, a [line ending] will be used.)
preceded by two or more spaces or a backslash is parsed as a
softbreak. (A softbreak may be rendered in HTML either as a
[line ending] or as a space. The result will be the same in
browsers. In the examples here, a [line ending] will be used.)
.
foo
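The revised softbreak rule works as follows. A line ending preceded by two or more spaces or by a backslash is a hard line break, rendered as `<br />`. Any other line ending is a softbreak, rendered here as a line ending as in the spec's examples. This minimal Python sketch uses the hypothetical names `line_break_kind` and `render_breaks`. It ignores code spans, HTML tags and line endings at the end of a block.

```python
def line_break_kind(line: str) -> str:
    """Return "hard" or "soft" for the line ending that follows `line`
    (the text before the newline)."""
    if line.endswith("\\") or line.endswith("  "):
        return "hard"
    return "soft"

def render_breaks(text: str) -> str:
    """Render the line endings inside one paragraph."""
    lines = text.split("\n")
    out = []
    for line in lines[:-1]:
        if line_break_kind(line) == "hard":
            # Drop the backslash or the trailing spaces that caused the hard break.
            body = line[:-1] if line.endswith("\\") else line.rstrip(" ")
            out.append(body + "<br />\n")
        else:
            out.append(line.rstrip(" ") + "\n")  # softbreak kept as a line ending
    out.append(lines[-1])
    return "".join(out)
```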
