Update CommonMark spec to 0.20

10 years ago · 7b961ee1ef
2 changed files with 802 additions and 597 deletions
--- a/test/fixtures/commonmark/good.txt
+++ b/test/fixtures/commonmark/good.txt
--- a/test/fixtures/commonmark/spec.txt
+++ b/test/fixtures/commonmark/spec.txt
@@ -1,8 +1,8 @@
 ---
 title: CommonMark Spec
 author: John MacFarlane
-version: 0.19
+version: 0.20
-date: 2015-04-27
+date: 2015-06-08
 license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
 ...
@@ -212,12 +212,8 @@ to a certain encoding.
 A [line](@line) is a sequence of zero or more [character]s
 followed by a [line ending] or by the end of file.
-A [line ending](@line-ending) is, depending on the platform, a
+A [line ending](@line-ending) is a newline (`U+000A`), carriage return
-newline (`U+000A`), carriage return (`U+000D`), or
+(`U+000D`), or carriage return + newline.
 carriage return + newline.
 For security reasons, a conforming parser must strip or replace the
 Unicode character `U+0000`.
 A line containing no characters, or a line containing only spaces
 (`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line).
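
The line and blank-line definitions above can be illustrated with a short sketch. This is only one possible reading, not the reference implementation; the names `split_lines` and `is_blank` are hypothetical and do not come from the spec.

```python
import re

# A line ending is a newline, a carriage return, or carriage return + newline.
# The two-character form is listed first so "\r\n" counts as one ending.
_LINE_ENDING = re.compile(r"\r\n|\r|\n")


def split_lines(text: str) -> list[str]:
    """Split text into lines, dropping the line endings themselves."""
    lines = _LINE_ENDING.split(text)
    # A trailing line ending terminates the last line; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def is_blank(line: str) -> bool:
    """A blank line holds no characters, or only spaces (U+0020) and tabs (U+0009)."""
    return line.strip(" \t") == ""
```

Matching `\r\n` before a lone `\r` is the design choice that matters here: otherwise a Windows-style ending would be counted as two line endings and produce a spurious empty line.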
@@ -239,7 +235,10 @@ carriage return (`U+000D`), newline (`U+000A`), or form feed
 [Unicode whitespace](@unicode-whitespace) is a sequence of one
 or more [unicode whitespace character]s.
-A [non-space character](@non-space-character) is anything but `U+0020`.
+A [space](@space) is `U+0020`.
 A [non-space character](@non-space-character) is any character
 that is not a [whitespace character].
 An [ASCII punctuation character](@ascii-punctuation-character)
 is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
@@ -250,9 +249,10 @@ A [punctuation character](@punctuation-character) is an [ASCII
 punctuation character] or anything in
 the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
-## Tab expansion
+## Preprocessing
-Tabs in lines are expanded to spaces, with a tab stop of 4 characters:
+Tabs in lines are immediately expanded to [spaces][space], with a tab
 stop of 4 characters:
 .
 →foo→baz→→bim
@@ -270,14 +270,19 @@ Tabs in lines are expanded to spaces, with a tab stop of 4 characters:
 </code></pre>
 .
 ## Insecure characters
 For security reasons, the Unicode character `U+0000` must be replaced
 with the replacement character (`U+FFFD`).
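
As a rough illustration of the two preprocessing steps described above (tab expansion with a tab stop of 4, and replacement of `U+0000`), here is a minimal sketch. The function name `preprocess_line` and its `tab_stop` parameter are assumptions made for this example, not names from the spec.

```python
def preprocess_line(line: str, tab_stop: int = 4) -> str:
    """Expand tabs to spaces and replace U+0000 with U+FFFD (illustrative sketch)."""
    out = []
    col = 0  # current column, counted in characters
    for ch in line:
        if ch == "\t":
            # Advance to the next multiple of tab_stop.
            width = tab_stop - (col % tab_stop)
            out.append(" " * width)
            col += width
        elif ch == "\0":
            # Insecure character: substitute the replacement character.
            out.append("\ufffd")
            col += 1
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

Applied to the example line `→foo→baz→→bim`, the first tab becomes four spaces, the tabs after `foo` and `baz` become one space each, and the second tab after `baz` becomes four spaces.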
 # Blocks and inlines
 We can think of a document as a sequence of
-[blocks](@block)---structural
+[blocks](@block)---structural elements like paragraphs, block
-elements like paragraphs, block quotations,
+quotations, lists, headers, rules, and code blocks.  Some blocks (like
-lists, headers, rules, and code blocks.  Blocks can contain other
+block quotes and list items) contain other blocks; others (like
-blocks, or they can contain [inline](@inline) content:
+headers and paragraphs) contain [inline](@inline) content---text,
-words, spaces, links, emphasized text, images, and inline code.
+links, emphasized text, images, code, and so on.
 ## Precedence
@@ -528,12 +533,12 @@ consists of a string of characters, parsed as inline content, between an
 opening sequence of 1--6 unescaped `#` characters and an optional
 closing sequence of any number of `#` characters.  The opening sequence
 of `#` characters cannot be followed directly by a
-[non-space character].
+[non-space character]. The optional closing sequence of `#`s must be
-The optional closing sequence of `#`s must be preceded by a space and may be
+preceded by a [space] and may be followed by spaces only.  The opening
-followed by spaces only.  The opening `#` character may be indented 0-3
+`#` character may be indented 0-3 spaces.  The raw contents of the
-spaces.  The raw contents of the header are stripped of leading and
+header are stripped of leading and trailing spaces before being parsed
-trailing spaces before being parsed as inline content.  The header level
+as inline content.  The header level is equal to the number of `#`
-is equal to the number of `#` characters in the opening sequence.
+characters in the opening sequence.
 Simple headers:
@@ -561,16 +566,21 @@ More than six `#` characters is not a header:
 <p>####### foo</p>
 .
-A space is required between the `#` characters and the header's
+At least one space is required between the `#` characters and the
-contents.  Note that many implementations currently do not require
+header's contents, unless the header is empty.  Note that many
-the space.  However, the space was required by the [original ATX
+implementations currently do not require the space.  However, the
-implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps
+space was required by the
-prevent things like the following from being parsed as headers:
+[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py),
 and it helps prevent things like the following from being parsed as
 headers:
 .
 #5 bolt
 #foobar
 .
 <p>#5 bolt</p>
 <p>#foobar</p>
 .
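
The ATX header rules above (0-3 spaces of indentation, 1--6 `#` characters, a required space unless the header is empty, and an optional closing sequence preceded by a space) might be sketched as follows. This is an approximation for illustration only: `parse_atx_header` is a hypothetical name, the input is assumed to be a single preprocessed line without its line ending, and backslash escapes and inline parsing are not handled.

```python
import re

# Opening sequence: up to 3 spaces of indentation, then 1-6 '#' characters
# that are followed by a space or by the end of the line.
_ATX_OPEN = re.compile(r"^ {0,3}(#{1,6})(?= |$)")
# Optional closing sequence: '#' characters preceded by a space (or making up
# the whole remainder) and followed by spaces only.
_ATX_CLOSE = re.compile(r"(^| )#+ *$")


def parse_atx_header(line: str):
    """Return (level, raw_contents) for an ATX header line, or None."""
    m = _ATX_OPEN.match(line)
    if m is None:
        return None
    level = len(m.group(1))
    rest = _ATX_CLOSE.sub("", line[m.end():])
    # Raw contents are stripped of leading and trailing spaces.
    return level, rest.strip(" ")
```

The lookahead `(?= |$)` is what rejects both `#5 bolt` and a run of seven `#` characters: in each case the character after the opening sequence is neither a space nor the end of the line.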
 This is not a header, because the first `#` is escaped:
@@ -1024,7 +1034,41 @@ paragraph.)
 </code></pre>
 .
-The contents are literal text, and do not get parsed as Markdown:
+If there is any ambiguity between an interpretation of indentation
 as a code block and as indicating that material belongs to a [list
 item][list items], the list item interpretation takes precedence:
 .
  - foo
    bar
 .
 <ul>
 <li>
 <p>foo</p>
 <p>bar</p>
 </li>
 </ul>
 .
 .
 1.  foo
    - bar
 .
 <ol>
 <li>
 <p>foo</p>
 <ul>
 <li>bar</li>
 </ul>
 </li>
 </ol>
 .
 The contents of a code block are literal text, and do not get parsed
 as Markdown:
 .
    <a/>
@@ -2325,9 +2369,16 @@ foo</p>
 </blockquote>
 .
-Laziness only applies to lines that are continuations of
+Laziness only applies to lines that would have been continuations of
-paragraphs. Lines containing characters or indentation that indicate
+paragraphs had they been prepended with `>`.  For example, the
-block structure cannot be lazy.
+`>` cannot be omitted in the second line of
 ``` markdown
 > foo
 > ---
 ```
 without changing the meaning:
 .
 > foo
@@ -2339,6 +2390,15 @@ block structure cannot be lazy.
 <hr />
 .
 Similarly, if we omit the `>` in the second line of
 ``` markdown
 > - foo
 > - bar
 ```
 then the block quote ends after the first line:
 .
 > - foo
 - bar
@@ -2353,6 +2413,9 @@ block structure cannot be lazy.
 </ul>
 .
 For the same reason, we can't omit the `>` in front of
 subsequent lines of an indented or fenced code block:
 .
 >     foo
    bar
@@ -3837,7 +3900,9 @@ item:
   - d
    - e
   - f
- g
+  - g
 - h
 - i
 .
 <ul>
 <li>a</li>
@@ -3847,9 +3912,31 @@ item:
 <li>e</li>
 <li>f</li>
 <li>g</li>
 <li>h</li>
 <li>i</li>
 </ul>
 .
 .
 1. a
  2. b
    3. c
 .
 <ol>
 <li>
 <p>a</p>
 </li>
 <li>
 <p>b</p>
 </li>
 <li>
 <p>c</p>
 </li>
 </ol>
 .
 This is a loose list, because there is a blank line between
 two of the list items:
@@ -4277,13 +4364,14 @@ corresponding codepoints.
 [Decimal entities](@decimal-entities)
 consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
 entities need to be recognised and transformed into their corresponding
-unicode codepoints. Invalid unicode codepoints will be written as the
+unicode codepoints. Invalid unicode codepoints will be replaced by
-"unknown codepoint" character (`0xFFFD`)
+the "unknown codepoint" character (`U+FFFD`).  For security reasons,
 the codepoint `U+0000` will also be replaced by `U+FFFD`.
 .
-&#35; &#1234; &#992; &#98765432;
+&#35; &#1234; &#992; &#98765432; &#0;
 .
-<p># Ӓ Ϡ �</p>
+<p># Ӓ Ϡ � �</p>
 .
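
A minimal sketch of the decimal-entity rule above: `&#`, then 1--8 digits, then `;`, with invalid codepoints and `U+0000` mapped to `U+FFFD`. The helper name `decode_decimal_entities` is hypothetical, and treating surrogate codepoints as invalid is an assumption of this sketch rather than something the text above states.

```python
import re

# "&#" + a string of 1-8 arabic digits + ";"
_DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")

REPLACEMENT = "\ufffd"


def decode_decimal_entities(text: str) -> str:
    """Replace decimal entities with their codepoints (illustrative only)."""

    def repl(match: re.Match) -> str:
        cp = int(match.group(1))
        # U+0000 is replaced for security reasons; out-of-range values are invalid.
        # Assumption: surrogates (U+D800..U+DFFF) are also treated as invalid.
        if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return REPLACEMENT
        return chr(cp)

    return _DECIMAL_ENTITY.sub(repl, text)
```

Run on the example input above, the first three entities decode to `#`, `Ӓ`, and `Ϡ`, while `&#98765432;` and `&#0;` both become the replacement character.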
 [Hexadecimal entities](@hexadecimal-entities)
@@ -5063,9 +5151,9 @@ both left- and right-flanking, because it is preceded by
 punctuation:
 .
-foo-_(bar)_
+foo-__(bar)__
 .
-<p>foo-<em>(bar)</em></p>
+<p>foo-<strong>(bar)</strong></p>
 .
@@ -5177,9 +5265,9 @@ both left- and right-flanking, because it is followed by
 punctuation:
 .
-_(bar)_.
+__(bar)__.
 .
-<p><em>(bar)</em>.</p>
+<p><strong>(bar)</strong>.</p>
 .
 Rule 9:
@@ -6086,6 +6174,7 @@ that [matches] a [link reference definition] elsewhere in the document.
 A [link label](@link-label)  begins with a left bracket (`[`) and ends
 with the first right bracket (`]`) that is not backslash-escaped.
 Between these brackets there must be at least one non-[whitespace character].
 Unescaped square bracket characters are not allowed in
 [link label]s.  A link label can have at most 999
 characters inside the square brackets.
@@ -6332,6 +6421,30 @@ backslash-escaped:
 <p><a href="/uri">foo</a></p>
 .
 A [link label] must contain at least one non-[whitespace character]:
 .
 []
 []: /uri
 .
 <p>[]</p>
 <p>[]: /uri</p>
 .
 .
 [
 ]
 [
 ]: /uri
 .
 <p>[
 ]</p>
 <p>[
 ]: /uri</p>
 .
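
The link label constraints (at least one non-whitespace character, no unescaped square brackets, at most 999 characters inside the brackets) could be checked along these lines. This is a sketch under stated assumptions: `is_valid_link_label` is a hypothetical helper, it receives the candidate label including its outer brackets, and it treats a backslash before any character as an escape, which is looser than the spec's rule but gives the same answer for brackets.

```python
# Whitespace characters: space, tab, newline, line tabulation,
# form feed, and carriage return.
WHITESPACE = " \t\n\x0b\x0c\r"


def is_valid_link_label(label: str) -> bool:
    """Check a candidate link label such as "[foo]" (illustrative only)."""
    if len(label) < 2 or label[0] != "[" or label[-1] != "]":
        return False
    inner = label[1:-1]
    if len(inner) > 999:
        return False
    # There must be at least one non-whitespace character.
    if inner.strip(WHITESPACE) == "":
        return False
    i = 0
    while i < len(inner):
        ch = inner[i]
        if ch == "\\":
            if i + 1 == len(inner):
                # The backslash would escape the closing bracket itself.
                return False
            i += 2  # skip the escaped character
            continue
        if ch in "[]":
            return False  # unescaped bracket inside the label
        i += 1
    return True
```

With this check, both `[]` and a label containing only a line ending are rejected, which matches the two examples above where the text is left as a literal paragraph.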
 A [collapsed reference link](@collapsed-reference-link)
 consists of a [link label] that [matches] a
 [link reference definition] elsewhere in the