
Update CommonMark spec to 0.20

pull/124/head
Alex Kocharin 10 years ago
parent
commit
7b961ee1ef
  1. 1202
      test/fixtures/commonmark/good.txt
  2. 197
      test/fixtures/commonmark/spec.txt

1202
test/fixtures/commonmark/good.txt

File diff suppressed because it is too large

197
test/fixtures/commonmark/spec.txt

@@ -1,8 +1,8 @@
 ---
 title: CommonMark Spec
 author: John MacFarlane
-version: 0.19
-date: 2015-04-27
+version: 0.20
+date: 2015-06-08
 license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
 ...
@@ -212,12 +212,8 @@ to a certain encoding.
 A [line](@line) is a sequence of zero or more [character]s
 followed by a [line ending] or by the end of file.
-A [line ending](@line-ending) is, depending on the platform, a
-newline (`U+000A`), carriage return (`U+000D`), or
-carriage return + newline.
-For security reasons, a conforming parser must strip or replace the
-Unicode character `U+0000`.
+A [line ending](@line-ending) is a newline (`U+000A`), carriage return
+(`U+000D`), or carriage return + newline.
 A line containing no characters, or a line containing only spaces
 (`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line).
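As a rough illustration of the revised definitions in this hunk, the Python sketch below splits text on any of the three line endings and detects blank lines. The names `split_lines` and `is_blank_line` are hypothetical; they are not taken from the spec or from this repository.

```python
import re

# A line ending is CR+LF, a lone CR, or a lone LF.
# CR+LF is listed first so that it is consumed as one ending, not two.
_LINE_ENDING = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text into lines on any of the three line endings.

    Text that ends with a line ending yields a trailing empty string.
    """
    return _LINE_ENDING.split(text)


def is_blank_line(line):
    """A blank line has no characters, or only spaces (U+0020) and tabs (U+0009)."""
    return all(ch in " \t" for ch in line)
```

Listing `\r\n` before `\r` in the alternation matters: otherwise a Windows line ending would be counted as two separate endings.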
@@ -239,7 +235,10 @@ carriage return (`U+000D`), newline (`U+000A`), or form feed
 [Unicode whitespace](@unicode-whitespace) is a sequence of one
 or more [unicode whitespace character]s.
-A [non-space character](@non-space-character) is anything but `U+0020`.
+A [space](@space) is `U+0020`.
+A [non-space character](@non-space-character) is any character
+that is not a [whitespace character].
 An [ASCII punctuation character](@ascii-punctuation-character)
 is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
@@ -250,9 +249,10 @@ A [punctuation character](@punctuation-character) is an [ASCII
 punctuation character] or anything in
 the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
-## Tab expansion
+## Preprocessing
-Tabs in lines are expanded to spaces, with a tab stop of 4 characters:
+Tabs in lines are immediately expanded to [spaces][space], with a tab
+stop of 4 characters:
 .
 →foo→baz→→bim
@@ -270,14 +270,19 @@ Tabs in lines are expanded to spaces, with a tab stop of 4 characters:
 </code></pre>
 .
+## Insecure characters
+For security reasons, the Unicode character `U+0000` must be replaced
+with the replacement character (`U+FFFD`).
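The two preprocessing rules touched by this change (tab expansion with a tab stop of 4, and replacement of `U+0000` by `U+FFFD`) can be sketched in Python as follows. This is a minimal illustration, not the parser's actual code; the names `expand_tabs`, `replace_insecure`, and `preprocess` are made up for the example.

```python
def expand_tabs(line, tab_stop=4):
    """Expand each tab to spaces up to the next multiple of tab_stop columns."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            width = tab_stop - (col % tab_stop)
            out.append(" " * width)
            col += width
        else:
            out.append(ch)
            col += 1
    return "".join(out)


def replace_insecure(text):
    """Replace U+0000 with the replacement character U+FFFD."""
    return text.replace("\x00", "\ufffd")


def preprocess(line):
    """Apply both preprocessing steps to a single line (hypothetical helper)."""
    return expand_tabs(replace_insecure(line))
```

For the spec's own example, `expand_tabs("\tfoo\tbaz\t\tbim")` gives four leading spaces, then `foo baz`, then five spaces before `bim`.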
 # Blocks and inlines
 We can think of a document as a sequence of
-[blocks](@block)---structural
-elements like paragraphs, block quotations,
-lists, headers, rules, and code blocks. Blocks can contain other
-blocks, or they can contain [inline](@inline) content:
-words, spaces, links, emphasized text, images, and inline code.
+[blocks](@block)---structural elements like paragraphs, block
+quotations, lists, headers, rules, and code blocks. Some blocks (like
+block quotes and list items) contain other blocks; others (like
+headers and paragraphs) contain [inline](@inline) content---text,
+links, emphasized text, images, code, and so on.
 ## Precedence
@@ -528,12 +533,12 @@ consists of a string of characters, parsed as inline content, between an
 opening sequence of 1--6 unescaped `#` characters and an optional
 closing sequence of any number of `#` characters. The opening sequence
 of `#` characters cannot be followed directly by a
-[non-space character].
-The optional closing sequence of `#`s must be preceded by a space and may be
-followed by spaces only. The opening `#` character may be indented 0-3
-spaces. The raw contents of the header are stripped of leading and
-trailing spaces before being parsed as inline content. The header level
-is equal to the number of `#` characters in the opening sequence.
+[non-space character]. The optional closing sequence of `#`s must be
+preceded by a [space] and may be followed by spaces only. The opening
+`#` character may be indented 0-3 spaces. The raw contents of the
+header are stripped of leading and trailing spaces before being parsed
+as inline content. The header level is equal to the number of `#`
+characters in the opening sequence.
 Simple headers:
@@ -561,16 +566,21 @@ More than six `#` characters is not a header:
 <p>####### foo</p>
 .
-A space is required between the `#` characters and the header's
-contents. Note that many implementations currently do not require
-the space. However, the space was required by the [original ATX
-implementation](http://www.aaronsw.com/2002/atx/atx.py), and it helps
-prevent things like the following from being parsed as headers:
+At least one space is required between the `#` characters and the
+header's contents, unless the header is empty. Note that many
+implementations currently do not require the space. However, the
+space was required by the
+[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py),
+and it helps prevent things like the following from being parsed as
+headers:
 .
 #5 bolt
+#foobar
 .
 <p>#5 bolt</p>
+<p>#foobar</p>
 .
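The ATX header rule as reworded in this hunk (1 to 6 `#` characters, indented 0 to 3 spaces, followed by at least one space unless the header is empty, with an optional closing `#` run preceded by a space) might be checked with a sketch like the one below. The function name `parse_atx_header` and both regular expressions are assumptions made for illustration. The sketch expects a single line without its line ending and leaves backslash escapes in the content for the inline parser.

```python
import re

# Up to 3 spaces of indentation, 1-6 '#', then at least one space or end of line.
_OPEN = re.compile(r" {0,3}(#{1,6})(?: +|$)")
# Optional closing run: '#'s preceded by a space (or forming the whole content),
# followed by spaces only.
_CLOSE = re.compile(r"(?:^| +)#+ *$")


def parse_atx_header(line):
    """Return (level, raw_content) for an ATX header line, or None otherwise."""
    m = _OPEN.match(line)
    if m is None:
        return None
    level = len(m.group(1))
    content = _CLOSE.sub("", line[m.end():])
    return level, content.strip(" ")
```

With this sketch, `#5 bolt` and `#foobar` are rejected because no space follows the `#`, while a bare `#` is accepted as an empty level-1 header.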
 This is not a header, because the first `#` is escaped:
@@ -1024,7 +1034,41 @@ paragraph.)
 </code></pre>
 .
-The contents are literal text, and do not get parsed as Markdown:
+If there is any ambiguity between an interpretation of indentation
+as a code block and as indicating that material belongs to a [list
+item][list items], the list item interpretation takes precedence:
+.
+- foo
+bar
+.
+<ul>
+<li>
+<p>foo</p>
+<p>bar</p>
+</li>
+</ul>
+.
+.
+1. foo
+- bar
+.
+<ol>
+<li>
+<p>foo</p>
+<ul>
+<li>bar</li>
+</ul>
+</li>
+</ol>
+.
+The contents of a code block are literal text, and do not get parsed
+as Markdown:
 .
 <a/>
@@ -2325,9 +2369,16 @@ foo</p>
 </blockquote>
 .
-Laziness only applies to lines that are continuations of
-paragraphs. Lines containing characters or indentation that indicate
-block structure cannot be lazy.
+Laziness only applies to lines that would have been continuations of
+paragraphs had they been prepended with `>`. For example, the
+`>` cannot be omitted in the second line of
+``` markdown
+> foo
+> ---
+```
+without changing the meaning:
 .
 > foo
@@ -2339,6 +2390,15 @@ block structure cannot be lazy.
 <hr />
 .
+Similarly, if we omit the `>` in the second line of
+``` markdown
+> - foo
+> - bar
+```
+then the block quote ends after the first line:
 .
 > - foo
 - bar
@@ -2353,6 +2413,9 @@ block structure cannot be lazy.
 </ul>
 .
+For the same reason, we can't omit the `>` in front of
+subsequent lines of an indented or fenced code block:
 .
 > foo
 bar
@@ -3835,9 +3898,11 @@ item:
 - b
 - c
 - d
-- e
-- f
-- g
+- e
+- f
+- g
+- h
+- i
 .
 <ul>
 <li>a</li>
@@ -3847,9 +3912,31 @@ item:
 <li>e</li>
 <li>f</li>
 <li>g</li>
+<li>h</li>
+<li>i</li>
 </ul>
 .
+.
+1. a
+2. b
+3. c
+.
+<ol>
+<li>
+<p>a</p>
+</li>
+<li>
+<p>b</p>
+</li>
+<li>
+<p>c</p>
+</li>
+</ol>
+.
 This is a loose list, because there is a blank line between
 two of the list items:
@@ -4277,13 +4364,14 @@ corresponding codepoints.
 [Decimal entities](@decimal-entities)
 consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
 entities need to be recognised and transformed into their corresponding
-unicode codepoints. Invalid unicode codepoints will be written as the
-"unknown codepoint" character (`0xFFFD`)
+unicode codepoints. Invalid unicode codepoints will be replaced by
+the "unknown codepoint" character (`U+FFFD`). For security reasons,
+the codepoint `U+0000` will also be replaced by `U+FFFD`.
 .
-&#35; &#1234; &#992; &#98765432;
+&#35; &#1234; &#992; &#98765432; &#0;
 .
-<p># Ӓ Ϡ �</p>
+<p># Ӓ Ϡ � �</p>
 .
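The decimal-entity rule in this hunk can be sketched in Python as shown below: 1 to 8 digits between `&#` and `;`, with invalid codepoints and `U+0000` both mapped to `U+FFFD`. The name `decode_decimal_entities` is hypothetical, and treating surrogate codepoints (`U+D800` to `U+DFFF`) as invalid is an assumption of this sketch rather than something stated in the hunk.

```python
import re

# "&#" + 1-8 ASCII digits + ";"  ([0-9] rather than \d, which also matches
# non-ASCII digits in Python 3).
_DECIMAL_ENTITY = re.compile(r"&#([0-9]{1,8});")

REPLACEMENT = "\ufffd"


def decode_decimal_entities(text):
    """Replace decimal entities with their characters, using U+FFFD as fallback."""

    def repl(match):
        cp = int(match.group(1))
        # U+0000 is replaced for security reasons; out-of-range values are invalid.
        # Assumption: surrogates are also treated as invalid here.
        if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return REPLACEMENT
        return chr(cp)

    return _DECIMAL_ENTITY.sub(repl, text)
```

Applied to the example input above, the first three entities decode to `#`, `Ӓ`, and `Ϡ`, and the last two (`&#98765432;` and `&#0;`) each become `U+FFFD`.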
 [Hexadecimal entities](@hexadecimal-entities)
@@ -5063,9 +5151,9 @@ both left- and right-flanking, because it is preceded by
 punctuation:
 .
-foo-_(bar)_
+foo-__(bar)__
 .
-<p>foo-<em>(bar)</em></p>
+<p>foo-<strong>(bar)</strong></p>
 .
@@ -5177,9 +5265,9 @@ both left- and right-flanking, because it is followed by
 punctuation:
 .
-_(bar)_.
+__(bar)__.
 .
-<p><em>(bar)</em>.</p>
+<p><strong>(bar)</strong>.</p>
 .
 Rule 9:
@@ -6086,6 +6174,7 @@ that [matches] a [link reference definition] elsewhere in the document.
 A [link label](@link-label) begins with a left bracket (`[`) and ends
 with the first right bracket (`]`) that is not backslash-escaped.
+Between these brackets there must be at least one non-[whitespace character].
 Unescaped square bracket characters are not allowed in
 [link label]s. A link label can have at most 999
 characters inside the square brackets.
@@ -6332,6 +6421,30 @@ backslash-escaped:
 <p><a href="/uri">foo</a></p>
 .
+A [link label] must contain at least one non-[whitespace character]:
+.
+[]
+[]: /uri
+.
+<p>[]</p>
+<p>[]: /uri</p>
+.
+.
+[
+]
+[
+]: /uri
+.
+<p>[
+]</p>
+<p>[
+]: /uri</p>
+.
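The link label constraints quoted in this diff (at least one non-whitespace character, no unescaped square brackets, at most 999 characters inside the brackets) could be validated along these lines. The function name `is_valid_link_label_content` is hypothetical, and the whitespace set used (space, tab, newline, line tabulation, form feed, carriage return) is an assumption about the spec's definition of a whitespace character, which this diff shows only in part.

```python
# Assumed set of whitespace characters: space, tab, LF, VT, FF, CR.
WHITESPACE = " \t\n\x0b\x0c\r"


def is_valid_link_label_content(inner):
    """Check the text between '[' and ']' against the link label rules."""
    if len(inner) > 999:
        return False
    # Must contain at least one non-whitespace character.
    if all(ch in WHITESPACE for ch in inner):
        return False
    # Unescaped square brackets are not allowed.
    escaped = False
    for ch in inner:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch in "[]":
            return False
    return True
```

Both new examples fail this check: the empty label `[]` and the label containing only a line break have no non-whitespace character, so they are rendered as literal text.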
 A [collapsed reference link](@collapsed-reference-link)
 consists of a [link label] that [matches] a
 [link reference definition] elsewhere in the
