Markdown.pl: correct comment sanitation

The XML standard section 2.5 is quite specific:

  the string "--" (double-hyphen) MUST NOT occur within comments

In fact, xmllint will complain about any comments that
incorrectly contain an internal "--" sequence as they are
not valid XML.

Adjust the sanitation code to only pass through valid XML
comments using the same pattern that _HashHTMLBlocks uses
to recognize them.
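
As a rough illustration of what that pattern accepts, here is a
minimal Perl sketch. The regular expression is copied from the diff
below; the helper name is_valid_xml_comment and the sample strings
are hypothetical and are not part of Markdown.pl.

```perl
use strict;
use warnings;

# Pattern copied from the diff: "<!--", then any run of characters that
# are either non-hyphens or single hyphens not followed by another
# hyphen, then the closing "-->".
my $xml_comment = qr/<!--(?:[^-]|(?:-(?!-)))*-->/;

# Hypothetical helper (not in Markdown.pl): returns 1 if the whole
# string is exactly one comment accepted by the pattern, else 0.
sub is_valid_xml_comment {
    my ($s) = @_;
    return $s =~ /\A$xml_comment\z/ ? 1 : 0;
}

for my $c ('<!-- ok -->', '<!-- a - b -->', '<!-- a -- b -->', '<!-- bad --->') {
    printf "%-18s %s\n", $c, is_valid_xml_comment($c) ? 'valid' : 'invalid';
}
```

The first two samples are accepted. The last two are rejected, one
for an internal "--" and one for a "--->" ending, which also places
"--" inside the comment body.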

With this change, the sanitizer treats invalid XML comments as
literal text and escapes their initial "<" to "&lt;", so they
no longer render as comments at all.
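
The pass-through versus escape behavior can be sketched as follows.
This is a heavily simplified, hypothetical stand-in for
_SanitizeTags (the name sanitize_comments_sketch is invented here):
it omits the tag handling visible in the diff and escapes every "<"
that does not begin a valid comment, so it only illustrates the
comment branch.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for _SanitizeTags. It only tells
# valid XML comments apart from everything else; the real function
# also recognizes permitted tags, which this sketch ignores.
sub sanitize_comments_sketch {
    my ($text) = @_;
    my $ans = '';
    pos($text) = 0;
    while (pos($text) < length($text)) {
        if ($text =~ /\G(<!--(?:[^-]|(?:-(?!-)))*-->)/gc) {
            $ans .= $1;        # valid comment: pass through unchanged
        } elsif ($text =~ /\G</gc) {
            $ans .= '&lt;';    # any other "<": escape it
        } elsif ($text =~ /\G([^<]+)/gc) {
            $ans .= $1;        # ordinary text up to the next "<"
        }
    }
    return $ans;
}

print sanitize_comments_sketch('a <!-- ok --> b'), "\n";
print sanitize_comments_sketch('<!-- a -- b -->'), "\n";
```

The /c modifier keeps pos($text) in place when a match fails, so
each alternative is tried at the same offset, mirroring the
\G ... /gc style used in the diff.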

Also take this opportunity to correct the comments in the
_HashHTMLBlocks function from "HTML" to "XML" to reflect
what it actually matches.

Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
master
Kyle J. McKay 4 years ago
commit 003134a723
Markdown.pl (11 lines changed)

@@ -834,7 +834,7 @@ sub _HashHTMLBlocks {
             "\n\n" . $key . "\n\n";
         }eigx;
-    # Special case for standalone HTML comments:
+    # Special case for standalone XML comments:
     $text =~ s{
         (?:
             (?<=\n\n)    # Starting after a blank line
@@ -2535,12 +2535,13 @@ sub _SanitizeTags {
         next;
     }
     my $tstart = pos($text);
-    if ($text =~ /\G(<[^>]*>)/gc) {
-        my $tag = $1;
-        if ($tag =~ /^<!--/) { # pass "comments" through
-            $ans .= $tag;
+    if ($text =~ /\G(<!--(?:[^-]|(?:-(?!-)))*-->)/gc) {
+        # pass "comments" through
+        $ans .= $1;
         next;
     }
+    if ($text =~ /\G(<[^>]*>)/gc) {
+        my $tag = $1;
         my $tt;
         if (($tag =~ m{^<($g_possible_tag_name)(?:[\s>]|/>$)} ||
              $tag =~ m{^</($g_possible_tag_name)\s*>}) &&
