Process any YAML front matter that may be present by default.
Provide copious options to control how any YAML front matter that
may be present will be handled including the ability to completely
disable YAML front matter processing altogether.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
There is no dingus to play with; stop talking about it.
Also make the "syntax page" link hook up properly.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Make the --raw option an alias for --raw-xml and provide a
new --raw-html option.
Previously the --raw option always activated the auto-closing
and optional-closing tag semantics as indicated in the HTML
standard so that a valid XML document would be output.
Unfortunately, these semantics can result in valid XML documents
being rejected.
For example, "<p><pre></pre></p>" would be turned into
"<p></p><pre></pre></p>" because the standard specifies that
the opening "pre" tag automatically closes the open "p" tag.
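
The effect of that rule can be illustrated with a minimal Python
sketch. This is an illustration only, not the Perl code in
Markdown.pl: the auto_close name and the CLOSES_P set are assumptions,
and only the auto-closing of an open "p" is modeled.

```python
import re

# Assumed, partial set of tags whose opening closes an open "p".
CLOSES_P = {"p", "pre", "ul", "ol", "blockquote", "div"}

def auto_close(html):
    """Apply a simplified 'opening block tag closes an open p' rule."""
    out, stack = [], []
    for tok in re.findall(r"</?[a-zA-Z][^>]*>|[^<]+|<", html):
        m = re.match(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)", tok)
        if m and not tok.endswith("/>"):
            closing, name = m.group(1), m.group(2).lower()
            if not closing:
                if name in CLOSES_P and stack and stack[-1] == "p":
                    out.append("</p>")  # auto-close the open "p"
                    stack.pop()
                stack.append(name)
            elif stack and stack[-1] == name:
                stack.pop()
        out.append(tok)
    return "".join(out)

print(auto_close("<p><pre></pre></p>"))
```

The final "</p>" no longer has a matching opening tag, which is why
the result fails validation.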
Retain these auto-closing semantics under the new --raw-html
option while disabling them under the --raw-xml (aka --raw)
option.
This produces a less surprising outcome when valid XML is
provided as input while still providing access to the
auto-closing semantics (via --raw-html) if explicitly desired
when processing raw input.
The auto-closing semantics remain enabled (as before) for the
non-raw mode when using --validate-xml-internal (the default).
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
When the --wiki option is active, recognize wiki-style image
links in the format:
[[link-to-image.png|align=left,alt=text]]
Where any "well-known" image suffix may be used in place of ".png"
and the "|..." part is optional but may specify any of the "width=",
"height=", "align=" or "alt=" keywords (provided alt= is always last).
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
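
As a rough illustration of the wiki-style image link format described
above, the following Python sketch parses such a link into its parts.
It is not the Perl code in Markdown.pl; the parse_wiki_image name and
the IMAGE_SUFFIXES list are assumptions (the message only says
"well-known" suffixes), and treating everything after "alt=" as the
alt text is one plausible reading of the "alt= is always last" rule.

```python
import re

# Assumed list; the actual set of recognized suffixes is not given.
IMAGE_SUFFIXES = (".png", ".gif", ".jpg", ".jpeg", ".svg")

def parse_wiki_image(link):
    """Return a dict of image attributes, or None if not an image link."""
    m = re.fullmatch(r"\[\[([^|\]]+)(?:\|(.*))?\]\]", link, re.S)
    if not m:
        return None
    target, opts = m.group(1).strip(), m.group(2) or ""
    if not target.lower().endswith(IMAGE_SUFFIXES):
        return None
    attrs = {"src": target}
    # alt= comes last, so everything after it is taken as the alt text.
    alt = re.search(r"(?:^|,)\s*alt=(.*)$", opts, re.S)
    if alt:
        attrs["alt"] = alt.group(1)
        opts = opts[:alt.start()]
    for part in opts.split(","):
        key, sep, value = part.partition("=")
        if sep and key.strip() in ("width", "height", "align"):
            attrs[key.strip()] = value.strip()
    return attrs

print(parse_wiki_image("[[link-to-image.png|align=left,alt=text]]"))
```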
Allow spaces to be retained when generating wiki file names
by using the new "b" wiki sub-option.
Since spaces are always trimmed (leading and trailing removed
and runs of multiple replaced with a single) before processing
wiki links, multiple consecutive white space characters are
always collapsed to a single space in the final URL.
Since the retained spaces are subject to URL encoding, they
become "%20" in the final URL.
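
A small Python sketch of the two steps (trim and collapse, then URL
encode). The wiki_link_to_url name is hypothetical and this is not the
Perl code itself:

```python
import re
from urllib.parse import quote

def wiki_link_to_url(name):
    """Trim, collapse runs of white space, then URL-encode the result."""
    collapsed = re.sub(r"\s+", " ", name.strip())
    return quote(collapsed)  # each retained space becomes %20

print(wiki_link_to_url("  My   Wiki \t Page "))
```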
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Given input like this:
hi<p>_</p>there
avoid leaving a dangling text blob outside of any "p" section
like this:
<p>hi</p><p>_</p>there
Instead, auto-open a new "p" section so the final text blob
ends up properly wrapped like so:
<p>hi</p><p>_</p><p>there</p>
This reflects the actual rendering behavior of the client
"user agent" (aka browser) which would end up supplying the
missing <p>...</p> wrapper in any case.
By doing this the output better reflects the way the markup
actually renders.
The heuristic used to auto-open the "p" section may not always
auto-open a "p" when it should, but it should never auto-open
a "p" when it shouldn't.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Since each "paragraph" is wrapped between a "<p>" and "</p>"
this input:
<p>hi
<p>bye
has been producing this output:
<p></p><p>hi</p>
<p></p><p>bye</p>
Correct this so that if the leading "<p>" of the paragraph wrapper
is immediately auto-closed then it's simply discarded rather than
creating a bogus "<p></p>" section.
With this change the previous input now produces this output:
<p>hi</p>
<p>bye</p>
The bogus leading "<p></p>" sections have been omitted and the
output looks much nicer.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
When forming paragraphs, a $string is wrapped to become <p>$string</p>.
If the opening "<p>" ends up being auto-closed by markup within
$string, then either another "<p>" must be auto-opened or the closing
"</p>" of the wrapper must be silently dropped to avoid a validation
failure.
Figuring out exactly where to auto-open the "<p>" turns out to be
somewhat more difficult than just dropping the wrapper's "</p>".
For now just go ahead and drop the wrapper's closing "</p>" if the
wrapper's opening "<p>" has been auto-closed by the time the validator
encounters the wrapper's closing "</p>".
At the same time, make sure that all "optional closing tag" tags
that occur after the wrapper's opening "<p>" get closed immediately
upon encountering the wrapper's closing "</p>" (whether or not it
ultimately gets dropped).
With these changes, this input:
line<p>one
line<p>three
or this input:
line<p>one</p>
line<p>three</p>
produces this output:
<p>line</p><p>one</p>
<p>line</p><p>three</p>
While this input:
line<p>one</p>x1
line<p>three</p>x3
produces this output:
<p>line</p><p>one</p>x1
<p>line</p><p>three</p>x3
In this last example, the "x1" and "x3" text is left hanging outside
of a "p" section. The client "user agent" (aka browser) will end
up rendering these hanging "x1" and "x3" pieces of text in their
own "p" sections.
With these changes, simple markup that would previously have been
rejected for no apparent reason by the default `--validate-xml-internal`
parser while being accepted by the `--validate-xml` option becomes
acceptable to the `--validate-xml-internal` parser as well.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
With a minor enhancement to the support for specifying image
dimensions, images can now be "float"ed to the left or right
or even centered in their own block.
Add the ability to generate a <br clear="all" /> with 3 or
more spaces on the end of a line rather than a plain <br />
with only 2.
Document these additions as well.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Allow wiki names to be "flatten"ed by replacing runs of one
(or more) "/" characters with "%2F" indicated by the new "%"
sub-option. Ultimately these "%2F" replacements become
"%252F" by the time the final URL is generated.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
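
Why "%2F" ends up as "%252F" in the flattening described above can be
seen in a short Python sketch. The function names are hypothetical and
the final URL-encoding step is assumed to escape "%" like any other
reserved character:

```python
import re
from urllib.parse import quote

def flatten_wiki_name(name):
    """Replace each run of one or more '/' with a literal '%2F'."""
    return re.sub(r"/+", "%2F", name)

def to_url(name):
    """URL-encode the flattened name; '%' itself is encoded as '%25'."""
    return quote(flatten_wiki_name(name), safe="")

print(flatten_wiki_name("docs//api/index"))
print(to_url("docs//api/index"))
```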
Provide a "wikifunc" 'CODE' ref hook capability to provide
for custom wiki link handling when "use"ing the Markdown module.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
With the `--keep-abs` option absolute path URLs will be preserved
into the output despite any -r/-i options that may be present.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
When stripping XML comments, if any XML comments are recognized as
a standalone block, strip that entire block when forming paragraphs
the final time.
This provides a much cleaner output as it results in many
superfluous blank lines being suppressed that the XML parser
would not otherwise remove when it strips out XML comments.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
When `--strip-comments` is active, if an XML comment is
immediately followed by optional spaces and/or tabs and
a newline, remove those along with the comment itself.
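
A minimal Python sketch of that rule, assuming comments follow the
XML grammar (no internal "--"). The names are hypothetical and the
pattern is not copied from Markdown.pl:

```python
import re

# An XML comment, optionally followed by spaces/tabs and one newline.
COMMENT_AND_EOL = re.compile(r"<!--(?:[^-]|-[^-])*-->(?:[ \t]*\n)?")

def strip_comments(text):
    """Remove comments plus any trailing blanks and newline after them."""
    return COMMENT_AND_EOL.sub("", text)

print(repr(strip_comments("a\n<!-- note --> \t\nb\n")))
```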
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
While the default mode of Markdown.pl remains that of a command
line utility, it's fairly simple to "use Markdown" and call the
functions directly.
Explain this usage in the help and make sure all of the auxiliary
functions that might be used for this appear in @EXPORT_OK.
Include an example that simulates `Markdown.pl --stub --wiki`.
Add a symbolic link from Markdown.pm to Markdown.pl to go
along with the new example.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Even though block tags such as "<p/>" should not appear in
valid XHTML documents, the internal validator (which is
enabled by default) will properly expand "<p/>" to "<p></p>".
However, the block formatting code fails to notice such
an empty tag block leading to it being wrapped in a spurious
"<p>...</p>" pair before it's expanded by the validation code.
Attempt to recognize some of these valid-for-xml-but-not-xhtml
blocks earlier to produce better output.
This is not a perfect fix, but it's an improvement.
It's really an odd edge case anyway that's unlikely to be
encountered very often.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Move sanity checking of arguments to the Markdown and ProcessRaw
functions into a new _SanitizeOpts function.
Call the new _SanitizeOpts function from both Markdown and ProcessRaw.
Document all of the possible options in the _SanitizeOpts function.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Create a new "SetWikiOpts" function that parses the
`--wiki=` option value into the appropriate internal
options settings.
Use the new "SetWikiOpts" function to parse the command
line `--wiki=...` option.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Create a new "GenerateStyleSheet" function that returns
a copy of the internal fancy style sheet using the given
prefix as a prefix of all the CSS style names.
Use the new "GenerateStyleSheet" function to create the
style sheet as needed.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
When parsing a "checkbox" item or image dimensions, recognize
a U+00D7 Multiplication Sign character as equivalent to an "x".
The real "x" is preferred (and still recognized along with "X"),
but in the case where a U+00D7 (×) ends up in there, just go
with it and recognize it as the intent remains clear.
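
For illustration, a Python sketch that accepts all three separators.
The "WIDTHxHEIGHT" string format and the parse_dimensions name are
assumptions made for the example, not the actual syntax handled by
Markdown.pl:

```python
import re

# "x", "X" and U+00D7 (multiplication sign) are all accepted.
DIMENSIONS = re.compile(r"\s*(\d+)\s*[xX\u00d7]\s*(\d+)\s*")

def parse_dimensions(spec):
    """Return (width, height) as integers, or None if not recognized."""
    m = DIMENSIONS.fullmatch(spec)
    return (int(m.group(1)), int(m.group(2))) if m else None

print(parse_dimensions("640\u00d7480"))
```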
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Add an explanation of XML comments for those who may not be familiar
with them including a link to the relevant specification, examples,
and exacting details about where they are and are not recognized.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Combine adjacent (i.e. no separating blank line) standalone
XML comments into the same "block".
This is more efficient, better preserves the original comment
formatting and avoids an unfortunate side-effect that could
introduce unwanted extra paragraphs into the output.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Make use of more of the Getopt::Long::GetOptions API capabilities
to avoid needing extra, awkward code checks.
With this change, options that support negation (e.g. "stylesheet")
or have variants (e.g. "validate-xml-internal") now work as intended
such that the last option given wins.
Additionally, help/version options are now handled immediately
when encountered.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
The XML standard section 2.5 is quite specific:
the string "--" (double-hyphen) MUST NOT occur within comments
In fact, xmllint will complain about any comments that
incorrectly contain an internal "--" sequence as they are
not valid XML.
Adjust the sanitation code to only pass through valid XML
comments using the same pattern that _HashHTMLBlocks uses
to recognize them.
With this change, invalid XML comments will be treated as
literal text by the sanitizer and have the initial "<" escaped
to "&lt;" thus rendering them as not a comment at all.
Also take this opportunity to correct the comments in the
_HashHTMLBlocks function from "HTML" to "XML" to reflect
what it actually matches.
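
The comment check can be sketched in Python as follows. The pattern
is derived from the comment grammar in the XML standard and is only
assumed to be equivalent to the one _HashHTMLBlocks uses; the
sanitize_comment name is hypothetical.

```python
import re

# '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
VALID_XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")

def sanitize_comment(text):
    """Pass valid XML comments through; escape the '<' of invalid ones."""
    if VALID_XML_COMMENT.fullmatch(text):
        return text
    if text.startswith("<"):
        return "&lt;" + text[1:]
    return text

print(sanitize_comment("<!-- bad -- comment -->"))
```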
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
When using `--stub` and picking up the value of the first "H1" tag
to use as the title, remove markup (such as links, italic, bold,
etc.) from the value before using it.
Since the <title>...</title> value cannot contain links or other markup
this makes the displayed title look much better where such markup
is present in the original document.
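
A simplified Python sketch of the idea, operating on already rendered
HTML. The title_from_h1 name is hypothetical and the plain tag strip
shown here is an assumption; it does not decode entities or handle
every markup construct.

```python
import re

def title_from_h1(html):
    """Return the text of the first h1 with all tags removed, or None."""
    m = re.search(r"<h1\b[^>]*>(.*?)</h1>", html, re.I | re.S)
    if not m:
        return None
    return re.sub(r"<[^>]+>", "", m.group(1)).strip()

print(title_from_h1('<h1><a href="/x"><em>My</em> Project</a></h1>'))
```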
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
This works to hook up a fragment link to its section:
# Section 1
Link to [Top](#Section_1).
Make the same thing work when written like this:
# Section 1
Link to [Top][id].
[id]: #Section_1
Or even like this:
# Section 1
Link to [id].
[id]: #Section_1
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
A link reference may have the URL actually split onto the next line,
not just the title attribute.
Mention this in the syntax description for links.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Any absolute path URLs (but not // ones) have the prefix prepended.
If that makes the resulting URL a fully absolute URL it will not
be processed by any --htmlroot and/or --imageroot options.
With this option, site-relative absolute path URLs can be re-written
so that the site is made explicit in order to support viewing on
a different site.
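
The selection rule can be sketched in Python. The apply_abs_prefix
name and the example prefix are hypothetical:

```python
def apply_abs_prefix(url, prefix):
    """Prepend prefix to absolute path URLs, but not to '//' URLs."""
    if url.startswith("/") and not url.startswith("//"):
        return prefix + url
    return url

print(apply_abs_prefix("/docs/a.html", "https://example.com"))
```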
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
The "s" option of the --wiki format strips the final extension
before applying the template.
Enhance the "s" option to optionally take a list of extensions
and to only strip the extension if it's one from the list.
Provide a "shortcut" extension that represents all known markdown
extensions.
Change the default --wiki format to now be "%{s(:md)}.html" instead
of the previous default which means it will no longer strip arbitrary
extensions, but only known markdown ones.
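
A Python sketch of the difference between the old and new defaults.
The function names are hypothetical and the MARKDOWN_EXTS set is an
assumed stand-in for the "all known markdown extensions" shortcut,
whose actual contents are not listed here.

```python
import os.path

# Assumed stand-in for the markdown extensions shortcut.
MARKDOWN_EXTS = {"md", "markdown", "mkd", "mkdn", "mdown"}

def strip_extension(name, allowed=None):
    """Strip the final extension; with a list, only if it is listed."""
    base, ext = os.path.splitext(name)
    if ext and (allowed is None or ext[1:].lower() in allowed):
        return base
    return name

def wiki_target(name):
    """Mimic a template that strips markdown extensions, adds .html."""
    return strip_extension(name, MARKDOWN_EXTS) + ".html"

print(wiki_target("notes.md"), wiki_target("archive.tar"))
```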
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
During initial processing, explicit "block" tags are set aside to
avoid creating problems in the output later.
Adjust the matches to be case insensitive.
Also relax the requirement for an extra blank line before and after
them, which only prevents them from being recognized where they need
to be.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Add new `--base` option that allows a prefix to be specified to be
added to all bare fragment-only URL links.
Use of this option may be required in order for intra-document
fragment links to function properly within a document that makes
use of the `<base>` tag.
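
In Python terms the rewrite amounts to something like the sketch
below. The apply_fragment_base name and the "page.html" value are
hypothetical:

```python
def apply_fragment_base(url, base):
    """Prefix bare fragment-only URLs ('#...') with the given base."""
    if base and url.startswith("#"):
        return base + url
    return url

print(apply_fragment_base("#Section_1", "page.html"))
```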
Make sure explicitly specified fragment-only URLs (i.e. given in
verbatim `<a>` tags) get hooked up to the proper destination if
possible.
They obviously are trying to refer to something in the same document
so make sure they get the same treatment to hook them up.
Do the same for fragment-only links inside wiki `[[`...`]]` links.
And for both of these (explicit `<a>` tags and `[[`...`]]` links)
make sure the new bare fragment-only URL prefix gets added if given.
While in there, adjust whitespace to match coding convention for
this file where needed in the sections that have been changed.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
When processing something like this:
* first
> quoted
* second
It's imperative that the list tags and blockquote tags do not
become intermingled resulting in invalid output like so:
<p><ul>
<li>first</p>
<blockquote>
<p>quoted</li>
<li>second</li>
</ul></p>
</blockquote>
Instead, process the lists together with code blocks and
blockquotes so that cannot happen and instead we get the
correct output like so:
<ul>
<li>first
<blockquote>
<p>quoted</p>
</blockquote>
</li>
<li>second</li>
</ul>
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Now that the recognition requirements for atx-style headers have
been tightened a bit, process them ahead of setext-style headers
to avoid having a horizontal rule immediately after one of the
atx-style headers being grabbed as a setext-style header.
Previously this:
##### My H5
---
Would end up as a single `<h2>##### My H5</h2>`. Now it will more
correctly become `<h5>My H5</h5><hr />` instead.
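
The ordering effect can be reproduced with a toy Python model. The
patterns are deliberately simplified and are not the ones used in
Markdown.pl; all names are hypothetical.

```python
import re

ATX = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.M)
SETEXT = re.compile(r"^(.+)\n(=+|-+)[ \t]*$", re.M)
RULE = re.compile(r"^-{3,}[ \t]*$", re.M)

def _header(level, text):
    # The trailing newline separates the header from what follows.
    return "<h%d>%s</h%d>\n" % (level, text, level)

def do_atx(text):
    return ATX.sub(lambda m: _header(len(m.group(1)), m.group(2)), text)

def do_setext(text):
    return SETEXT.sub(
        lambda m: _header(1 if m.group(2).startswith("=") else 2,
                          m.group(1)), text)

def render(text, atx_first=True):
    """Run both header passes in the chosen order, then rules."""
    steps = (do_atx, do_setext) if atx_first else (do_setext, do_atx)
    for step in steps:
        text = step(text)
    return RULE.sub("<hr />", text)

print(render("##### My H5\n---"))
print(render("##### My H5\n---", atx_first=False))
```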
This has been a longstanding problem that goes at least as far
back as version 1.0.1 and probably further.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Treat more than 6 consecutive '#'s as not a header.
Allow blank headers to be recognized which can be used for
spacers and/or formatting breaks.
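
A Python sketch of a matcher with both properties. The pattern is an
assumption for illustration (for example, whether white space is
required after the marker is not modeled) and the atx_header name is
hypothetical.

```python
import re

# 1 to 6 '#'s not followed by another '#'; the text may be empty.
ATX = re.compile(r"(#{1,6})(?!#)[ \t]*(.*?)[ \t]*#*[ \t]*")

def atx_header(line):
    """Return (level, text) for an atx-style header line, else None."""
    m = ATX.fullmatch(line)
    return (len(m.group(1)), m.group(2)) if m else None

print(atx_header("##### My H5"))
print(atx_header("####### too many"))
print(atx_header("###"))
```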
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
In b62cef825e (Markdown.pl: recognize top-level lists better,
2017-01-09, markdown_1.1.0), an attempt was made to recognize
obvious lists that were improperly being treated as wrapped paragraphs.
While that change offers a number of improvements (i.e. more lists
are recognized properly than were before), it does not go far enough.
Further enhance it to only require a single list marker to start a
list provided it's one of the unordered list markers. While it's
certainly possible that a lone "*", "+" or "-" got wrapped onto the
beginning of a line by itself, it's easy to correct; since lone
occurrences of those characters seem highly unlikely, choose the
list starting interpretation instead.
In addition, if the prior line ends with a colon (:) do not require
two markers to start the list, just one.
Furthermore, allow a single, optional, blank line between two markers
that do start a list.
With these changes, the vast majority of lists are recognized properly.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Previously, to match setext-style headers, only the top
three levels of atx-style headers were hooked up with ids.
Change this and hook up h4, h5 and h6 atx-style headers using
the same rules as for the others.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
The language name specified for syntax highlighting generally has
to be composed of "word" characters (alphanumeric and "_") plus
"+", "-" and ".".
Allow a final trailing "#" on the language name so that c# and f#
can be used as language names.
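
Expressed as a Python pattern, the rule looks roughly like this. It is
a sketch of the rule as stated above, not the expression used in
Markdown.pl, and the names are hypothetical.

```python
import re

# "Word" characters plus "+", "-" and ".", then an optional final "#".
LANG_NAME = re.compile(r"[\w+.\-]+#?", re.ASCII)

def is_language_name(name):
    return LANG_NAME.fullmatch(name) is not None

print([n for n in ("c#", "f#", "c++", "c##", "#c") if is_language_name(n)])
```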
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
The $g_nested_brackets recursive regular expression is already being
used to match nested and balanced '['...']' sequences.
Introduce a $g_nested_parens recursive regular expression that
matches nested and balanced '('...')' sequences; use it to match
the parenthesized portion of `[...](...)` and `![...](...)` links.
This eliminates a number of previous issues with links that contained
embedded parentheses and non-reference image links nested within
non-reference non-image links.
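
Python's "re" module has no recursive patterns, so the sketch below
uses a depth counter instead of a recursive regular expression to
show what balanced matching buys. It is not the Perl implementation
and all names are hypothetical.

```python
def balanced_end(text, start, open_ch="(", close_ch=")"):
    """Return the index just past the group opened at start, or None."""
    if start >= len(text) or text[start] != open_ch:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
    return None

def inline_link(text):
    """Return (label, destination) of the first [...](...) link."""
    lb = text.find("[")
    if lb < 0:
        return None
    rb = balanced_end(text, lb, "[", "]")
    if rb is None:
        return None
    end = balanced_end(text, rb)
    if end is None:
        return None
    return text[lb + 1:rb - 1], text[rb + 1:end - 1]

print(inline_link("see [wiki](http://example.com/Foo_(bar)) now"))
print(inline_link("[![alt](img.png)](page.html)"))
```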
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
While a bit unusual, the reference id part of a reference link can
have nested '['...']' if wanted.
Therefore use $g_nested_brackets to match that side too.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Revise _DeTab yet again to provide a minor boost.
This really does currently seem to be the fastest detab mechanism
available.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>