markdown

Commit Graph

Author	SHA1	Message	Date
Kyle J. McKay	4174281293	Markdown.pl: remove markup from <title> value When using `--stub` and picking up the value of the first "H1" tag to use as the title, remove markup (such as links, italic, bold, etc.) from the value before using it. Since <title>...</title> value cannot contain links or other markup this makes the displayed title look much better where such markup is present in the original document. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	ea408f7d29	Markdown.pl: hook up fragment only link definitions This works to hook up a fragment link to its section: # Section 1 Link to [Top](#Section_1). Make the same thing work when written like this: # Section 1 Link to [Top][id]. [id]: #Section_1 Or even like this: # Section 1 Link to [id]. [id]: #Section_1 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	de1c7f4f1a	syntax.md: mention ability to split link references A link reference may have the URL actually split onto the next line, not just the title attribute. Mention this in the syntax description for links. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	fef2d21f4c	Markdown.pl: support new --absroot=prefix option Any absolute path URLs (but not // ones) have the prefix prepended. If that makes the resulting URL a fully absolute URL it will not be processed by any --htmlroot and/or --imageroot options. With this option, site-relative absolute path URLs can be re-written so that the site is made explicit in order to support viewing on a different site. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	9015f1dbd9	Markdown.pl: enhance --wiki "s" option operation The "s" option of the --wiki format strips the final extension before applying the template. Enhance the "s" option to optionally take a list of extensions and to only strip the extension if it's one from the list. Provide a "shortcut" extension that represents all known markdown extensions. Change the default --wiki format to now be "%{s(:md)}.html" instead of the previous default which means it will no longer strip arbitrary extensions, but only known markdown ones. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	eba379e9fb	Markdown.pl: next version is 1.1.10 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	f661334af8	Markdown version 1.1.9 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	8de6399fcc	Markdown.pl: blocks bonanza During initial processing, explicit "block" tags are set aside to avoid creating problems in the output later. Adjust the matches to be case insensitive. Also relax the extra-blank line before and after that only prevents them being recognized where they need to be. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	2abca714d6	Markdown.pl: fun with fragments Add new `--base` option that allows a prefix to be specified to be added to all bare fragment-only URL links. Use of this option may be required in order for intra-document fragment links to function properly within a document that makes use of the `<base>` tag. Make sure explicitly specified fragment-only URLs (i.e. given in verbatim `<a>` tags) get hooked up to the proper destination if possible. They obviously are trying to refer to something in the same document so make sure they get the same treatment to hook them up. Do the same for fragment-only links inside wiki `[[`...`]]` links. And for both of these (explicit `<a>` tags and `[[`...`]]` links) make sure the new bare fragment-only URL prefix gets added if given. While in there, adjust whitespace to match coding convention for this file where needed in the sections that have been changed. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	09729dabf3	Markdown.pl: next version is 1.1.9 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	5d8bc32253	Markdown version 1.1.8 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	a125798c27	Markdown.pl: process lists, code blocks and blockquotes together When processing something like this: * first > quoted * second It's imperative that the list tags and blockquote tags do not become intermingled resulting in invalid output like so: <p><ul> <li>first</p> <blockquote> <p>quoted</li> <li>second</li> </ul></p> </blockquote> Instead, process the lists together with code blocks and blockquotes so that cannot happen and instead we get the correct output like so: <ul> <li>first <blockquote> <p>quoted</p> </blockquote> </li> <li>second</li> </ul> Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	e414dcb279	Markdown.pl: process atx-style before setext-style headers Now that the recognition requirements for atx-style headers have been tightened a bit, process them ahead of setext-style headers to avoid having a horizontal rule immediately after one of the atx-style headers being grabbed as a setext-style header. Previously this: ##### My H5 --- Would end up as a single `<h2>##### My H5</h2>`. Now it will more correctly become `<h5>My H5</h5><hr />` instead. This has been a longstanding problem that goes at least as far back as version 1.0.1 and probably further. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	4be5dfd8e3	Markdown.pl: improve atx-style header handling Treat more than 6 consecutive '#'s as not a header. Allow blank headers to be recognized which can be used for spacers and/or formatting breaks. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	ec42c098cb	Markdown.pl: add more list start heuristics In `b62cef825e` (Markdown.pl: recognize top-level lists better, 2017-01-09, markdown_1.1.0), an attempt was made to recognize obvious lists that were improperly being treated as wrapped paragraphs. While that change offers a number of improvements (i.e. more lists are recognized properly than were before), it does not go far enough. Further enhance it to only require a single list marker to start a list provided it's one of the unordered list markers. While it's certainly possible that a lone "*", "+" or "-" got wrapped onto the beginning of a line by itself, it's easy to correct; since lone occurrences of those characters seem highly unlikely, choose the list starting interpretation instead. In addition, if the prior line ends with a colon (:) do not require two markers to start the list, just one. Furthermore, allow a single, optional, blank line between two markers that do start a list. With these changes, the vast majority of lists are recognized properly. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	1ae11989d2	Markdown.pl: hook up all atx-style headers with ids Previously, to match setext-style headers, only the top three levels of atx-style headers were hooked up with ids. Change this and hook up h4, h5 and h6 atx-style headers using the same rules as for the others. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	a8309fb9a6	Markdown.pl: permit '#' suffix on backticks block language name The language name specified for syntax highlighting generally has to be composed of "word" characters (alphanumeric and "_") plus "+", "-" and ".". Allow a final trailing "#" on the language name so that c# and f# can be used as language names. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	ebfb2dafda	Markdown.pl: introduce $g_nested_parens and use it The $g_nested_brackets recursive regular expression is already being used to match nested and balanced '['...']' sequences. Introduce a $g_nested_parens recursive regular expression that matches nested and balanced '('...')' sequences; use it to match the parenthesized portion of `[...](...)` and `![...](...)` links. This eliminates a number of previous issues with links that contained embedded parentheses and non-reference image links nested within non-reference non-image links. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	1f046587d3	Markdown.pl: use $g_nested_brackets for both parts of reference links While a bit unusual, the reference id part of a reference link can have nested '['...']' if wanted. Therefore use $g_nested_brackets to match that side too. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	9b5b7d7d8a	Markdown.pl: add minor tweak to checkbox style in style sheet Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	9593c956c1	Markdown.pl: claw back another minor performance gain Revise _DeTab yet again to provide a minor boost. This really does currently seem to be the fastest detab mechanism available. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	47c670fe2f	Markdown.pl: use g_nested_brackets for images too The recursive regex `$g_nested_brackets` (see the source) correctly matches properly balanced nested `[`...`]` text. The normal anchor parsing already makes use of it. Use it for the image parsing too so that this: [![Alt](img.png)][link] [link]:example.txt Does not become the very broken: <a href="img.png"><img src="example.txt" alt="Alt</a>" /> But instead becomes the correct: <a href="example.txt"><img src="img.png" alt="Alt" /></a> This problem has been present all the way back to Markdown.pl version 1.0.1 (2004-12-14) and likely earlier too. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	b0ea7deb50	Markdown.pl: do not start a list with a single year or initial Avoid starting a new list when seeing only one single orphan list item that appears to be either a year or an initial (only UPPERCASE though). Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	71e7a56609	Markdown.pl: don't cuddle up nested lists too closely When list items are using paragraphs, make sure they do not cuddle up too closely to the surrounding closing "</li>" when they contain nested lists. Also include a "hint" for text-only browsers that super compact cuddling is not wanted at that spot. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	07ef050b87	Markdown.pl: carefully handle nested italics and bold Be very careful to make sure that this: open FILE* with write Does not turn into the broken: open <strong>FILE<em></strong> with write</em> But instead turns into the correct: open <strong>FILE</strong> with <em>write</em> Handle italics inside bold by nesting a callout that finishes up by "hiding" any leftover bold/italics markup characters. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	32758c23ee	Markdown.pl: find anchors harder Try very much harder to find a match for explicit `#anchorhere` links in the document. The implicit link shortcut works so much better and is far less error prone. Nevertheless, attempt to connect the stray anchor links via extraordinary means. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	b962406c4d	Markdown.pl: be careful with bold/italic/strikethrough in anchor text This, for example: [Looking at a](#something) is good* Must not produce this broken output: <a href="#something">Looking at a<em></a> is good</em> But instead this: <a href="#something">Looking at a</a> is <em>good</em> Achieve this by making a special pass to handle bold, italic and strikethrough on the anchor text and then "hiding" any remaining markup characters that might be confusingly matched up with characters outside the anchor text. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	06b8bccb12	Markdown.pl: ignore likely non-tag tags Certain start tags (a, area, img, map) do not make sense unless they have at least one attribute present. If a completely attribute barren start tag for one of these elements is found, treat it as invalid and escape the leading '<'. This is an heuristic that shouldn't cause too many problems while silently "correcting" incorrect input. Either way (leaving the bare start tag with no attributes or escaping it and potentially causing a fault as its end tag no longer has anything to match up with), it's broken. The question becomes then which breakage is more common in order to handle that one in preference to the other. With this change, the "it wasn't really a tag after all" situation will now be considered more common than the "it was deliberately an invalid start tag with a matching end tag" situation. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	e0abe51cba	Markdown.pl: sanitize and validate more With `--sanitize`, minimized empty tags that should not be for XHTML (e.g. "<p/>") have been automatically split into separate start and end tags (e.g. "<p></p>"). Do the same in reverse for separate start and end tags that should not be for XHTML (e.g. "<hr></hr>") and turn them into a single minimized tag (e.g. "<hr />"). Additionally, when `--validate-xml-internal` is active, automatically insert omitted optional end tags in (hopefully) the right places. For example, "<ul><li>foo<li>bar</ul>" will automatically become "<ul><li>foo</li><li>bar</li></ul>" thus making it valid XHTML. When there are multiple errors reported (can only happen when there are multiple opening tags missing their ending tags), report the errors in reverse order (i.e. the first one reported will be the largest line number) because that will often identify the source of the trouble as the first error line due to the nature of tag nesting. Make a few related wordsmithing changes at the same time. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	5b7e41b7ae	Markdown.pl: add missing tag info Add missing info for tfoot and thead. Add "summary" to attributes for table. Add "col" to list of empty element tags. Add some explanatory comments to hash tables. Add more text to `--sanitize` help. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	db542706ff	Markdown.pl: --validate-xml-internal by default Enhance the sanitation process slightly in order to perform simple tag missing/mismatch validation. Almost all the work needed was already being performed with the exception of keeping a tag stack. Keep an active tag stack when `--validate-xml-internal` is active and use it to find mismatched and/or missing open/closing tags. Enable this by default unless `--no-sanitize` has been given (the sanitize machinery does the validation and is required) or `--validate-xml` or `--no-validate-xml` has been given explicitly. Unlike the more comprehensive `--validate-xml`, this option operates very quickly and does not require any additional XML modules to be present. It's also compatible with `--html4tags`. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	dd3636e207	Markdown.pl: claw back a tiny performance gain Avoid using Pod::Usage unless it's actually needed. Avoid using XML::Simple or XML::Parser without --validate-xml. Also correct the sense of the MT tests while in there. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	cfa6b427dc	Markdown.pl: introduce --raw mode With `--raw`, no actual Markdown processing takes place, but the input will still be sanitized (by default) and may optionally also have --html4tags or --validate-xml used on it too. The output's line endings will be normalized and the encoding converted to UTF-8. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	951005b131	Markdown.pl: defang detab The old tab expansion code exhibits very poor performance when passed the contents of entire files as input. Mitigate this inefficiency by operating on one line at a time. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	cc09acd8e1	Markdown.pl: perform full sanitation and sterilize the trash When `--sanitize` is active (the default), tags have been "sanitized" as they were encountered. Unfortunately, not all tags get "encountered" by the sanitation section. Pre-existing "block" tags in the input are squirreled away to prevent unintentional formatting "accidents". Such tags were evading the sanitation engineer. Instead of "sterilizing" when the tags are encountered during normal formatting processing, perform full sanitation sterilization (provided `--sanitize` is active) on the final, fully-formatted output. By waiting until the end, all tags will be fully sterilized (even those produced by Markdown.pl itself), no tag shall escape. If `--validate-xml` has been requested (it's off by default), that will happen _after_ full sanitation. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	1b55444333	Markdown.pl: do not ignore input file errors Rework the code that iterates through all the files given on the command line and make sure any errors are reported with a fatal result. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	a17ab1a3d8	Markdown.pl: support table row merging Given a table like this: \| H1 \| H2 \| H3 \| \|----\|----\|----\| \| x1 \| x2 \| x3 \| \| y1 \|too long\|y3\| \| z1 \| z2 \| z3 \| The problematic second row can now be split across multiple lines like this: \| H1 \| H2 \| H3 \| \|----\|----\|----\| \| x1 \| x2 \| x3 \| \| y1 \|too \| y3 \|\ \| \|long\| \| \| z1 \| z2 \| z3 \| While the example is contrived, even with "sloppy" tables, having the ability to merge row data like this usually avoids the need for unsightly long lines when an exceptional cell overflows excessively. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	c0eb6e092d	Markdown.pl: use XML::Parser if XML::Simple is not present When `--validate-xml` has been given and XML::Simple does not appear to be present but XML::Parser does, use it instead. The output messages are slightly different on errors, but they're still clear and validation still happens. Even though XML::Parser is one of the possible backends for XML::Simple, either one can be present without the other. This change makes `--validate-xml` more widely available. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	f0865cbb53	Markdown.pl: correct tab expansion for shifted ``` blocks A "backticks-delimited code block" must be at the left margin, however, the entire block can be shifted 1, 2 or 3 characters to the right using spaces. These are all the same: ``` line ``` ``` line ``` ``` line ``` ``` line ``` When support for the slightly shifted code blocks was originally added, the effect on tab expansion was overlooked. The shifting in of the code block serves to make the raw markdown source perhaps a tiny bit more readable, but the shifting in is NOT a logical part of the lines in the code block themselves. Therefore, remove any "shift in" spaces before expanding tabs within the code block so that the result ends up looking exactly the same no matter whether the code block has been shifted in at all or not. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	52ae3f5aa7	Markdown.pl: offer to --validate-xml if requested Introduce a new `--validate-xml` (and corresponding `--no-validate-xml`) option that performs a simple XML validation of the output using `XML::Simple` and fails with an error if any problems are found. Because a new module (`XML::Simple`) is required to do this, it's NOT the default option (in which case `XML::Simple` need not be present). When `--validate-xml` is combined with `--sanitize` (which is the default) the output can be included in an XHTML page with confidence it will not invalidate the page's XML. Because the `--html4tags` option produces non-XML output, it is incompatible with `--validate-xml` and if both are given an immediate error will be reported. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	3ecc22500c	Markdown.pl: perform tag sanitation and take out the trash Unless the new `--no-sanitize` option is given, then inspect each raw tag encountered in the input and "sanitize" it before outputting it. The new `--sanitize` option is now the default. As before, any tags not on the "approved" list have their "<" escaped and become ordinary text in the output. With this commit, two new tags are now on the "approved" list: `<map>` and `<area>`. Each "approved" tag encountered has all of its attributes inspected and any that are not on the "approved" list for that tag are expunged. Any others that aren't in canonical form are corrected. Empty tags are normalized as well (e.g. "<p/>" becomes "<p></p>" but "<img ...>" becomes "<img ... />" etc.). With this change, Markdown.pl output should be reasonably "safe" for use as no code tags or attributes are permitted. While client-side image maps are now allowed, server-side are not (the "ismap" attribute will be silently expunged). Client-side image maps should always be preferred to server-side anyway, that's why they were created in the first place. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	afb45ff295	Markdown.pl: omit empty table header rows There's no point to including an empty header row in the output table. It looks ugly. While the header row is syntactically required in order to recognize the table markup, omit it from the output if all its column cells are empty ignoring any whitespace. This gives a more pleasing result. Additionally add an extra class tag "...-table-nohdr" in addition to the usual "...-table" tag to allow easy custom formatting of these headerless tables if desired. Empty body rows are always preserved because they can always be omitted without breaking the table syntax. Update the docs to describe the new behavior. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	a833262dd7	Markdown.pl: finesse backslash handling in tables Prevent loss of backslash-escaping in tables by avoiding an effective de-backslash operation that was incorrect. Now `\\\\` always ends up producing two backslashes whether it's in a table or not. Adjust the regexps so that the `\|` in `\\|` does not get mistaken for the final `\|` of a table row when that table row has omitted the final trailing `\|`. Also silently handle a lone `\` at the end of a table row by pretending it was doubled rather than breaking the table. While in that area of the code, remove a line that computed a value which was never used. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	990f74e52e	Markdown.pl: support [[wiki style links]] with --wiki The new `--wiki` option (with optional argument) specifies how to transform [[wiki style links]] into URLs. There are a veritable plethora of options available to affect the transformation. Absolute URL wiki style links continue to be recognized even without the `--wiki` option. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	19766850b9	Markdown.pl: tidy up bare-bones wiki code Move all the special cases to the top of the _ProcessWikiLink function. Add an exception to handle a fragment-only location. All unhandled wiki style links now fall out the bottom of the function. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	149f4d6308	Markdown.pl: eliminate egregious URL snafus The standard explicitly prohibits nesting of "a" tags. Prevent nesting from occurring when only "markdown" input is present. If explicit "<a ...>...</a>" tags are present in the input they will be (mostly) left alone even if they've been incorrectly nested. Avoid producing mojibake links in the case that the URL itself appears to contain another URL. (The wayback machine links often look like this.) In essence, once a link (either an "a" or "img") tag has been generated/processed, avoid processing it again in order to make sure no accidental mojibake occurs. One consequence of this change is that "Automatic Links" that are NOT surrounded by '<' and '>' will now only be recognized if they occur at the beginning of the input or after a whitespace (a newline qualifies) character. This also helps to eliminate unintended double linkification. Furthermore, URLs containing peculiar characters in them (e.g. single quote and/or double quote) should be far less troublesome now as well. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	68fcc9e237	Markdown.pl: remove control characters first Before doing any other processing, eliminate any control characters that might be present. There should not be any, but just in case. The only "control" characters kept are ASCII codes 9, 10, 12 and 13 (tab, nl, ff, cr). Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	4f58fa33bc	Markdown.pl: avoid use of &apos; Although &apos; is specified as part of the XML standard, some older end user clients may not actually recognize it. Use &#39; instead of &apos; to avoid any difficulty. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	505d70a6e7	Markdown.pl: allow documented single-quote ref titles The documentation claims that all of these work: [1]: url1.txt "title 1" [2]: url2.txt 'title 2' [3]: url3.txt (title 3) However, the single-quote version was not being accepted. Update the regular expression to grok the single-quote variant and require the correct matching closing quote in order to match. This corrects the invalid acceptance of mismatched quote titles such as "title) and (title". Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	4de8c983f1	Markdown.pl: avoid list nesting confusion With this source: * A * B C D * E * F G * H I The parser was getting confused about where each unordered list actually ended when nesting the processed inner list inside the outer list. Address this by: 1) Giving "```" style code blocks their own hash just in case to make sure they can never collide with html blocks. 2) Temporarily (the indentation gets removed before final output) indent nested list tags by the current list nesting level to ensure there's no confusion about where each list/sublist starts and ends. 3) Make sure the list closing tag is followed by two newlines rather than one to avoid potentially not applying markup to the immediately following line. Since a few patterns had minor adjustments for these changes, those patterns also had a few unnecessarily capturing groups changed to non-capturing groups in the hope that some minuscule performance gain can be squeezed out with the change. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
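
The tag-stack check described in `db542706ff` and `e0abe51cba` (keep a stack of open tags, flag missing or mismatched end tags, report the largest line number first) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the Perl code in Markdown.pl: `find_tag_errors`, `TAG_RE` and the `EMPTY_TAGS` set are hypothetical names, and the tag matching is deliberately simplified (no comments, no `>` inside attribute values, no automatic insertion of optional end tags).

```python
import re

# Hypothetical list of empty elements; the real script keeps its own tables.
EMPTY_TAGS = {"br", "hr", "img", "col", "area"}

# Simplified tag matcher: optional "/", a tag name, attributes, optional "/".
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>")


def find_tag_errors(html):
    """Return (line_number, message) pairs for missing or mismatched tags."""
    stack = []   # (tag_name, line_number) for every tag that is still open
    errors = []
    for m in TAG_RE.finditer(html):
        closing, name, minimized = m.group(1), m.group(2).lower(), m.group(3)
        line = html.count("\n", 0, m.start()) + 1
        if closing:
            if any(open_name == name for open_name, _ in stack):
                # Pop back to the matching start tag; every tag above it
                # on the stack never received its own end tag.
                while stack[-1][0] != name:
                    open_name, open_line = stack.pop()
                    errors.append((open_line, f"missing </{open_name}>"))
                stack.pop()
            else:
                errors.append((line, f"unexpected </{name}>"))
        elif not minimized and name not in EMPTY_TAGS:
            stack.append((name, line))
    # Anything still on the stack was never closed at all.
    for open_name, open_line in stack:
        errors.append((open_line, f"missing </{open_name}>"))
    # Largest line number first, the reporting order the commit describes.
    errors.sort(key=lambda e: e[0], reverse=True)
    return errors


if __name__ == "__main__":
    for line, message in find_tag_errors("<div>\n<p>\n<em>x\n</div>"):
        print(f"line {line}: {message}")
```

Reporting the innermost unclosed tag first tends to point at the line where the nesting actually went wrong, which is the rationale given in the commit message.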

135 Commits (417428129368309d97aa6d3f48553bde787072d0)