Commit Graph

Author	SHA1	Message	Date
Kyle J. McKay	db542706ff	Markdown.pl: --validate-xml-internal by default Enhance the sanitation process slightly in order to perform simple tag missing/mismatch validation. Almost all the work needed was already being performed with the exception of keeping a tag stack. Keep an active tag stack when `--validate-xml-internal` is active and use it to find mismatched and/or missing open/closing tags. Enable this by default unless `--no-sanitize` has been given (the sanitize machinery does the validation and is required) or `--validate-xml` or `--no-validate-xml` has been given explicitly. Unlike the more comprehensive `--validate-xml`, this option operates very quickly and does not require any additional XML modules to be present. It's also compatible with `--html4tags`. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	dd3636e207	Markdown.pl: claw back a tiny performance gain Avoid using Pod::Usage unless it's actually needed. Avoid using XML::Simple or XML::Parser without --validate-xml. Also correct the sense of the MT tests while in there. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	cfa6b427dc	Markdown.pl: introduce --raw mode With `--raw`, no actual Markdown processing takes place, but the input will still be sanitized (by default) and may optionally also have --html4tags or --validate-xml used on it too. The output's line endings will be normalized and the encoding converted to UTF-8. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	951005b131	Markdown.pl: defang detab The old tab expansion code exhibits very poor performance when passed the contents of entire files as input. Mitigate this inefficiency by operating on one line at a time. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	cc09acd8e1	Markdown.pl: perform full sanitation and sterilize the trash When `--sanitize` is active (the default), tags have been "sanitized" as they were encountered. Unfortunately, not all tags get "encountered" by the sanitation section. Pre-existing "block" tags in the input are squirreled away to prevent unintentional formatting "accidents". Such tags were evading the sanitation engineer. Instead of "sterilizing" when the tags are encountered during normal formatting processing, perform full sanitation sterilization (provided `--sanitize` is active) on the final, fully-formatted output. By waiting until the end, all tags will be fully sterilized (even those produced by Markdown.pl itself); no tag shall escape. If `--validate-xml` has been requested (it's off by default), that will happen _after_ full sanitation. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	1b55444333	Markdown.pl: do not ignore input file errors Rework the code that iterates through all the files given on the command line and make sure any errors are reported with a fatal result. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	a17ab1a3d8	Markdown.pl: support table row merging Given a table like this: \| H1 \| H2 \| H3 \| \|----\|----\|----\| \| x1 \| x2 \| x3 \| \| y1 \|too long\|y3\| \| z1 \| z2 \| z3 \| The problematic second row can now be split across multiple lines like this: \| H1 \| H2 \| H3 \| \|----\|----\|----\| \| x1 \| x2 \| x3 \| \| y1 \|too \| y3 \|\ \| \|long\| \| \| z1 \| z2 \| z3 \| While the example is contrived, even with "sloppy" tables, having the ability to merge row data like this usually avoids the need for unsightly long lines when an exceptional cell overflows excessively. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	c0eb6e092d	Markdown.pl: use XML::Parser if XML::Simple is not present When `--validate-xml` has been given and XML::Simple does not appear to be present but XML::Parser does, use it instead. The output messages are slightly different on errors, but they're still clear and validation still happens. Even though XML::Parser is one of the possible backends for XML::Simple, either one can be present without the other. This change makes `--validate-xml` more widely available. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	f0865cbb53	Markdown.pl: correct tab expansion for shifted ``` blocks A "backticks-delimited code block" must be at the left margin, however, the entire block can be shifted 1, 2 or 3 characters to the right using spaces. These are all the same: ``` line ``` ``` line ``` ``` line ``` ``` line ``` When support for the slightly shifted code blocks was originally added, the effect on tab expansion was overlooked. The shifting in of the code block serves to make the raw markdown source perhaps a tiny bit more readable, but the shifting in is NOT a logical part of the lines in the code block themselves. Therefore, remove any "shift in" spaces before expanding tabs within the code block so that the result ends up looking exactly the same no matter whether the code block has been shifted in at all or not. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	52ae3f5aa7	Markdown.pl: offer to --validate-xml if requested Introduce a new `--validate-xml` (and corresponding `--no-validate-xml`) option that performs a simple XML validation of the output using `XML::Simple` and fails with an error if any problems are found. Because a new module (`XML::Simple`) is required to do this, it's NOT the default option (in which case `XML::Simple` need not be present). When `--validate-xml` is combined with `--sanitize` (which is the default) the output can be included in an XHTML page with confidence it will not invalidate the page's XML. Because the `--html4tags` option produces non-XML output, it is incompatible with `--validate-xml` and if both are given an immediate error will be reported. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	3ecc22500c	Markdown.pl: perform tag sanitation and take out the trash Unless the new `--no-sanitize` option is given, then inspect each raw tag encountered in the input and "sanitize" it before outputting it. The new `--sanitize` option is now the default. As before, any tags not on the "approved" list have their "<" escaped and become ordinary text in the output. With this commit, two new tags are now on the "approved" list: `<map>` and `<area>`. Each "approved" tag encountered has all of its attributes inspected and any that are not on the "approved" list for that tag are expunged. Any others that aren't in canonical form are corrected. Empty tags are normalized as well (e.g. "<p/>" becomes "<p></p>" but "<img ...>" becomes "<img ... />" etc.). With this change, Markdown.pl output should be reasonably "safe" for use as no code tags or attributes are permitted. While client-side image maps are now allowed, server-side are not (the "ismap" attribute will be silently expunged). Client-side image maps should always be preferred to server-side anyway, that's why they were created in the first place. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	afb45ff295	Markdown.pl: omit empty table header rows There's no point to including an empty header row in the output table. It looks ugly. While the header row is syntactically required in order to recognize the table markup, omit it from the output if all its column cells are empty ignoring any whitespace. This gives a more pleasing result. Additionally add an extra class tag "...-table-nohdr" in addition to the usual "...-table" tag to allow easy custom formatting of these headerless tables if desired. Empty body rows are always preserved because they can always be omitted without breaking the table syntax. Update the docs to describe the new behavior. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	a833262dd7	Markdown.pl: finesse backslash handling in tables Prevent loss of backslash-escaping in tables by avoiding an effective de-backslash operation that was incorrect. Now `\\\\` always ends up producing two backslashes whether it's in a table or not. Adjust the regexps so that the `\|` in `\\|` does not get mistaken for the final `\|` of a table row when that table row has omitted the final trailing `\|`. Also silently handle a lone `\` at the end of a table row by pretending it was doubled rather than breaking the table. While in that area of the code, remove a line that computed a value which was never used. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	990f74e52e	Markdown.pl: support [[wiki style links]] with --wiki The new `--wiki` option (with optional argument) specifies how to transform [[wiki style links]] into URLs. There are a veritable plethora of options available to affect the transformation. Absolute URL wiki style links continue to be recognized even without the `--wiki` option. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	19766850b9	Markdown.pl: tidy up bare-bones wiki code Move all the special cases to the top of the _ProcessWikiLink function. Add an exception to handle a fragment-only location. All unhandled wiki style links now fall out the bottom of the function. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	149f4d6308	Markdown.pl: eliminate egregious URL snafus The standard explicitly prohibits nesting of "a" tags. Prevent nesting from occurring when only "markdown" input is present. If explicit "<a ...>...</a>" tags are present in the input they will be (mostly) left alone even if they've been incorrectly nested. Avoid producing mojibake links in the case that the URL itself appears to contain another URL. (The wayback machine links often look like this.) In essence, once a link (either an "a" or "img") tag has been generated/processed, avoid processing it again in order to make sure no accidental mojibake occurs. One consequence of this change is that "Automatic Links" that are NOT surrounded by '<' and '>' will now only be recognized if they occur at the beginning of the input or after a whitespace (a newline qualifies) character. This also helps to eliminate unintended double linkification. Furthermore, URLs containing peculiar characters in them (e.g. single quote and/or double quote) should be far less troublesome now as well. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	68fcc9e237	Markdown.pl: remove control characters first Before doing any other processing, eliminate any control characters that might be present. There should not be any, but just in case. The only "control" characters kept are ASCII codes 9, 10, 12 and 13 (tab, nl, ff, cr). Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	4f58fa33bc	Markdown.pl: avoid use of `&apos;` Although `&apos;` is specified as part of the XML standard, some older end user clients may not actually recognize it. Use `&#39;` instead of `&apos;` to avoid any difficulty. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	505d70a6e7	Markdown.pl: allow documented single-quote ref titles The documentation claims that all of these work: [1]: url1.txt "title 1" [2]: url2.txt 'title 2' [3]: url3.txt (title 3) However, the single-quote version was not being accepted. Update the regular expression to grok the single-quote variant and require the correct matching closing quote in order to match. This corrects the invalid acceptance of mismatched quote titles such as "title) and (title". Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	4de8c983f1	Markdown.pl: avoid list nesting confusion With this source: * A * B C D * E * F G * H I The parser was getting confused about where each unordered list actually ended when nesting the processed inner list inside the outer list. Address this by: 1) Giving "```" style code blocks their own hash just in case to make sure they can never collide with html blocks. 2) Temporarily (the indentation gets removed before final output) indent nested list tags by the current list nesting level to ensure there's no confusion about where each list/sublist starts and ends. 3) Make sure the list closing tag is followed by two newlines rather than one to avoid potentially not applying markup to the immediately following line. Since a few patterns had minor adjustments for these changes, those patterns also had a few unnecessarily capturing groups changed to non-capturing groups in the hope that some minuscule performance gain can be squeezed out with the change. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	7a1371277e	Markdown.pl: allow empty first blockquote line Previously this: > This should be a block quote now Would not get recognized as a blockquote, now it will. Previously the lone ">" got left on the line all by itself as though it had been escaped. Clearly that was improper. Alternatively, it could have been picked up as its very own empty blockquote, but that seems like the less desirable resolution for the issue. A ">" at the beginning of a line always signals the beginning of a blockquote (unless it's escaped) and that blockquote then continues on until it encounters a blank line. Therefore the new interpretation must be correct, the old interpretation was clearly wrong and the "empty blockquote of its own" interpretation is also clearly wrong since it's not immediately followed by a blank line. But this: > This will not be a block quote With the blank line inserted, the above ">" does really now end up correctly in an "empty blockquote of its own". Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	258c5fa653	Markdown: enhance image URL support When determining whether or not to add the "--imageroot" or "--htmlroot" prefix to a relative link, ignore any query string that may be present. The fragment (if present) was already being ignored. Allow URLs given in reference lines to be wrapped like so: [1]: data:image/gif;base64,R0lGODlhFwAXAPMAMf///+7u7t3d3czMzLu7u6qqqp\ mZmYiIiHd3d2ZmZlVVVURERDMzMyIiIhEREQAAACwAAAAAFwAXAAAExxDISau9Mg\ She8DURhhHWRLDB26FkSjKqxxFqlbBWOwF4fOGgsCycRkInI+ocEAQNBNWq0caCJ\ i9aSqqGwwIL4MAsRATeMMMEykYHBLIt7DNHETrAPrBihVwDAh2ansBXygaAj5sa1\ x7iTUAKomEBU53B0hGVoVMTleEg0hkCD0DJAhwAlVcQT6nLwgHR1liUQNaqgkMDT\ NWXWkSbS6lZ0eKTUIWuTSbGzlNlkS3LSYksjtPK6YJCzEwNMAgbT9nKBwg6Onq6B\ EAOw== "title (100x100)" This facilitates embedding small amounts of data directly in the source without causing too much difficulty. Update the documentation to describe this new feature. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	c86fea4089	Markdown: enhance link handling Allow links of the form [...](#...) to find themselves on the page the same way links of the form [...] can. Be flexible accepting either '-' or '_' in place of spaces in the heading name since fragment names may not contain spaces. Refactor the code that manufactures "img" and "a" tags to both simplify the code and make sure that all href, src, alt and title attributes are fully and properly "escaped". In addition, if the "title" for an image ends with something that looks like "(512x342)", "(?x342)" or "(512x?)" then strip that out of the title and set the appropriate width and height attributes on the manufactured "img" tag. For example something like this: ![Nice pic](pic.jpg "Nice (500x300)") or this: ![Nice pic][1] [1]: <pic.jpg> "Nice (500x300)" now produces this: <img src="pic.jpg" alt="Nice pic" width="500" height="300" title="Nice" /> Update the syntax doc to mention these additions. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	99e578ef88	Markdown.pl: next version is 1.1.8 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	5 years ago
Kyle J. McKay	bcfa57ebb0	Markdown version 1.1.7 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	ac94bf3bbf	Markdown.pl: _PrefixURL more intelligently Instead of turning an empty URL into an href="" attribute that effectively does nothing, change it into an href="#" attribute that creates a link to the current page. When adding a relative/image prefix leave fragment-only links unmolested. They are meant to link somewhere on the current page and must not be changed. When inspecting the destination to determine whether to use the -i prefix instead of the -r prefix when both are given, ignore any trailing fragment. Fragments don't really make sense on image links and should never actually be sent to the server anyway by a behaving client, but match them properly in any case. Also make sure that URLs only get a prefix added at most once. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	d943c07ae6	Markdown.pl: next version is 1.1.7 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	e5cffb587c	Markdown version 1.1.6 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	f2701d5638	Markdown.pl: apply -i and -r options to a and img tags When a and img tags are generated using the normal Markdown syntax any prefixes specified with the -i and -r options are inserted as appropriate. Extend this processing to explicit a and img tags as well. This makes sense because they should be handled the same way the Markdown syntax generated tags are for consistency. It's still possible to "escape" from the prefixes by using an explicit scheme+host+port or the commonly supported (but not a standard) //+host+port mechanism. And it only matters if prefixes have been set with the -i and/or -r options (the default is no prefixes) anyway. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	66d8157a89	Markdown.pl: correct .svg extension matching rule The .svg/.svgz matching rule was matching .svz and .svgz by mistake. Move the wayward '?' to the end so it matches .svg and .svgz as originally intended. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	d87f4abdb1	Markdown.pl: improve XML comment parsing Use the actual XML comment rule for parsing XML comments. The leading delimiter is fixed as "<!--" and the trailing delimiter is fixed as "-->". In between the leading and trailing delimiters any characters other than a "-" may be used and a "-" may be used provided it's followed immediately with a non-"-" character. Now that the clear beginning and end of comments can be properly identified, there no longer needs to be a blank line following the comment -- the end delimiter serves quite unambiguously. Relax the ending match to just be end of line or end of document. This makes comments parse much more like they're expected to. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	434dfdb6bd	Markdown.pl: be more flexible parsing backticks-delimited code blocks Allow leading spaces before the backticks delimiters on the starting and ending lines (up to one less than the indent width). Then remove up to that number of leading spaces (based on the starting backticks delimiter line) from each of the lines in the code block itself. This better matches how lax some other formatters are with backticks-delimited code blocks parsing. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	f500ed7b64	Markdown.pl: next version is 1.1.6 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	d6e40e8047	LICENSE: NO WARRANTY must be CONSPICUOUS Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	f5adef77ca	Markdown version 1.1.5 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	aeff074060	Markdown.pl: make sure all alt= and title= text is escaped Markup is not allowed inside attributes. Make sure that everything that ends up in alt="..." and title="..." has been properly escaped to prevent it from acquiring markup during later processing phases. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	2b798a8841	Markdown.pl: support tables Add support for basic tables. Nested tables are not supported although tables themselves can appear within lists and blockquotes and do work properly there. The commonly used table syntax is recognized including the left/right/center alignment indicators. Inline markup within each column also works just fine. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	95c520b3d1	Markdown.pl: next version is 1.1.5 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	2a1ac6ae19	README: quote tag in comment to avoid misinterpretation Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	7 years ago
Kyle J. McKay	975a2c951c	Markdown version 1.1.4 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
Kyle J. McKay	ff486f30f1	Markdown.pl: disallow <dir> and <menu> without --deprecated When dealing with program arguments "<dir>" is highly problematic. Both "<dir>" and "<menu>" have long been deprecated and there are other tags readily available for making similar lists that are not deprecated (and do not require use of style sheets either). Therefore treat "<dir>" and "<menu>" as literal text unless the new "--deprecated" option is used. Other "deprecated" tags continue to be recognized and passed through as they generally do not have non-deprecated equivalents that do not also require use of style attributes or style sheets in some fashion. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
Kyle J. McKay	cb994b5494	Markdown.pl: next version is 1.1.4 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
Kyle J. McKay	23dc03ee6f	Markdown.pl: keep valid tag list alphabetized While trying to keep all the various table-related tags together is admirable, it makes it hard to be sure the tag is in the list or not (and also looks bad compared to the other tags). Therefore put the table-related tags into alphabetical order just like the rest of them. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
Kyle J. McKay	89cf67e880	Markdown version 1.1.3 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
Kyle J. McKay	97718e33be	Markdown.pl: auto escape '<' of non-tags Automatically encode the leading '<' of non-html tag names so they do not confuse the HTML parser or produce invalid HTML output. This requires embedding a list of known HTML tags (a list of over 50 is now included). This will also cause some "unsafe" tags that were previously being passed through to be escaped (such as "script", "style", "object", "embed" etc.). Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
Kyle J. McKay	f07bdd3bc0	Markdown.pl: escape '<' of impossible tags Automatically escape a '<' that introduces an impossible HTML tag. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
Kyle J. McKay	b5e5efa063	Markdown.pl: do not overlook sibling list items When support for additional list markers was added in `51f3d63833` (Markdown.pl: support more list markers, 2017-01-10, 1.1.0), a bug was inadvertently introduced that could cause adjacent sibling list items to only recognize the first as a list item and as a side-effect prevent markup from being recognized in the second. The problem occurred when the matching pattern was split to run in progressive matching mode and resulted in the sibling list items match not always being matched by the progressive list item pattern (extra possible \n's were preventing a match). Fix this by adding a '+' in the correct location in the progressive pattern. The side-effect was caused because any "leftover" (of which there shouldn't be any) was not being processed for markup. As a precaution, run any leftover through the block gamut markup processor just in case even though there should never be any leftover. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
Kyle J. McKay	51d531a90a	Markdown version 1.1.2 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
Kyle J. McKay	d0326d6abd	Markdown.pl: usually (i), (v) and (x) are roman Only treat (i), (v) and (x) as alpha if the previous list marker was lower alpha (or upper alpha in the case of (I), (V) and (X)). Previously they were treated as alpha if the first marker in the list was alpha, but if list marker types were changed mid-list that could lead to unexpected behavior. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
Kyle J. McKay	dfbf2b4e30	Markdown.pl: retain square brackets around footnotes If the document contains footnote style links (e.g. [1], [2], [3] ...) they look much better formatted so as to retain the square brackets in the link text. Do this for any footnote style link text consisting of one to three digits. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	8 years ago
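
The image-title size suffix described in commit c86fea4089 ("(512x342)", "(?x342)" or "(512x?)" at the end of a title) could be handled roughly as follows. This is a minimal Python sketch, not the Perl code from Markdown.pl; the helper name `split_title_size`, the regular expression, and the treatment of a "(?x?)" suffix are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (not taken from Markdown.pl): a trailing "(WxH)"
# where either W or H may be "?" to leave that dimension unset.
_SIZE_SUFFIX = re.compile(r"\s*\((\d+|\?)x(\d+|\?)\)\s*$")


def split_title_size(title):
    """Return (title, width, height); a dimension is None when not given."""
    match = _SIZE_SUFFIX.search(title)
    if not match:
        return title, None, None
    width, height = match.groups()
    if width == "?" and height == "?":
        # Assumption: a suffix naming neither dimension stays in the title.
        return title, None, None
    stripped = title[:match.start()]
    return (
        stripped,
        None if width == "?" else int(width),
        None if height == "?" else int(height),
    )
```

With this sketch, `split_title_size("Nice (500x300)")` returns `("Nice", 500, 300)`, which corresponds to the `width="500" height="300" title="Nice"` attributes in the commit's example.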

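The XML comment rule spelled out in commit d87f4abdb1 (a fixed "<!--" opener, a fixed "-->" closer, and in between either a non-"-" character or a "-" immediately followed by a non-"-") translates fairly directly into a regular expression. The Python sketch below only illustrates that rule; the names `XML_COMMENT` and `is_xml_comment` are hypothetical and the actual pattern in Markdown.pl may differ.

```python
import re

# "<!--", then any number of: a non-"-" character, or a "-" that is
# immediately followed by a non-"-" character; then the closing "-->".
XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_xml_comment(text):
    """True when the whole string is a single well-formed XML comment."""
    return XML_COMMENT.fullmatch(text) is not None
```

Under this rule a single dash inside the comment body is accepted, while a double dash in the body (or a body ending in a dash, as in `--->`) is rejected.
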
205 Commits (658edb6abf8a45ab405741306dbc3b0c70df784c)