markdown

Commit Graph

Author	SHA1	Message	Date
Kyle J. McKay	c154f45386	Markdown: allow backticks-delimited code blocks in lists Using backticks-delimited code blocks in lists has apparently become rather widely used even though the original specification didn't seem to allow them within lists. Make the needed changes to allow this. The changes are actually rather minor. And do update the syntax document to reflect this change. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	f2f8a1e2fe	Markdown: support `\` EOL to generate a `<br />` Improve compatibility with some other markdown renderers and translate a backslash (`\`) at the very end of a line (a line that is not inside a table or code block) into a `<br />` in addition to the two-or-more-spaces at the end of line translation that already takes place and does the same thing. As expected, the backslash can be escaped by doubling it to preserve it (or by enclosing it in backquotes `\` to make it a code span). Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	d9f6613164	syntax.md: mention headers supersede horizontal rules Add a note to the syntax document mentioning that when using a line of solid hyphens (`-`s) for a horizontal rule, it will actually be treated as an H2 Setext-style header if the preceding line is not blank. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	94c8e6e7dd	Markdown.pl: wordsmith a help comment Clean up a little bit of awkward English in the help. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	9c7c5a0c11	Markdown.pl: do not mistake table for code block When making a nice looking table such as this: Term \| Detail -------------- \| -------------- First term \| number one Second term \| number two There is a potential to misinterpret the header line as the beginning of a code block (the indented type) since it begins with 5 spaces. Of course this could be addressed either by moving the "Term" string to the left at least 2 spaces or by adding the optional leading "\|" to the beginning of the column, but that's unnecessarily ugly. Instead, when parsing a code block, check to see if the code block consists of exactly one line and when combined with the next line represents a valid table start. A valid table start specifies a header row and a separator row with exactly the same (positive integer) number of columns. If a valid table start is found, avoid making it into a code block and instead allow the table code to grab it and make it into a table. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	aa05222a09	Markdown.pl: correct minor quibble in DoCodeBlocks regex Correct a longstanding issue with the regex used when matching code blocks. Specifically the 4-spaces-indented kind of code block. The code block ends at either the end of the document or when a non-indented line is encountered. The pattern looking for the non-indented line actually allowed a match with up to the full 4-space indentation. It hasn't been a problem because the greedy matcher before that part of the pattern grabs any lines with 4 or more spaces of indentation. However, leaving the pattern as-is leaves it more ambiguous than necessary and leaves open more backtracking possibilities (although in this case the greedy matcher should prevent them being used). Correct the pattern to reflect the actual syntax and make that part of the pattern non-capturing to make the compiled pattern just that little bit more efficient. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	97d6ad49e2	README: backquote some text in changes list for 1.1.12 The README file is meant to be valid Markdown. The changes listed for version 1.1.12 included: do not choke on <br></br> etc. Of course, that caused a "<br />" tag to end up in the rendered output. Quote the tags with backquotes like so: do not choke on `<br></br>` etc. This makes it render as intended in the xhtml output. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	7fce1577a8	Markdown.pl: next version is 1.1.13 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	5ebdc50649	Markdown version 1.1.12 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	53b4a58143	Markdown.pl: add missing space to implied attributes When sanitizing an attribute with an implied value such as "compact" or "checked", add the required space at the end to avoid mashing up against any other attribute that might be present. For example, <ol compact start=10> now becomes the correct: <ol compact="compact" start="10"> rather than the previously incorrect: <ol compact="compact"start="10"> Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	39e875e4f5	Markdown.pl: allow target="_blank" rel="nofollow" While other targets could, potentially, represent legitimate issues for concern, opening a new window generally does not since that's typically a readily available option in the user agent anyway when choosing to follow any individual link. While using target="_blank" does not really represent any security issue, it may be an annoyance issue, but that's something for the author to address, not the sanitizer. Although rel="nofollow" is _not_ part of the HTML 4 standard, it may be very useful to avoid "endorsing" sites that are being linked to. Since it does not introduce any risk of scripting issues or other hidden issues, go ahead and allow it too. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	6efa98325a	Markdown.pl: add limited tilde-delimited code block support To avoid conflicting (too much) with setext-style H3 headers that are delimited with a line of tildes, require exactly three tildes to introduce a tilde-delimited code block. And, while in there, clean up the backticks-delimited code blocks pattern a tiny amount and allow either kind of code block to be closed by more than the number of opening delimiters in addition to exactly the same number of opening delimiters. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	35c983c9f0	Markdown.pl: do not choke on <br></br> etc. Adjust code to properly handle "empty" tags that are written as an open plus closing tag but do not contain any whitespace in the opening tag. The code already properly handles turning <hr noshade></hr> into just <hr noshade="noshade" />, but it was failing to handle that when the opening tag did not contain any whitespace such as <br></br>. Adjust the code to return the proper value for the opening tag under such a condition so that it's handled properly. Previously a sequence such as <br></br> would fail as it would end up being turned into <br /></br> which then fails XML validation. Now it works properly and turns <br></br> into <br /> as it should have been doing all along. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	658edb6abf	Markdown.pl: clean up closing tag whitespace While closing tags are matched okay if they contain whitespace, that whitespace was not being cleaned up in a comparable way to the manner in which whitespace in an opening tag is being handled. Make whitespace in closing tags be handled the same way as whitespace in opening tags. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	c578bbfcfa	Markdown.pl: add more comment stripping options With --strip-comments-lax even strictly invalid XML comments will be stripped. With --strip-comments-lax-only only strictly invalid XML comments will be stripped. Allowing strictly invalid XML comments to pass through to the output would produce invalid XML. By default such invalid comments end up having their leading '<' escaped so that they become plain text in the output thereby avoiding making it invalid XML. However, if comments are being stripped out, there's no reason the standard cannot be relaxed a little bit since the output will remain valid XML as the comments will not be passed through to the output in that case. The two new options, --strip-comments-lax and --strip-comments-lax-only provide a choice of behavior, strip all comments including the strictly invalid ones, or just strip the strictly invalid ones. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	50422d1e28	Markdown.pl: better sanitization of href and src attributes Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	19c0131f03	Markdown.pl: do not choke on \n inside attribute values A tag such as this: <span style=" lots: of; stuff: in; here: now; "></span> Is perfectly valid. Add the missing "s" pattern match qualifier to make sure such attribute values do not end up getting mangled. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	86606c5a52	Markdown.pl: allow some overlooked table attributes For %cellhalign allow the overlooked 'char' and 'charoff' attributes. For table allow the overlooked 'frame' and 'rules' attributes. For table, tr, th and td allow the 'bgcolor' attribute. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	fde2382058	Markdown.pl: next version is 1.1.12 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	5ffe21ab63	Markdown version 1.1.11 Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	6a118b8c53	Markdown.pl: format --help output properly Make the full help output use the correct formatting if available so it looks as nice as possible. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	62382f4e1b	Markdown.pl: add a help comment about literal html tags Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	cc044905ae	Markdown.pl: include yaml table lines for error msg lineno The line number mentioned in any error message gets generated by counting from the beginning of the non-yaml output. Of course, the final output will include any yaml table if generated. Adjust the line number in any error messages by the number of lines of preceding yaml table that will be included in the output. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	d05afd2cfb	Markdown.pl: convert named character entities by default Unless the new, heavily discouraged, `--keep-named-character-entities` option has been given, always convert known named character entities to their equivalent numerical entity. All strict XML validators will complain about anything other than the required-by-XML five entities (&amp; &lt; &gt; &quot; &apos;) unless an entity dictionary has been provided. In addition, some older XHTML clients do not grok the &apos; entity. Now only the universally supported four entities (&amp; &lt; &gt; &quot;) will be preserved by default. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	4e9eba45fa	Markdown.pl: add a --div option and corresponding API It can be very convenient to be able to wrap the contents in its own output "<div>". Add an option to do that with an underlying corresponding API option to match. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	53494a4bdc	Markdown API: eliminate problematic xmlcheck == 1 There was absolutely no benefit to passing in an xmlcheck value of 1 to the Markdown/ProcessRaw API. It was ignored and did NOT result in any checking. Change this so that any value other than a numeric 0 results in XML checking when calling the API. This makes the most sense and avoids creating obscure API bugs. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	7ce25f1ec9	Markdown.pl: format -h output properly Make the usage output use the correct termcap codes to look nice if they're available. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	38a41b7a8c	Markdown.pl: process "br" indicators at end of paragraph Normally there's no point to a "<br />" tag at the end of a paragraph as the end of the paragraph will force a break anyway. Unless that "br" tag contains a "clear='...'" attribute. Make sure that 3 or more spaces at the end of a paragraph actually turns into a "<br clear='all' />" tag but at the same time make 2 spaces at the end of a paragraph just go away as it serves no purpose. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	a9781245b3	Markdown.pl: update help description Add missing conjunction. Update example of document that fails with --raw-html but not --raw-xml. With the recent changes, the old example no longer fails. Use a different example that still fails. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	94e07af1e7	Markdown.pl: strip markup out of implicit anchors Each H1, H2, ... H6 generated courtesy of markdown markup has an implicit anchor assigned based on the content of the element. For example: # This is an _H1_ header Strip any inline markup (in this case the '_'s) out before creating the implicit anchor. With this change, the text used to generate the anchor for the above is just "This is an H1 header". There are a couple of additional places where text that might have inline markup gets turned into an identifier (implicit reference links such as [thing][] or [thing] and wiki links without an explicit link destination such as [[thing]]). Perform the same tag stripping for them too before trying to find the destination. Many links that should have connected previously now do. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	857a411dc5	Markdown.pl: allow "stuff" on end of ``` line Some @#%^@! are doing something like this: ```shell script blah blah blah ``` That was not previously matching because only one optional "word" was allowed trailing the opening "```" characters. The single optional "word" is supposed to be a file extension type. Clearly ".shell script" is _not_ a file extension! Relax the rule somewhat. Multiple "words" are now allowed but only the first will ever participate in choosing the syntax highlighting (which currently never happens anyway). Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	89cae62dd1	Markdown.pl: do not sequester top-level unmatched p When running _HashHTMLBlocks, there's a step where we "match any empty block tags that should have been paired." Exclude "p" from that list. Given a document like this: <p> text That isolated "p" was getting sequestered away into its own blob resulting in an output document like this: <p> </p><p>text</p> By removing "p" from the list of "empty block tags that should have been paired," we get this output instead: <p> text</p> A nice improvement. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	f9a023d56a	Markdown.pl: remove td, th, tr from thead and tfoot closers Although "thead" and "tfoot" do, indeed, have an optional closing tag, neither "td", "th" nor "tr" will auto-close them. Therefore remove "thead" and "tfoot" from the list of tags that "td", "th" and "tr" will auto-close. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	c06b59644b	Markdown.pl: add bdo to taga1p The "bdo" (Bi-Directional Override) container element always requires at least one attribute to be present for it to be valid. Specifically, in this case, the "dir" attribute. Add "bdo" to the `%taga1p` (TAGs requiring Attributes count of 1 Plus) hash to reflect this. A bare "<bdo>" will now be passed through to the output as "&lt;bdo>" when using the default options. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	845104c13a	Markdown.pl: improve handling of auto-closed p tags Given an input document like this: <div> <p> <pre>hi</pre> </p> </div> It will validate just fine in `--raw-xml` mode. However, in normal "html/xhtml" mode, the "pre" opening tag automatically closes the currently open "p" tag leading to this: <div> <p> </p><pre>hi</pre> </p> </div> Without further intervention, the closing "p" tag that was already there (just before the closing "div" tag), now has no matching open "p" tag to close anymore -- the corresponding open tag is now the open "div" section. Obviously the document fails to validate at this point. The naive fix simply has the closing tag that corresponds to the opening tag that caused the "p" to be auto-closed to then automatically re-open a "p" at that point producing this: <div> <p> </p><pre>hi</pre><p> </p> </div> While such a solution does work, it frequently ends up introducing extra unwanted "p" sections. Instead of reopening the "p" immediately upon seeing the closing tag that matches the opening tag that auto-closed the "p", simply set a "reopen p" flag. When the "reopen p" flag is set and suitable conditions are met, then go ahead and "reopen" a new "p" tag. The exact conditions are a bit of an heuristic at the moment but amount to clearing the "reopen p" flag when the next start tag is seen and inserting a new "p" at that time only if the open tag is a text level element opening tag. Alternatively, if the "reopen p" flag is currently set and some non-whitespace text shows up before seeing another open tag, re-open a new "p" at that point (and clear the "reopen p" flag). Finally, if the flag is currently set and a closing "p" tag appears, just discard it and clear the "reopen p" flag. Essentially this case has the effect of just moving the closing "p" tag. With these changes, the troublesome document now produces this: <div> <p> </p><pre>hi</pre> </div> An improvement on what came before. 
Some might argue that the empty "p" section ought to simply be omitted entirely. Perhaps. But there was an explicit open "p" tag in the text -- auto closing it is one thing -- removing an explicit open tag entirely is something else. Additionally, since the validator validates in a "streamy" way, that's much more difficult to accomplish since at the time the initial opening "p" has been seen there's not yet any information available about the fact it's about to be auto-closed while still not containing any text and it therefore gets emitted to the output. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	e004a5275c	Markdown.pl: do not leave remnant state lying around When commit `c86fea4089` ("Markdown: enhance link handling", 2019-10-20, markdown_1.1.8) did its thing, a new global (%g_anchors_id) was introduced to keep track of all the link ids being used/generated in order to better connect them up to the links meant to target them. Unfortunately, that hash was not getting cleared before processing each new document. While this is mostly not a problem when running from the command line since typically only one document ever gets processed at once, if more than one document is processed at a time, prior documents could affect the link fragment targets for subsequent documents. Correct the problem by properly resetting the global (along with all the others that are also reset) before processing a new document. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	32862223ad	Markdown.pl: make the default yaml API mode match CLI The default YAML mode from the command line shows unknown YAML options in a table prefixed to the output and applies the ones it recognizes. Make the API have the same default mode rather than silently discarding unknown YAML options and ignoring known ones. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	1ecc6a0fe5	Markdown.pl: isolate the archaic tab default If running as a plug in for either of the two original systems that this was designed to "plug in" to, continue to use the archaic, non-standard default for expansion width of physical tabs. This setting does not affect the "indent level" width. Otherwise, force the physical tab width expansion to default to the expected and standard value. This has been the behavior for some time already, except that when "use"ing Markdown.pm and calling the API directly this was being bypassed in favor of the old, archaic default. With this change, the old, archaic default becomes isolated to those two originally supported systems. The setting can still, of course, be changed by using an option to whatever is desired. The default though will now be more sane for more clients. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	751b55b7c6	Markdown.pl: use some sanity Replace 'require' with 'use' in a few places where it should have been "used" in the first place. Make sure the essential package variables are initialized inside a BEGIN block. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	ff7fb525fc	Markdown.pl: avoid accidentally nested anchors Given something that looks like this: [1][] [1]: https://example.com/ Ever since commit `dfbf2b4e30` ("Markdown.pl: retain square brackets around footnotes", 2017-01-19, markdown_1.1.2), the link text has been rendered to include the surrounding '[' and ']' because it just looks better that way and produces a bigger link target. Unfortunately that can result in the linked text being processed again and producing a nested anchor which is not only invalid according to the XHTML specification but is also the wrong rendering for the input. Deal with this by hiding the '[' and ']' characters inside link text the same way other characters within the link text are already being hidden. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	6324b499f7	Markdown.pl: provide anchor API access The actual anchor id values produced while processing a page are not necessarily immediately obvious. These implicit anchor id values are created for all markdown- format H1-H6 headers by "processing" the text of the header. Provide a new external function, ResolveFragment that can hook up a fragment identifier to one of these automatically- generated anchor id values by transforming it as needed. The lookup table needed by ResolveFragment can be retrieved after calling Markdown by first setting the 'anchors' key in the passed in options HASH ref. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	191f62119c	Markdown.pl: provide urlfunc hook and helpers Provide a new urlfunc hook that can inspect/change all urls that are in "a" "href" attributes and "img" "src" attributes. Make the new SplitURL and unescapeXML routines exportable (@EXPORT_OK) and rename the old escape function to be escapeXML and make it exportable (@EXPORT_OK) too. Add some nice comments to each of the newly exportable functions. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	4ba2a0423a	Markdown.pl: document and catch more meaningless tags There are a few tags (e.g. `a`, `area`, `img`, `map`) that require at least one attribute to be present in order to be meaningful. When these tags occur without any attributes they are treated as non-tags and the leading `<` is escaped to `&lt;`. This can only happen when sanitize mode is active. Although already partially implemented, it was not documented in the help. Add discussion of this to the help and make the implementation more robust to catch more of these tags. This is not intended to be a perversely pedantic change, but rather to allow such meaningless tags to be used as plain text without the need for escaping. For example the text: The <a><c><e> process ... Can be used exactly as-is and all of the `<`s will automatically be escaped to `&lt;` since none of them specify meaningful tags. Of course, using the `--no-sanitize` option will disable this behavior. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	34b44054db	Markdown.pl: sanitize common "oops" entities Take a hint from w3m and quietly fix up the six common entities &lt; &gt; &amp; &quot; &apos; &nbsp; when they are missing their trailing ';' provided whatever trailing character is there is not alphanumeric, an equals sign or a semicolon. Without this change the leading ampersand would have ended up being escaped to &amp; in these cases, which seems almost certainly incorrect. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	bf4a09aeb2	Markdown.pl: sanitize all "&" issues When sanitize is active (--sanitize, the default), make sure all "&" issues are checked. This includes things like bare "&" that should be "&amp;" but aren't. And it includes single/double quote characters inside attribute values that should be encoded and are not. Since the internal validator requires the sanitize mode to be active, this now makes sure that the internal validation mode cannot pass through any invalid entity references to the output. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	964670e66b	Markdown.pl: run _EncodeAmpsAndAngles on top-level raw html blocks At the top level of the document, the _HashHTMLBlocks function gets called to sequester raw top-level html blocks from being processed. As a result, anything in these top-level blocks escapes general Markdown processing except that if XML validation has been enabled (the default), the final result of processing does always pass through a validation stage. On the one hand that's good as it allows raw HTML in Markdown docs, but on the other hand, some basic fix ups are not happening and that's bad. Rather than try and push all of the top-level raw HTML block content through either _RunBlockGamut or _RunSpanGamut (thereby somewhat defeating the point of allowing raw HTML top-level blocks in the first place), use a compromise between the two extremes and push all the text of raw HTML block content through just the _EncodeAmpsAndAngles function. This causes things like non-html-escaped ampersands (&) inside "href" and "src" attributes to magically be transformed into "&amp;" and at the same time any url adjustment options (i.e. -r, -i, -b, -a) to be applied. The result produces better and less surprising outcomes than before. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	cec2468782	various: update copyright year Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	83a2b69572	Markdown.pl: add missing ul to tagblk list The <ul> tag is just as much a block as the <ol> and <dl> tags. Correct the omission by adding it to the tagblk hash. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	26e8ebf4c1	Markdown.pl: treat center as a block because it is Although the <center>...</center> tag has been deprecated, it still occurs in the wild. Since it's equivalent to <div align="center">...</div> it needs to be treated as a block level tag. Add it to tagblk to make it so. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
Kyle J. McKay	ff92cf5457	Markdown.pl: dd, dt and li do not autoclose containing table While <dd>, <dt> and <li> all have "optional" closing tags, they can all be contained within a table. And as such must not close the tags that define the content of the table itself. Customize the tagacl list for these three to exclude the tags that may contain table content to prevent their premature closing. Signed-off-by: Kyle J. McKay <mackyle@gmail.com>	4 years ago
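
The rule described in commit 34b44054db (append the missing `;` to one of six common entity names unless the next character is alphanumeric, an equals sign or a semicolon) can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl code in Markdown.pl. The function name `fix_oops_entities` is hypothetical, and two details are assumptions: end of input counts as an acceptable trailing position, and only lowercase entity names are matched.

```python
import re

# The six entity names mentioned in commit 34b44054db. The negative
# lookahead rejects a following alphanumeric, '=' or ';' character.
# Assumption: end of input (no trailing character at all) is accepted.
_OOPS_ENTITY = re.compile(r"&(lt|gt|amp|quot|apos|nbsp)(?![A-Za-z0-9=;])")


def fix_oops_entities(text: str) -> str:
    """Append the missing ';' to a bare common entity reference."""
    return _OOPS_ENTITY.sub(r"&\1;", text)


if __name__ == "__main__":
    # The missing semicolon is added before the space.
    print(fix_oops_entities("a &lt b"))
    # Left alone: '=' follows the entity name (likely a URL query string).
    print(fix_oops_entities("?a=1&lt=2"))
```

Excluding `=` and alphanumerics keeps query strings such as `?a=1&lt=2` and longer names such as `&ampx` untouched, which matches the conditions listed in the commit message.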

218 Commits (c154f45386b371157e435b234bfc54140a2f1587)