Clean up the formatting in basics and syntax to make it
more readable as a text document.
This is now possible by making use of the automatic
anchors for top-level headers and '~~~~~' style h3's.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Long documents often need to link within themselves in order to
provide a convenient table of contents section.
To facilitate this, all setext-style and atx-style headers defined
at the top-level (i.e. they start at the left margin) now have
automatic anchors added to them and link definitions added for
them provided there is not already a link definition with the
same id present.
These can be easily targeted using the "implicit link name"
shortcut (e.g. [Foo][]).
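As a rough illustration of the rule above, here is a Python sketch (not the Perl code in Markdown.pl) that adds a link definition for each top-level atx-style header unless that id is already defined. The names `make_anchor_id` and `add_header_links` and the slug format are assumptions made for the example; setext-style headers are left out for brevity.

```python
import re

# Hypothetical slug rule: the anchor format Markdown.pl really uses may differ.
def make_anchor_id(title):
    return "_" + re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_") + "_"

def add_header_links(text, link_defs):
    """Add a link definition for each top-level atx header not already defined.

    link_defs maps lowercase link ids to URLs; existing entries are kept.
    """
    for line in text.splitlines():
        # re.match anchors at column 0, so indented headers are skipped.
        m = re.match(r"#{1,6}[ \t]*(.+?)[ \t]*#*$", line)
        if not m:
            continue
        title = m.group(1)
        # setdefault never overwrites a definition that is already present.
        link_defs.setdefault(title.lower(), "#" + make_anchor_id(title))
    return link_defs
```

With such definitions in place, a reference like [Foo][] resolves to the anchor of the "Foo" header.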
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
This example:
    * a
      + 1. x
      + 2. y
    * c
Should format as one outer "ul" with the first item having
a second inner "ul". There should not be any "ol" lists in
the formatted result at all.
Correct the code so that it no longer thinks "+ 1. x" both
starts a list and includes a sublist.
Now it only starts a list where the first item just happens to
have content that closely resembles a list marker.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
The regular, indented-by-four-spaces, code blocks do not nest nor
should they. But they were nesting if they were located inside a
list. Fix this by hashifying them and not unhashifying them until
the very end.
Also there's a kludge in the code that says:
# Turn double returns into triple returns, so that we can make a
# paragraph for the last item in a list, if necessary
Unfortunately that perverts blank lines inside a code block.
Fix this by changing the perversion so that it accomplishes the
same thing but has an exact inverse and apply that inverse before
formatting code blocks.
Code blocks inside lists should now format correctly (and this
does fix the example in the README that was previously formatted
incorrectly).
Finally, a code block at the very beginning of the file preceded
by a single blank line would not have been recognized (but if it
were preceded by none or two or more it would have). Now it will
be recognized properly.
And one more thing. Since we're in there tweaking code blocks,
wrap the output in a <div>...</div> section and insert a null
<dl></dl> right after the opening <div> tag. This makes sure
the displayed code block will not end up getting mashed up
against something it shouldn't be mashed up against.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
The change in version 1.0.1 to fix "a bug where lines in the
middle of hard-wrapped paragraphs, which lines look like the
start of a list item, would accidentally trigger the creation
of a list," broke things like this:
For example:
* broken
* microphone
That will now be recognized as a list again. The heuristic is
now when two lines in a row start with the same type of list
marker then recognize that as a list even when the first item
doesn't appear to start its own paragraph.
Additionally if the second line is at the next indent level
then the two lines may have different kinds of list markers
and still be recognized.
All previously recognized lists are still recognized.
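The two-line heuristic can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the Perl implementation: "same type" is taken to mean bullet versus numbered, "next indent level" is taken to mean four more spaces, and the helper names are hypothetical.

```python
import re

# Optional indent, then a bullet (*, + or -) or "N.", then whitespace.
MARKER = re.compile(r"^( *)(?:([*+-])|(\d+\.))[ \t]+")

def marker_info(line):
    """Return (indent, kind) for a line that starts with a list marker."""
    m = MARKER.match(line)
    if not m:
        return None
    kind = "bullet" if m.group(2) else "number"
    return len(m.group(1)), kind

def pair_starts_list(first, second, indent_width=4):
    """Heuristic: do two consecutive lines start a list mid-paragraph?"""
    a, b = marker_info(first), marker_info(second)
    if a is None or b is None:
        return False
    if a == b:
        return True  # same indent and same type of marker
    # Second line one indent level deeper: the marker kinds may differ.
    return b[0] == a[0] + indent_width
```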
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
The --tabwidth=<num> option only affects the width to which
tabs are expanded. It does NOT affect the number of spaces
required to start a new indent level. That remains set at 4
no matter what value is used for the --tabwidth=<num> option.
With this change it's now, finally, possible to have proper
tab expansion without breaking the "4 spaces per indent level"
rule.
Note that backtick-delimited code blocks will always expand
their tabs to 8-character tab stop positions no matter what
value is used for the --tabwidth=<num> option.
With this change the default expansion width for tabs when
Markdown.pl is run from the command line is now 8.
When used as a module the default is still 4, but that's
easily changed by passing in a suitable option.
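To make the distinction concrete, here is a small Python sketch (not the Perl code itself) in which the tab width only controls expansion while the indent level is always counted in units of 4 spaces. The function names are made up for the example; `str.expandtabs` is Python's built-in tab-stop expansion.

```python
INDENT_WIDTH = 4  # spaces per indent level; independent of the tab width

def expand_tabs(text, tabwidth=8):
    """Expand each hard tab to the next multiple of `tabwidth` columns."""
    return text.expandtabs(tabwidth)

def indent_level(line):
    """Count indent levels in a line whose tabs are already expanded."""
    leading = len(line) - len(line.lstrip(" "))
    return leading // INDENT_WIDTH
```

Under these assumptions a single leading tab counts as two indent levels at a tab width of 8 and as one level at a tab width of 4.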
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Move one-time initialization into BEGIN blocks.
Avoid running qr(...) more than once on expressions that
do not change (actually Perl should mostly already do this).
Get rid of the kludgy check for command-line and move all
that code into a new _main function and call it only when
being run from the command line.
This seems to have resulted in a very very very tiny speed
boost as well.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
There's not a lot to work with in the way of speeding
things up. However, after timing a few different changes
there were some minor speed ups to be had.
In particular, md5_hex is no longer used; a global hash
table is used instead.
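The idea of protecting text with tokens held in a global table, rather than with digests, might look like this in Python. It is a sketch only: the counter-based token format and the names `hashify` and `unhashify` are assumptions for illustration, not a description of the Perl code.

```python
import re

_blocks = {}   # global table: placeholder token -> protected text
_counter = 0

def hashify(text):
    """Swap `text` for an opaque token that later passes will not touch."""
    global _counter
    _counter += 1
    token = "\x02%d\x03" % _counter  # hypothetical token format
    _blocks[token] = text
    return token

def unhashify(text):
    """Restore every protected block; meant to run once at the very end."""
    return re.sub(r"\x02\d+\x03", lambda m: _blocks[m.group(0)], text)
```

A counter lookup avoids computing a digest for every protected block, which is one plausible source of the small speed-up mentioned above.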
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Obscuring email addresses is all very nice, but
outputting a different document every time for the
same input is not.
It screws up caching and last modified checks and
is a very bad thing to do.
Instead continue to obscure email addresses, but
arrange for the same obscurity to be used when the
same input file is processed repeatedly by Markdown.pl
on the same machine.
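One way to get repeatable obscuring is to seed the random choices from the input itself. The Python sketch below does that; how Markdown.pl actually derives its seed is not described here, so seeding from a digest of the document text is an assumption, as is the name `obscure_email`.

```python
import hashlib
import random

def obscure_email(addr, document_text):
    """Encode each character of `addr` as a decimal or hex HTML entity.

    The random choices are seeded from the document text, so the same
    input always produces the same output.
    """
    digest = hashlib.sha256(document_text.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    out = []
    for ch in addr:
        if rng.random() < 0.5:
            out.append("&#%d;" % ord(ch))    # decimal entity
        else:
            out.append("&#x%X;" % ord(ch))   # hexadecimal entity
    return "".join(out)
```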
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Do not require the surrounding '<'...'>' to automatically
turn an https? or ftps? address into a real link.
Additionally, recognize [RFC...] and turn those into the
appropriate links.
This is more in line with how other processors handle
Markdown to html conversions.
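A minimal Python sketch of this kind of auto-linking follows. The patterns, the accepted "[RFC 1234]" spelling, the link text, and the RFC base URL are all assumptions made for the example; the sketch also does not trim trailing punctuation from URLs.

```python
import re

# Bare URL not already wrapped in '<'...'>' and not glued to a word.
URL_RE = re.compile(r"(?<![<\w])((?:https?|ftps?)://[^\s<>]+)")
RFC_RE = re.compile(r"\[RFC\s*(\d+)\]")

# Assumed link target for RFC references; the real target may differ.
RFC_BASE = "https://www.rfc-editor.org/rfc/rfc%s"

def auto_link(text):
    """Turn bare URLs and [RFC n] references into HTML links."""
    # URLs first, so the RFC hrefs added next are not linked a second time.
    text = URL_RE.sub(r'<a href="\1">\1</a>', text)
    return RFC_RE.sub(
        lambda m: '<a href="%s">%s</a>' % (RFC_BASE % m.group(1), m.group(0)),
        text)
```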
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Continue to mention that Markdown expands all tabs to spaces
before doing anything else and that it guarantees to expand
tabs within ```-delimited code blocks to 8-character tab-stop
positions.
Otherwise always talk about "indent" levels in terms of 4
spaces per level.
There is far too much Markdown content in existence already
that assumes indenting by 4 spaces gives a new indent level.
That does not necessarily imply that all the creators of these
documents have incorrectly attempted to alter the hard-coded
terminal physical tab-stop settings from the fixed value of 8
to something else that the docs seemed to imply was 4.
Users with some other tab-stop setting were left out in the
cold by the docs. In particular, a tab-stop setting of 3 would
have rendered the examples completely useless and the text just
plain wrong.
With this change the docs no longer assume anything in particular
about the user's tab-stop settings.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
The Markdown function can be called repeatedly, so
stop ignoring file arguments after the first one and
process them all just like the docs claim.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Use pod2usage to display help instead of running
perldoc directly.
Show only the brief synopsis with -h but the full
help with --help.
Add brief option synopsis to usage section.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Clean up whitespace throughout the code and Markdown sources.
Fix the formatting of the POD documentation so it looks nice
when formatted as either text or html.
Tidy up the license and copyright information.
Retain '<' and '>' around "auto" links.
Avoid using tabs when producing nested <blockquote>...</blockquote>
content.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Get rid of ".text" extension.
Use standard "README" and "LICENSE" names for those.
The help files are markdown so make them have a .md extension.
Markdown.pl is kept as is (rather than changing it to markdown.pl)
because it's also a Perl module (Markdown) and Perl module names
typically start with an uppercase letter. Were it to be renamed
to a ".pm" it would end up being Markdown.pm NOT markdown.pm.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Some systems do not have a /usr/bin/perl but it will be in
the $PATH there so /usr/bin/env will be able to find it.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Handle ```-delimited code blocks earlier so that tabs within them
can be correctly expanded to 8-character tab stop positions and
also to avoid the result being incorrectly interpreted any further.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Options may now be passed in to the Markdown function and it
may be called multiple times with different options on different
texts with no unwanted interaction between the calls.
Simply `require "Markdown.pl"` and then call Markdown::Markdown.
Or something like this will import the `Markdown` function
regardless of whether it's available in Markdown.pl or Markdown.pm:
BEGIN {eval {require "Markdown.pl"}}
use Markdown qw(Markdown);
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
If --imageroot <prefix> is used, then <prefix> will be
prefixed to all generated URLs that are not absolute URLs
(i.e. any that do not start with a scheme or '//' and are
not email addresses) and end in an image extension. This
will override any --htmlroot <prefix> setting in that case.
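The prefixing rule, including the way --imageroot takes precedence over --htmlroot for images, can be sketched in Python as below. The list of image extensions, the email test, the plain string concatenation, and the name `prefix_url` are assumptions made for illustration only.

```python
import re

IMAGE_EXT = re.compile(r"\.(?:png|gif|jpe?g|svg)$", re.IGNORECASE)  # assumed list
ABSOLUTE = re.compile(r"^(?:[A-Za-z][A-Za-z0-9+.-]*:|//)")  # scheme or '//'
EMAIL = re.compile(r"^[^/@\s]+@[^/@\s]+$")

def prefix_url(url, htmlroot="", imageroot=""):
    """Apply the htmlroot/imageroot prefix to a non-absolute URL."""
    if ABSOLUTE.match(url) or EMAIL.match(url):
        return url                  # absolute URLs and emails are untouched
    if imageroot and IMAGE_EXT.search(url):
        return imageroot + url      # imageroot overrides htmlroot for images
    return htmlroot + url
```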
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
If --htmlroot <prefix> is used, then <prefix> will be
prefixed to all generated URLs that are not absolute URLs
(i.e. any that do not start with a scheme or '//' and are
not email addresses).
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Update version number to 1.0.2 and tweak documentation
to eliminate outside links to documentation that is no
longer complete since the enhancements have been added.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Non-indented code blocks may be used by preceding them
with a line consisting of 3 (or more) ` characters and
following them with a line consisting of the same
number of backtick characters.
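The matching rule (an opening line of three or more backticks, closed by a line with the same number of them) can be expressed as a single regular expression. This Python sketch is illustrative only; `FENCE_RE` and `find_fenced_blocks` are hypothetical names.

```python
import re

FENCE_RE = re.compile(
    r"^(`{3,})[ \t]*\n"   # opening line: three or more backticks
    r"(.*?)"              # block contents, as few lines as possible
    r"^\1[ \t]*$",        # closing line: the same number of backticks
    re.MULTILINE | re.DOTALL)

def find_fenced_blocks(text):
    """Return the contents of each backtick-delimited code block."""
    return [m.group(2) for m in FENCE_RE.finditer(text)]
```

Because the closing line must repeat the opening run exactly, a block opened with four backticks can contain a line of three backticks as ordinary content.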
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Markdown.pl (incorrectly) expands hard tabs to spaces
using tab stops set 4 spaces apart.
While it would be nice to fix this for code blocks, it
expands the tabs to spaces before doing anything else
so it would be non-trivial to do so.
Instead explain this deviant behavior in the help
files.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
A tab indents 8 spaces. Period.
The coding style for this file is an indent of 4.
Correct the assume-tabs-are-broken-and-only-4-spaces problem.
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
The '*' character can still be used for emphasis within
a word, but '_' will remain unchanged unless it starts
and ends a word.
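A simplified Python sketch of the two rules follows. It handles single emphasis only (not strong emphasis), and it reads "starts and ends a word" as "the opening '_' is not preceded by a word character and the closing '_' is not followed by one"; that reading and the names used are assumptions.

```python
import re

# '*' may open and close emphasis anywhere, including inside a word.
STAR_EM = re.compile(r"\*(?=\S)(.+?)(?<=\S)\*")
# '_' only counts when it sits at the outer edges of a word.
UNDER_EM = re.compile(
    r"(?<![A-Za-z0-9_])_(?=\S)(.+?)(?<=\S)_(?![A-Za-z0-9_])")

def emphasize(text):
    """Apply single-emphasis markup for '*' and '_'."""
    text = STAR_EM.sub(r"<em>\1</em>", text)
    return UNDER_EM.sub(r"<em>\1</em>", text)
```

Under this reading an identifier such as snake_case_name passes through unchanged.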
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Using a '~' underline will now generate an h3 header.
A preceding overline made with the same character as the underline
is now permitted.
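For illustration, recognizing a '~' underlined header with an optional overline might be sketched in Python like this. The pattern covers only the '~' case, does not compare the lengths of the overline and underline, and uses hypothetical names; it is not the Perl implementation.

```python
import re

H3_RE = re.compile(
    r"^(?:~+[ \t]*\n)?"      # optional overline made of '~'
    r"(?!~)(.+?)[ \t]*\n"    # the header text itself
    r"~+[ \t]*$",            # the '~' underline
    re.MULTILINE)

def tilde_headers(text):
    """Convert '~' underlined headers (with optional overline) to h3."""
    return H3_RE.sub(lambda m: "<h3>%s</h3>" % m.group(1), text)
```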
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>