
Markdown.pl: support UTF-8

Auto-detect input format of either ISO-8859-1 (interpreted as
per the HTML 5 specification) or UTF-8 and always write UTF-8
to the output.

As a result of this change at least Perl 5.8.0 is now required.

The stub document now includes a charset (both meta tags).

Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
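
As a rough illustration of the "interpreted as per the HTML 5 specification" remark in the message above: HTML 5 treats the `ISO-8859-1` label as Windows-1252, which is presumably why the patch looks up `Windows-1252` first. The sketch below reuses the encoder lookup from the diff; the sample byte and the variable names `$bytes` and `$char` are illustrative assumptions, not part of the commit.

```perl
use strict;
use warnings;
use Encode ();

# Same lookup order as the patch: prefer Windows-1252, fall back to ISO-8859-1.
my $encoder = Encode::find_encoding('Windows-1252') ||
              Encode::find_encoding('ISO-8859-1') or
              die "failed to load ISO-8859-1 encoder\n";

# Byte 0x80 is a C1 control character in strict ISO-8859-1,
# but the euro sign (U+20AC) in Windows-1252.
my $bytes = "\x80";
my $char  = $encoder->decode($bytes, Encode::FB_DEFAULT);

printf "decoded 0x80 as U+%04X\n", ord($char);
```

If only the strict ISO-8859-1 fallback were available, the same byte would decode to U+0080 instead.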
master
Kyle J. McKay 8 years ago
parent
commit
1b421edb2d
  1. Markdown.pl (44 lines changed)
  2. README (14 lines changed)

44
Markdown.pl

@@ -12,10 +12,12 @@
 package Markdown;
-require 5.006_000;
+require 5.008;
 use strict;
 use warnings;
+use Encode;
 use vars qw($COPYRIGHT $VERSION @ISA @EXPORT_OK);
 BEGIN {*COPYRIGHT =
@@ -23,7 +25,7 @@ BEGIN {*COPYRIGHT =
 Copyright (C) 2015,2016,2017 Kyle J. McKay
 All rights reserved.
 ";
-*VERSION = \"1.1.0" # Wed 11 Jan 2017
+*VERSION = \"1.1.1" # Wed 11 Jan 2017
 }
 require Exporter;
@@ -38,10 +40,12 @@ $INC{__PACKAGE__.'.pm'} = $INC{basename(__FILE__)} unless exists $INC{__PACKAGE_
 close(DATA) if fileno(DATA);
 exit(&_main(@ARGV)||0) unless caller;
-## Disabled; causes problems under Perl 5.6.1:
-# use utf8;
-# binmode( STDOUT, ":utf8" ); # c.f.: http://acis.openlib.org/dev/perl-unicode-struggle.html
+my $encoder;
+BEGIN {
+$encoder = Encode::find_encoding('Windows-1252') ||
+Encode::find_encoding('ISO-8859-1') or
+die "failed to load ISO-8859-1 encoder\n";
+}
 #
 # Global default settings:
@@ -316,11 +320,15 @@ sub _main {
 <!DOCTYPE html>
 <html xmlns="http://www.w3.org/1999/xhtml">
 <head>
+<meta charset="utf-8" />
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
 HTML5
 } elsif ($stub < 0) {
 print <<'HTML4';
 <html>
 <head>
+<meta charset="utf-8">
+<meta http-equiv="content-type" content="text/html; charset=utf-8">
 HTML4
 }
 if ($stub && ($options{title} || $options{h1})) {
@@ -375,8 +383,16 @@ sub Markdown {
 # _EscapeSpecialChars(), so that any *'s or _'s in the <a>
 # and <img> tags get encoded.
 #
-my $text = shift;
-defined $text or $text='';
+my $_text = shift;
+defined $_text or $_text='';
+my $text;
+if (Encode::is_utf8($_text) || utf8::decode($_text)) {
+$text = $_text;
+} else {
+$text = $encoder->decode($_text, Encode::FB_DEFAULT);
+}
+$_text = undef;
 # Any remaining arguments after the first are options; either a single
 # hashref or a list of name, value paurs.
@@ -445,8 +461,11 @@ sub Markdown {
 $text .= "\n" unless $text eq "";
-${$_[0]}{h1} = $opt{h1}
-if defined($opt{h1}) && $opt{h1} ne "" && ref($_[0]) eq "HASH";
+utf8::encode($text);
+if (defined($opt{h1}) && $opt{h1} ne "" && ref($_[0]) eq "HASH") {
+utf8::encode($opt{h1});
+${$_[0]}{h1} = $opt{h1}
+}
 return $text;
 }
@@ -2087,6 +2106,9 @@ HTML tags (like <div> and <table> as well).
 For more information about Markdown's syntax, see the F<basics.md>
 and F<syntax.md> files included with F<Markdown.pl>.
+Input (auto-detected) may be either ISO-8859-1 or UTF-8. Output is always
+converted to the UTF-8 character set.
 =head1 OPTIONS
@@ -2182,6 +2204,8 @@ Z<> See the F<README> file for detailed release notes for this version.
 =over
+=item Z<> 1.1.1 - 11 Jan 2017
 =item Z<> 1.1.0 - 11 Jan 2017
 =item Z<> 1.0.4 - 05 Jun 2016

14
README

@@ -2,7 +2,7 @@
 Markdown
 ========
-Version 1.1.0 - Wed 11 Jan 2017
+Version 1.1.1 - Wed 11 Jan 2017
 John Gruber
 Kyle J. McKay
@@ -34,9 +34,13 @@ in Markdown.)
 Installation and Requirements
 -----------------------------
-Markdown requires Perl 5.6.0 or later. Welcome to the 21st Century.
+Markdown requires Perl 5.8.0 or later. Welcome to the 21st Century.
 Markdown also requires the standard Perl library module `Digest::MD5`.
+As of version 1.1.1, Markdown auto-detects the character set of the
+input (US-ASCII, ISO-8859-1 and UTF-8 are supported) and always
+converts the input to UTF-8 when writing the output.
 Movable Type
 ~~~~~~~~~~~~
@@ -169,6 +173,12 @@ Markdown.pl source code for more information.
 Version History
 ---------------
+1.1.1 (11 Jan 2017):
++ Markdown.pl: auto-detect latin-1/utf-8 input always output utf-8
+The minimum version of Perl required is now 5.8.0.
 1.1.0 (11 Jan 2017):
 + Markdown.pl: handle some limited [[wiki style links]]
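
The auto-detection described in the README text above can be tried in isolation. The following is only a sketch, assuming a hypothetical helper named `to_utf8_bytes` (it does not exist in Markdown.pl) that combines the decode step and the final `utf8::encode` step from the patched `Markdown()` function; the sample strings are likewise illustrative.

```perl
use strict;
use warnings;
use Encode ();

# Encoder lookup copied from the patch.
my $encoder = Encode::find_encoding('Windows-1252') ||
              Encode::find_encoding('ISO-8859-1') or
              die "failed to load ISO-8859-1 encoder\n";

# Hypothetical helper (not part of Markdown.pl): decode the input the way
# the patched Markdown() does, then return UTF-8 encoded bytes.
sub to_utf8_bytes {
    my $input = shift;
    defined $input or $input = '';
    my $text;
    if (Encode::is_utf8($input) || utf8::decode($input)) {
        # Already a character string, or valid UTF-8 bytes (decoded in place).
        $text = $input;
    } else {
        # Not valid UTF-8: treat the bytes as Windows-1252 / ISO-8859-1.
        $text = $encoder->decode($input, Encode::FB_DEFAULT);
    }
    utf8::encode($text);    # characters -> UTF-8 bytes, in place
    return $text;
}

my $from_latin1 = to_utf8_bytes("caf\xE9");        # ISO-8859-1 bytes
my $from_utf8   = to_utf8_bytes("caf\xC3\xA9");    # UTF-8 bytes
my $from_ascii  = to_utf8_bytes("cafe");           # US-ASCII passes through

print "identical output\n" if $from_latin1 eq $from_utf8;
```

One consequence of this heuristic: ISO-8859-1 input whose bytes happen to form valid UTF-8 sequences is treated as UTF-8, since the UTF-8 check runs first.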
