Markdown.pl: support UTF-8

Auto-detect input format of either ISO-8859-1 (interpreted as
per the HTML 5 specification) or UTF-8 and always write UTF-8
to the output.
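
The detection described above can be sketched roughly as follows. This is an illustration only, not the code from the commit: `decode_auto` and `$fallback` are hypothetical names, and the sketch assumes input arrives as a byte string that is valid UTF-8, already a character string, or otherwise to be read as Windows-1252 (which is how HTML 5 interprets the ISO-8859-1 label).

```perl
use strict;
use warnings;
use Encode ();

# Prefer Windows-1252; fall back to plain ISO-8859-1 if it is unavailable.
my $fallback = Encode::find_encoding('Windows-1252')
    || Encode::find_encoding('ISO-8859-1')
    or die "no Latin-1 style encoding available\n";

# decode_auto() is a hypothetical helper name, not part of Markdown.pl.
sub decode_auto {
    my ($bytes) = @_;
    return '' unless defined $bytes;
    # Already a character string: nothing to decode.
    return $bytes if Encode::is_utf8($bytes);
    # utf8::decode() converts in place and returns true only when the
    # bytes are well-formed UTF-8; on failure the scalar is unchanged.
    return $bytes if utf8::decode($bytes);
    # Not valid UTF-8, so read the bytes as Windows-1252 / ISO-8859-1.
    return $fallback->decode($bytes, Encode::FB_DEFAULT);
}

# The same word supplied as UTF-8 bytes and as Latin-1 bytes.
my $from_utf8   = decode_auto("caf\xC3\xA9");
my $from_latin1 = decode_auto("caf\xE9");
print "same text\n" if $from_utf8 eq $from_latin1;

# Output is always written as UTF-8 bytes.
my $out = $from_latin1;
utf8::encode($out);
```

Trying UTF-8 first is what makes the detection workable: pure ASCII is valid UTF-8, and Latin-1 text containing high bytes only rarely happens to form well-formed UTF-8 sequences.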

As a result of this change at least Perl 5.8.0 is now required.

The stub document now includes a charset (both meta tags).

Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
master
Kyle J. McKay 8 years ago
parent
commit
1b421edb2d
  1. Markdown.pl (44 changed lines)
  2. README (14 changed lines)

44
Markdown.pl

@@ -12,10 +12,12 @@
 package Markdown;
-require 5.006_000;
+require 5.008;
 use strict;
 use warnings;
+use Encode;
 use vars qw($COPYRIGHT $VERSION @ISA @EXPORT_OK);
 BEGIN {*COPYRIGHT =
@@ -23,7 +25,7 @@ BEGIN {*COPYRIGHT =
 Copyright (C) 2015,2016,2017 Kyle J. McKay
 All rights reserved.
 ";
-*VERSION = \"1.1.0" # Wed 11 Jan 2017
+*VERSION = \"1.1.1" # Wed 11 Jan 2017
 }
 require Exporter;
@@ -38,10 +40,12 @@ $INC{__PACKAGE__.'.pm'} = $INC{basename(__FILE__)} unless exists $INC{__PACKAGE_
 close(DATA) if fileno(DATA);
 exit(&_main(@ARGV)||0) unless caller;
-## Disabled; causes problems under Perl 5.6.1:
-# use utf8;
-# binmode( STDOUT, ":utf8" ); # c.f.: http://acis.openlib.org/dev/perl-unicode-struggle.html
+my $encoder;
+BEGIN {
+$encoder = Encode::find_encoding('Windows-1252') ||
+Encode::find_encoding('ISO-8859-1') or
+die "failed to load ISO-8859-1 encoder\n";
+}
 #
 # Global default settings:
@@ -316,11 +320,15 @@ sub _main {
 <!DOCTYPE html>
 <html xmlns="http://www.w3.org/1999/xhtml">
 <head>
+<meta charset="utf-8" />
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
 HTML5
 } elsif ($stub < 0) {
 print <<'HTML4';
 <html>
 <head>
+<meta charset="utf-8">
+<meta http-equiv="content-type" content="text/html; charset=utf-8">
 HTML4
 }
 if ($stub && ($options{title} || $options{h1})) {
@@ -375,8 +383,16 @@ sub Markdown {
 # _EscapeSpecialChars(), so that any *'s or _'s in the <a>
 # and <img> tags get encoded.
 #
-my $text = shift;
-defined $text or $text='';
+my $_text = shift;
+defined $_text or $_text='';
+my $text;
+if (Encode::is_utf8($_text) || utf8::decode($_text)) {
+$text = $_text;
+} else {
+$text = $encoder->decode($_text, Encode::FB_DEFAULT);
+}
+$_text = undef;
 # Any remaining arguments after the first are options; either a single
 # hashref or a list of name, value paurs.
@@ -445,8 +461,11 @@ sub Markdown {
 $text .= "\n" unless $text eq "";
-${$_[0]}{h1} = $opt{h1}
-if defined($opt{h1}) && $opt{h1} ne "" && ref($_[0]) eq "HASH";
+utf8::encode($text);
+if (defined($opt{h1}) && $opt{h1} ne "" && ref($_[0]) eq "HASH") {
+utf8::encode($opt{h1});
+${$_[0]}{h1} = $opt{h1}
+}
 return $text;
 }
@@ -2087,6 +2106,9 @@ HTML tags (like <div> and <table> as well).
 For more information about Markdown's syntax, see the F<basics.md>
 and F<syntax.md> files included with F<Markdown.pl>.
+Input (auto-detected) may be either ISO-8859-1 or UTF-8. Output is always
+converted to the UTF-8 character set.
 =head1 OPTIONS
@@ -2182,6 +2204,8 @@ Z<> See the F<README> file for detailed release notes for this version.
 =over
+=item Z<> 1.1.1 - 11 Jan 2017
 =item Z<> 1.1.0 - 11 Jan 2017
 =item Z<> 1.0.4 - 05 Jun 2016

14
README

@@ -2,7 +2,7 @@
 Markdown
 ========
-Version 1.1.0 - Wed 11 Jan 2017
+Version 1.1.1 - Wed 11 Jan 2017
 John Gruber
 Kyle J. McKay
@@ -34,9 +34,13 @@ in Markdown.)
 Installation and Requirements
 -----------------------------
-Markdown requires Perl 5.6.0 or later. Welcome to the 21st Century.
+Markdown requires Perl 5.8.0 or later. Welcome to the 21st Century.
 Markdown also requires the standard Perl library module `Digest::MD5`.
+As of version 1.1.1, Markdown auto-detects the character set of the
+input (US-ASCII, ISO-8859-1 and UTF-8 are supported) and always
+converts the input to UTF-8 when writing the output.
 Movable Type
 ~~~~~~~~~~~~
@@ -169,6 +173,12 @@ Markdown.pl source code for more information.
 Version History
 ---------------
+1.1.1 (11 Jan 2017):
++ Markdown.pl: auto-detect latin-1/utf-8 input always output utf-8
+The minimum version of Perl required is now 5.8.0.
 1.1.0 (11 Jan 2017):
 + Markdown.pl: handle some limited [[wiki style links]]
