I'm not sure why this is necessary. My tests in t/test.t work just
fine on my Mac with mac.pl installed. Without it, reindexing results
in a silent crash of Perl on the machine where emacswiki.org is
installed.
Issues:
- the Mac layer was masking issues because of the NFC/NFD difference and existing compatibility hacks in mac.pl
- drafts.pl was suffering from a double encoding issue
- crossbar.t and download.t tests were failing because I had recently fixed DoDownload output to be raw instead of encoded
- test.pl now has a way to capture the raw, unencoded output produced by DoDownload
- tags.t got some tests to prove that recent changes to wiki.pl actually work
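To illustrate the kind of double encoding mentioned for drafts.pl, here is a minimal Python sketch. It is not the Perl code from drafts.pl; the sample string and the helper decode_once are made up for this example. It only shows what happens when text that is already UTF-8 encoded gets encoded a second time.

```python
# Illustration only, not the actual drafts.pl code.
name = "Caf\u00e9"  # "Café", with é as one code point

# Correct: characters are encoded to UTF-8 bytes exactly once.
encoded_once = name.encode("utf-8")

# Bug: the UTF-8 bytes are mistaken for Latin-1 characters and
# encoded again, so every non-ASCII byte grows into two bytes.
encoded_twice = encoded_once.decode("latin-1").encode("utf-8")

def decode_once(data: bytes) -> str:
    """Decode UTF-8 bytes a single time, as a reader of the file would."""
    return data.decode("utf-8")

print(decode_once(encoded_once))   # the original text
print(decode_once(encoded_twice))  # mojibake: é became two characters
```

A single decode recovers the original text only from the bytes that were encoded once; the doubly encoded bytes come back as mojibake.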
Without this, search will break on pagenames with non-ASCII characters.
I noted this in conjunction with tags.pl. There, I needed to encode
the page names for search to work correctly.
There is no way to provide an encoding layer to directory names.
Therefore they need to be raw bytes and not characters. This becomes
apparent when creating namespaces containing non-ASCII characters.
Uploaded files no longer contain just #FILE and a MIME type: the
MIME type is now followed by a space and, optionally, more information.
I replaced the hand-coded parsing with a call to TextIsFile, added
better error checking, and fixed the error messages (they used $s
instead of %s).
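The header format described here can be sketched as follows. This is a Python illustration under assumptions: the pattern and the function parse_file_header are hypothetical and are not the implementation of TextIsFile in wiki.pl. It only assumes the first line is "#FILE", a space, a MIME type, and optionally a space and more information.

```python
import re

# Hypothetical pattern for the first line of an uploaded file:
# "#FILE", a MIME type, then optional extra information.
FILE_HEADER = re.compile(r"^#FILE (\S+) ?([^\n]*)")

def parse_file_header(text):
    """Return (mime_type, extra) for an uploaded file, or None for plain text."""
    match = FILE_HEADER.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_file_header("#FILE image/png\nAAAA"))
print(parse_file_header("#FILE text/plain gzip\nAAAA"))
print(parse_file_header("Just a normal page"))
```

Returning None for ordinary pages lets the caller report an error instead of guessing at a MIME type.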
Drafts are saved using the username as filename. This must also be
encoded and decoded correctly. Because of NFC and NFD issues on Mac
HFS, an appropriate normalization was added to mac.pl.
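The NFC/NFD problem can be shown with a short Python sketch. This is not the code added to mac.pl; the sample username and the helper normalize_name are assumptions for illustration. The same visible name can be stored as two different code point sequences, and therefore as two different byte sequences, unless it is normalized first.

```python
import unicodedata

# The same visible username in two Unicode normalization forms.
nfc = "Andr\u00e9"    # é as a single code point (composed, NFC)
nfd = "Andre\u0301"   # e plus a combining acute accent (decomposed, NFD)

def normalize_name(name: str) -> str:
    """Hypothetical helper: bring a name into NFC before comparing or saving."""
    return unicodedata.normalize("NFC", name)

print(nfc == nfd)                                   # False
print(nfc.encode("utf-8") == nfd.encode("utf-8"))   # False
print(normalize_name(nfd) == nfc)                   # True
```

Without normalization, a draft saved under one form of the name may not be found when the name arrives in the other form.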
As the username is also part of the cookie, this showed that the
Cookie content wasn't being encoded correctly, so that was fixed, too.
The Debian installation uses ext3 and therefore raw bytes for
filenames, unlike the HFS filesystem of Mac OS X.
Copyright years were updated. The maintenance output for drafts was
cleaned up.
All the source files containing non-ASCII characters needed to have
the utf8 pragma added. This will be necessary for user config files as well! The
regular expressions identifying page names had to be changed.
UrlEncode translates the string back to bytes before encoding it.
Cached RSS files are saved with UTF-8 encoding and therefore need
their meta-data changed (using the XML::RSS module to do this
correctly didn't work for some of the test files). The CGI object's
parameters, keywords and path_info are decoded correctly. File access
uses the UTF-8 layer (reading, writing, appending, access to the log
of recent changes, running subprocesses with grep and diff).
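The step of turning the string back into bytes before URL-encoding can be sketched in Python. This is an illustration, not the UrlEncode function from wiki.pl; the name url_encode and the choice to escape every reserved character are assumptions.

```python
from urllib.parse import quote

def url_encode(page_name: str) -> str:
    """Hypothetical stand-in: encode characters to UTF-8 bytes, then percent-encode."""
    raw = page_name.encode("utf-8")   # characters -> bytes first
    return quote(raw, safe="")        # then percent-encode each unsafe byte

print(url_encode("Caf\u00e9"))   # Caf%C3%A9
print(url_encode("a b"))         # a%20b
```

Percent-encoding works on bytes, so each non-ASCII character becomes one %XX escape per UTF-8 byte.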
The mac compatibility extension will also disable the use of grep if
non-ASCII characters are searched for because of an unexplained
problem with grep.