As we derive a lot of filenames from strings in UTF-8 encoded files, we
need to make sure that any filename that might be set by a user –
including all the filenames containing a directory deriving from
$DataDir – is passed through utf8::encode. That is, every character
gets replaced with a sequence of one or more characters that represent
the individual bytes of the character and the UTF8 flag is turned off.
In other words, -d $DataDir might not work if $DataDir contains
non-ASCII characters and has not been encoded this way. The solution
is to use the following replacements:

    -f $name            IsFile($name)
    -e $name            IsFile($name)
    -d $name            IsDir($name)
    (stat($name))[9]    Modified($name)
    -M $name            $Now - Modified($name)
    -z $name            ZeroSize($name)
    unlink $name        Unlink($name)
    mkdir $name         CreateDir($name)
    rmdir $name         RemoveDir($name)

(Using IsFile for -e is probably not ideal?)
If you don’t, and Oddmuse gets used with Mojolicious, and you use the
Namespaces Extension, and a namespace contains non-ASCII characters such
as ä, ö, or ü, these characters will end up as part of $DataDir and
trigger the problem.
I also wonder whether we should be using some other Perl library.
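
The replacement functions listed above could be sketched roughly as follows. This is a minimal illustration, not the code from wiki.pl: the helper EncodedName is a hypothetical name introduced here, and the real CreateDir and friends may do more (error reporting, for example). Since utf8::encode changes its argument in place, the sketch copies the name first so the caller's string is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not from wiki.pl): return a byte-string copy of
# a filename. utf8::encode works in place, so copy before encoding.
sub EncodedName {
  my $name = shift;
  utf8::encode($name);
  return $name;
}

# Each wrapper encodes the name, then calls the plain Perl builtin.
sub IsFile    { my $name = EncodedName(shift); return -f $name; }
sub IsDir     { my $name = EncodedName(shift); return -d $name; }
sub ZeroSize  { my $name = EncodedName(shift); return -z $name; }
sub Modified  { my $name = EncodedName(shift); return (stat($name))[9]; }
sub Unlink    { return unlink(map { EncodedName($_) } @_); }
sub CreateDir { my $name = EncodedName(shift); return mkdir($name); }
sub RemoveDir { my $name = EncodedName(shift); return rmdir($name); }
```

With wrappers like these, callers keep working with decoded strings everywhere and the encoding happens in exactly one place, right before the system call.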
sub ParseData is fully backwards compatible. If some module calls it in list
context, it will get a flattened hash as before. New code should
always call it in scalar context, though (everything in our code base
was changed accordingly).
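
One way to get this kind of context-dependent return value is wantarray. The following is a sketch only: ParseDataSketch is a made-up, heavily simplified stand-in that parses single-line "key: value" pairs, and the real ParseData may well differ in its details.

```perl
use strict;
use warnings;

# Simplified stand-in for ParseData (hypothetical, not the wiki.pl code):
# parses "key: value" lines into a hash.
sub ParseDataSketch {
  my $data = shift;
  my %result;
  foreach my $line (split /\n/, $data) {
    my ($key, $value) = split /: /, $line, 2;
    $result{$key} = $value if defined $value;
  }
  # List context: old behaviour, the hash flattened into a list.
  # Scalar context: new behaviour, a reference to the hash.
  return wantarray ? %result : \%result;
}

my %old = ParseDataSketch("ts: 1\nrevision: 2\n");   # legacy caller
my $new = ParseDataSketch("ts: 1\nrevision: 2\n");   # preferred style
print $old{revision}, " ", $new->{revision}, "\n";
```

The scalar form avoids copying the whole hash on return, which is presumably why new code should prefer it.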
sub GetTextRevision is not backwards compatible (don't let the “wantarray” usage
confuse you). Most modules do not touch that subroutine, so we are probably
fine (modules from our git repo that do use it were changed accordingly).
“EncodePage(%$page)” looks wrong. It seems like we should change it to accept
a hash ref.
$_ is not a copy; it is an alias to the original value.
Therefore modifying it will mess with the original list... That's
not what we want most of the time.
Also, using map to s/// two variables does not look right. What
a stupid race to save one line of code.
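
A small illustration of the aliasing problem, with made-up variable names and data:

```perl
use strict;
use warnings;

my @names = ('Foo Bar', 'Baz Qux');

# Destructive: s/// acts on $_, which aliases each element of @names,
# so the original list is modified along with the result.
my @changed = map { s/ /_/g; $_ } @names;

# Non-destructive: copy $_ first and modify the copy.
my @source = ('Foo Bar', 'Baz Qux');
my @copy = map { my $s = $_; $s =~ s/ /_/g; $s } @source;

# For just two variables, two plain statements are clearer than a map.
my ($id, $title) = ('a b', 'c d');
$id    =~ s/ /_/g;
$title =~ s/ /_/g;

print "names:  @names\n";    # modified as a side effect
print "source: @source\n";   # left untouched
```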
Issues:
- the Mac layer was masking issues because of the NFC/NFD difference and existing compatibility hacks in mac.pl
- drafts.pl was suffering from a double encoding issue
- crossbar.t and download.t tests were failing because I had recently fixed DoDownload output to be raw instead of encoded
- test.pl now has a way to capture the raw, unencoded output produced by DoDownload
- tags.t got some tests to prove that recent changes to wiki.pl actually work
Drafts are saved using the username as filename. This must also be
encoded and decoded correctly. Because of NFC and NFD issues on Mac
HFS, an appropriate normalization was added to mac.pl.
As the username is also part of the cookie, this showed that the
Cookie content wasn't being encoded correctly, so that was fixed, too.
The Debian installation uses ext3 and therefore raw bytes for
filenames, unlike the HFS filesystem of Mac OS X.
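
To see why normalization matters for a username used as a filename, here is a small sketch using the core module Unicode::Normalize. The username is made up, and this is not necessarily how mac.pl does it. A filesystem that stores raw bytes treats the two forms as different names, while one that normalizes names may hand back a different form than the one it was given.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

my $composed   = "J\x{fc}rgen";     # NFC: u-umlaut as one code point, U+00FC
my $decomposed = NFD($composed);    # NFD: "u" followed by combining U+0308

# The two strings look the same to a reader but compare as different.
my $equal_raw = $composed eq $decomposed ? "yes" : "no";

# They also differ as bytes once encoded for the filesystem.
my ($nfc_bytes, $nfd_bytes) = ($composed, $decomposed);
utf8::encode($nfc_bytes);
utf8::encode($nfd_bytes);

# Normalizing both sides to one form makes them comparable again.
my $equal_nfc = NFC($decomposed) eq NFC($composed) ? "yes" : "no";

print "equal without normalization: $equal_raw\n";
printf "NFC: %d bytes, NFD: %d bytes\n", length($nfc_bytes), length($nfd_bytes);
print "equal after NFC: $equal_nfc\n";
```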
Copyright years were updated. The maintenance output for drafts was
cleaned up.