Saturday, August 29, 2009

Perl Unicode

Recently I was struggling with Unicode in Perl, and there are a few things I found tricky:

There is a clear explanation of the encode/decode functions here. If my Perl program is a command-line program that accepts cp950 arguments on Chinese Windows, the arguments need to be "decoded" into Perl's internal form.

use Encode;

# Decode each cp950-encoded argument into Perl's internal character form.
for (my $i = 0; $i < scalar(@ARGV); $i++) {
    $ARGV[$i] = Encode::decode("cp950", $ARGV[$i]);
}
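To see what the decode step changes, here is a minimal sketch. The input bytes are a hypothetical stand-in for a command-line argument: they are the cp950 encoding of the two characters U+4E2D and U+6587, which I picked only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical argument: four cp950 bytes that encode two Chinese characters.
my $bytes = "\xA4\xA4\xA4\xE5";

# After decoding, Perl treats the value as characters rather than raw bytes.
my $chars = decode("cp950", $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
printf "code points: %s\n", join(" ", map { sprintf "U+%04X", ord } split //, $chars);
```

Before decoding, length() counts bytes; afterwards it counts characters, which is what regexes and string functions need to work sensibly.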

When you send the arguments to a server that accepts UTF-8, you need to replace each non-ASCII character with its hexadecimal code point before sending:

$data =~ s/(\P{IsASCII})/sprintf('%2x;',ord($1))/eg;

After getting the server response, if it is in UTF-8, you need to decode the response once more into Perl's internal form.

$response = Encode::decode_utf8($response);
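A small sketch of the UTF-8 boundary, in case it helps. There is no real server here: the variable $wire is a made-up stand-in for the bytes a server would return, and I produce it with Encode::encode_utf8, which is my assumption for the example and not the escaping shown above.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# A character string in Perl's internal form (two characters).
my $text = "\x{4E2D}\x{6587}";

# Stand-in for a UTF-8 server response: the same text as raw bytes.
my $wire = encode_utf8($text);

# Decoding turns the bytes back into characters.
my $response = decode_utf8($wire);

printf "wire bytes: %d, decoded characters: %d\n", length($wire), length($response);
```

The point is that the bytes from the network and the characters inside the program are different things, even when they represent the same text.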

So, as the last step, we need to convert Perl's internal form to cp950 for STDOUT.

binmode STDOUT, ":encoding(cp950)";
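To check what an encoding layer actually writes, one option is to attach the same kind of layer to an in-memory filehandle instead of STDOUT. This is only a sketch: the $out buffer and the sample characters are mine, chosen so the bytes can be inspected.

```perl
use strict;
use warnings;

# In-memory buffer standing in for the console output stream.
my $out = '';
open(my $fh, '>:encoding(cp950)', \$out) or die "open failed: $!";

# Print a character string; the layer converts it to cp950 bytes on the way out.
print {$fh} "\x{4E2D}\x{6587}";
close($fh) or die "close failed: $!";

printf "output bytes: %s\n", join(" ", map { sprintf "%02X", ord } split //, $out);
```

With the layer in place the program can keep working in characters everywhere, and the conversion to cp950 happens only at the point of output.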

This is all from memory, so there may be mistakes.
