
Extended Characters in Your JavaScript · Dec 14, 04:42 PM by Dylan Doxey

So, there you are, happily coding away on your web application for your Japanese audience. You think you've buttoned it all up, and you'll go ahead and give it a courtesy end-user test before you launch it, just to be sure.

And there it is, the dreaded corrupt CJK characters in your JavaScript.

[Image: corrupt Japanese text in JavaScript]

Surely you were going for something more like this.

[Image: clear Japanese text in JavaScript]

Do not fret! There is a reliable solution.

Generally you might be inclined to do this:

    var characters = prompt( 'こんにちは、世界的', '' );

And why the heck not?

This solution is fine, provided there is no confusion about character encoding anywhere between your text editor and the web client's browser software.
This confusion could arise in a number of places. To mention a few:

If at any point in this chain of custody something makes an assumption about the encoding, then your wide characters may become corrupt.
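
To see how one wrong assumption corrupts text, here is a minimal sketch in JavaScript. It assumes a runtime that provides `TextEncoder` (modern browsers and Node.js), and the helper name `misreadAsLatin1` is made up for illustration. It encodes a string as UTF-8, then reads each byte back as if it were a single Latin-1 character, which is one common way the chain goes wrong.

```javascript
// Hypothetical helper: simulate software that wrongly assumes Latin-1
// when the bytes it receives are actually UTF-8.
function misreadAsLatin1(text) {
    // Encode the text as UTF-8 bytes, as a correctly configured editor would.
    const bytes = new TextEncoder().encode(text);
    // Treat every byte as its own character (the wrong assumption).
    return Array.from(bytes, (b) => String.fromCharCode(b)).join('');
}

const original = '\u3053'; // HIRAGANA LETTER KO, three bytes in UTF-8
const corrupted = misreadAsLatin1(original);

console.log(original.length);        // 1
console.log(corrupted.length);       // 3
console.log(corrupted === original); // false
```

Plain ASCII text survives this mistake unchanged, which is why the problem tends to stay hidden until the first non-ASCII character shows up.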

The solution is to use the JavaScript Unicode escaped version of the characters that go beyond the ASCII range.

    var characters = prompt( '\u3053\u3093\u306b\u3061\u306f\u3001\u4e16\u754c\u7684', '' );
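
If you want to convince yourself that the escaped form really carries the same text, a quick check like the following may help. This is only a sketch; the helper name `codePointsOf` is invented here, and it assumes an ES2017-capable runtime for `padStart`.

```javascript
// The escaped greeting from above: the source text itself is plain ASCII.
const greeting = '\u3053\u3093\u306b\u3061\u306f\u3001\u4e16\u754c\u7684';

// Hypothetical helper: list each character's code point as four hex digits.
function codePointsOf(text) {
    return Array.from(text, (ch) => ch.codePointAt(0).toString(16).padStart(4, '0'));
}

console.log(greeting.length); // 9
console.log(codePointsOf(greeting).join(' '));
// 3053 3093 306b 3061 306f 3001 4e16 754c 7684
```

Because the source contains nothing beyond ASCII, no editor, server, or browser along the way has an encoding to guess at.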

Sweet! Problem solved. Now we can all go back to our stations, edified by this new insight!

Well, not quite.

Who really knows the Unicode values of the CJK text they're working with? Surely, no one.

Yes, that's right, it's another opportunity to write some code.

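    #!/usr/bin/perl

    use strict;
    use warnings;
    use Encode qw( decode_utf8 );

    if ( !@ARGV ) {
        print "\nUsage:\n  $0 some string to encode\n\n";
        exit;
    }

    my $js_encoded = "";

    # Arguments arrive as raw bytes; decode them into Perl characters.
    my $string = decode_utf8( join ' ', @ARGV );

    for my $char ( split //, $string ) {

        my $code = ord $char;

        # \uXXXX holds exactly four hex digits, so a code point above
        # U+FFFF has to be written as a UTF-16 surrogate pair.
        my @units = $code > 0xFFFF
            ? ( 0xD800 + ( ( $code - 0x10000 ) >> 10 ),
                0xDC00 + ( ( $code - 0x10000 ) & 0x3FF ) )
            : ( $code );

        $js_encoded .= sprintf '\u%04x', $_ for @units;
    }

    print "\nJS Encoded:\n";
    print "    $js_encoded\n";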

And there you have it. Now all you need to do is run each snippet of CJK text you want to include in your JavaScript through this program.

dylan@dev.doxey.org$: ~ ./js_encode.pl こんにちは、世界的

JS Encoded:
    \u3053\u3093\u306b\u3061\u306f\u3001\u4e16\u754c\u7684
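
If you would rather not leave JavaScript, the same conversion can be sketched there too. This is not part of js_encode.pl; the function name `toJsEscapes` is hypothetical, and it assumes an ES2017-capable runtime for `padStart`. Like the Perl program, it escapes every character, ASCII included.

```javascript
// Hypothetical JavaScript counterpart of js_encode.pl.
function toJsEscapes(text) {
    let encoded = '';
    // Walking the string by index visits UTF-16 code units, so a character
    // beyond U+FFFF comes out as its two surrogates, which is the form
    // that the \uXXXX syntax expects.
    for (let i = 0; i < text.length; i++) {
        const unit = text.charCodeAt(i);
        encoded += '\\u' + unit.toString(16).padStart(4, '0');
    }
    return encoded;
}

console.log(toJsEscapes('\u3053\u3093'));
// \u3053\u3093
```

You could paste this into a browser console and call it on any text you have at hand, then copy the result into your source file.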

Happy computing!
