Javascript dealing with multibyte characters



yapning
03-03-2006, 09:08 AM
My colleague told me that he ran into quite a lot of unsolvable problems when handling certain multibyte/multilingual characters in JavaScript. Is that true? In other words, might JavaScript code such as data validation, or any code dealing with text/string values, not fully work when it encounters certain problematic multilingual characters?

I have not developed many multilingual web pages before, so I really need opinions and suggestions to help me decide whether to use more JavaScript/Ajax that interacts with text/string values to build more interactive web pages. Thanks.

jscheuer1
03-03-2006, 03:40 PM
It's true.

Twey
03-03-2006, 05:02 PM
It can usually be worked around. Just requires a little more code.

yapning
03-07-2006, 10:21 AM
Hi, thanks for replying. I am wondering whether you have any example that demonstrates a multilingual character problem in JavaScript? So far I cannot find one on the web, nor reproduce one in my own tests.

I know there may be problems when processing a string character by character: the string length may differ between browsers, or the character array may return different results. However, most of my JavaScript operations on text/strings are plain assignment (=). Will that cause a lot of problems too? For instance, might the text/string be corrupted after a JavaScript assignment?

My JavaScript usage with possibly multilingual text/strings would be as follows:
1. I may just use JavaScript to get the value the user typed into a text field and display it interactively on the page through innerHTML, or submit the form; or
2. I may dynamically add another select box containing Korean or other multibyte characters to the page when the user clicks the Add button; or
3. I may display all the records on the page after retrieving XML data from an Ajax response, where the data can be in any encoding and language.

Thanks.

mwinter
03-07-2006, 02:40 PM
"My colleague told me that he ran into quite a lot of unsolvable problems when handling certain multibyte/multilingual characters in JavaScript. Is that true?"

That's rather vague, don't you think? What specific 'unsolvable problems' did he encounter?

There shouldn't be problems with derivatives of ECMAScript (like JavaScript) as such, since all strings are sequences of UTF-16 code units[1]. Issues regarding internationalisation are usually the result of browser behaviour, in that browsers treat user input differently according to the encoding of the HTML document. See Alan J. Flavell's notes on character sets and internationalisation (http://ppewww.ph.gla.ac.uk/~flavell/charset/). The article titled "FORM submission and i18n" (http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html) should be of particular interest, as you'll be taking input from form controls.
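
As a rough illustration of the point about encodings (a sketch, not something taken from the linked articles): a native form submission follows the document's encoding, whereas a value serialised by script with the standard `encodeURIComponent` function is always percent-encoded as UTF-8. The helper name `buildQuery` and the field name `name` are made up for the example.

```javascript
// Hypothetical helper: serialise one field for an Ajax request.
// encodeURIComponent always emits UTF-8 bytes, whatever the page encoding.
function buildQuery(field, value) {
  return encodeURIComponent(field) + "=" + encodeURIComponent(value);
}

// "\uD55C" is the Hangul syllable U+D55C, written as an escape so the
// example does not depend on how this file itself is saved.
var query = buildQuery("name", "\uD55C");
```

The server would then need to decode the query string as UTF-8, independently of the charset declared for the page.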

That said, there certainly are additional concerns when it comes to validating input. Whilst one might be able to limit numeric input to the 7-bit ASCII variety (U+0030 to U+0039), other input, such as a visitor's name, may warrant a much wider repertoire.
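
A minimal sketch of that distinction, assuming an engine that supports Unicode property escapes in regular expressions (ES2018 or later, so newer than this thread). The function names `isAsciiDigits` and `isPlausibleName` are hypothetical, and the set of characters allowed in a name is only one possible choice.

```javascript
// Accept only the ASCII digits U+0030 to U+0039.
function isAsciiDigits(text) {
  return /^[0-9]+$/.test(text);
}

// Accept letters and combining marks from any script, followed by more of
// the same plus spaces, apostrophes and hyphens. Illustrative repertoire only.
function isPlausibleName(text) {
  return /^[\p{L}\p{M}][\p{L}\p{M} '-]*$/u.test(text);
}
```

With these, `isAsciiDigits` rejects digits from other scripts, while `isPlausibleName` accepts a Korean name as readily as an English one.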


"I know there may be problems when processing a string character by character: the string length may differ between browsers, or the character array may return different results."

As I wrote above, this shouldn't really be an issue. Any character that can be represented by a single 16-bit code unit will be a single character in a string.
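
For instance, the sketch below checks this for a short Korean string, with each syllable written as a `\u` escape so that the result does not depend on the file's own encoding. The variable names are only illustrative.

```javascript
// Three Hangul syllables (U+D55C, U+AD6D, U+C5B4), each of which fits
// in a single 16-bit code unit.
var korean = "\uD55C\uAD6D\uC5B4";

// One code unit per character, so length is 3.
var unitCount = korean.length;

// Indexing by position returns the expected code unit value (0xD55C).
var firstUnit = korean.charCodeAt(0);

// Assignment and concatenation copy the code units unchanged.
var copy = korean;
var rebuilt = korean.charAt(0) + korean.charAt(1) + korean.charAt(2);
```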


"However, most of my JavaScript operations on text/strings are plain assignment (=). Will that cause a lot of problems too?"

No.

Mike


[1] This does mean that characters outside the Basic Multilingual Plane could be a pain, as each is stored as a surrogate pair: two 16-bit code units (32 bits in total), rather than the one required for all other characters. However, such characters should be rare.
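
To make the footnote concrete, here is a small sketch using U+20000, a CJK ideograph outside the Basic Multilingual Plane. It assumes an ES2015 or later engine for `codePointAt` and `Array.from`, which did not exist when this thread was written; the variable names are illustrative.

```javascript
// U+20000 is stored as the surrogate pair D840 DC00.
var ideograph = "\uD840\uDC00";

// length counts 16-bit code units, so this one character reports 2.
var unitCount = ideograph.length;

// charCodeAt sees only the high surrogate (0xD840)...
var highSurrogate = ideograph.charCodeAt(0);

// ...whereas codePointAt decodes the whole pair (0x20000).
var codePoint = ideograph.codePointAt(0);

// Array.from iterates by code point, giving a count of 1.
var charCount = Array.from(ideograph).length;
```

Code that slices or counts by index may therefore split such a character in two, which is the "pain" the footnote refers to.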