I think I tried that: setting the encoding to UTF-8 in the header for the PHP page, and UTF-8 on the receiving HTML page. The problem can be simplified to:
PHP Code:
<?php
$myvar1 = "Plain";
$myvar2 = "Užimtas";
echo $myvar2 . '<br>'; //gives: Užimtas<br>
$answer = array ($myvar1, $myvar2);
echo $answer[1] . '<br>'; //gives: Užimtas<br>
echo json_encode($answer); //gives: ["Plain",null]
?>
Looks like so on the page:
Užimtas
Užimtas
["Plain",null]
Whereas:
Code:
<?php
$myvar1 = "Plain";
$myvar2 = "U&#382;imtas";
echo $myvar2 . '<br>'; //gives: U&#382;imtas<br>
$answer = array ($myvar1, $myvar2);
echo $answer[1] . '<br>'; //gives: U&#382;imtas<br>
echo json_encode($answer); //gives: ["Plain","U&#382;imtas"]
?>
Looks like so on the page:
Užimtas
Užimtas
["Plain","Užimtas"]
So it's pretty clear that if we could convert Užimtas (in the first example) to U&#382;imtas before json_encode (or during, but I don't think that's possible), things would work out well. Or if we could get json_encode to not choke on Užimtas . . . That would be another approach, but less applicable in general.
By extension, if we could scan all variables/array values prior to json_encode and convert any non-ASCII characters in them to valid UNICODE entities, that would make the process universally applicable.
Now, I tried james438's link, but it gives the hex entity, which is no good for a valid HTML page. But Googling "Convert Text to Unicode" (which is the main heading of the page james438 linked to) got me:
http://www.pinyin.info/tools/convert...ninumbers.html
Which employs a simple JavaScript function that does almost exactly what I want to do in PHP. All it needs is a little tweak to get it to output valid Unicode entities (add leading 0s for values less than 4 digits long). Could this be easily translated to PHP? Here's my modified version of the JavaScript:
Code:
/* convertToEntities()
 * Convert non-ASCII characters to valid HTML UNICODE entities */
function convertToEntities(astr){
    var bstr = '', cstr, i = 0;
    for(i; i < astr.length; ++i){
        if(astr.charCodeAt(i) > 127){
            // decimal code, left-padded with zeros to at least 4 digits
            cstr = astr.charCodeAt(i).toString(10);
            while(cstr.length < 4){
                cstr = '0' + cstr;
            }
            bstr += '&#' + cstr + ';';
        } else {
            // plain ASCII passes through unchanged
            bstr += astr.charAt(i);
        }
    }
    return bstr;
}
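One limitation of the version above: charCodeAt() returns UTF-16 code units, so a character above U+FFFF would come out as two surrogate-half entities instead of one. Here is a minimal sketch of a code-point-aware variant, assuming an ES2017+ environment (for...of, codePointAt, padStart). The name convertToEntitiesByCodePoint is made up for illustration and is not from the pinyin.info script.

```javascript
/* convertToEntitiesByCodePoint() (hypothetical name, for illustration only)
 * Same idea as convertToEntities(), but walks the string by Unicode code
 * point, so a character above U+FFFF yields one entity, not two. */
function convertToEntitiesByCodePoint(astr) {
    var bstr = '';
    // for...of steps through a string one code point at a time
    for (var ch of astr) {
        var code = ch.codePointAt(0);
        if (code > 127) {
            // decimal entity, zero-padded to at least 4 digits as above
            bstr += '&#' + String(code).padStart(4, '0') + ';';
        } else {
            // plain ASCII passes through unchanged
            bstr += ch;
        }
    }
    return bstr;
}

// \u017E is the z-with-caron from the example string
console.log(convertToEntitiesByCodePoint('U\u017Eimtas')); // U&#0382;imtas
```

For text that stays within the Basic Multilingual Plane (which covers the Lithuanian example), both versions should give the same result, so this only matters if the data might contain things like emoji.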