Standard URL/HTML entities for foreign characters
09-27-2009, 12:18 AM
I am creating a quiz for language learning. I'm using html entities for Japanese characters.
I found some entities that worked fine in the HTML, but once submitted through the URL (via a form, as an answer), they came back as a different HTML entity. It stood for the same character, but the two strings were not "equal" as far as PHP was concerned.
Is there a way to figure out what is the "official" html code for certain characters? Does it vary by browser at all?
09-27-2009, 01:12 AM
I'm not 100% on this, but I would think it would have to do with the type of encoding used both on the HTML page as well as the PHP script (not sure how you would do this though).
Hope this helps nonetheless.
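To make the encoding point a bit more concrete, here is a rough sketch in Python (not PHP, just for illustration). It approximates what ends up in the query string when a form is submitted via GET. The function name submitted_value is made up for this example, and the fallback to a decimal reference for characters the page encoding cannot represent is my understanding of typical browser behaviour, not something confirmed in this thread.

```python
from urllib.parse import quote

def submitted_value(ch, page_encoding):
    """Approximate the query-string value a browser sends for one character."""
    # Assumption: a character the page encoding cannot represent is replaced
    # by a decimal numeric reference (&#NNNN;) before percent-encoding.
    raw = ch.encode(page_encoding, errors="xmlcharrefreplace")
    return quote(raw)

# Hiragana "a" (U+3042) on a UTF-8 page: sent as its UTF-8 bytes.
print(submitted_value("あ", "utf-8"))       # %E3%81%82
# Same character on an ISO-8859-1 page: sent as the text "&#12354;".
print(submitted_value("あ", "iso-8859-1"))  # %26%2312354%3B
```

If that is what is happening, the script would receive either raw bytes or a decimal reference, neither of which matches a hex or named entity typed into the source.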
09-27-2009, 02:30 AM
That's probably true, but I'm not sure about sending it through the URL as a get variable.
I got it working at the moment just by trial and error (sending it through, seeing what the output is, and copying it). It would be nicer if it worked a bit faster, though. I'll keep looking for a way to do this automatically.
EDIT: Ok, I think I figured out a workable solution.
I set up a simple textarea in a form, with PHP that runs htmlentities() on whatever is submitted. Sending text through this way gives the desired results: it converts whatever I put in the box into HTML entities that work. That includes Japanese characters, which I can type but cannot put directly into the PHP document as is.
I would still like to know why this is a problem, and if there is any way to be sure it's the "right" html entity. But for now it works fine for my purposes.
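One possible way around the "which entity is the right one" question is to decode both sides to real characters before comparing, so the spelling of the entity stops mattering. Below is a minimal sketch in Python rather than PHP; same_answer is a hypothetical helper, and PHP's html_entity_decode() would play the role that html.unescape plays here.

```python
import html

def same_answer(expected, submitted):
    """Compare two answers after turning any HTML entities into characters."""
    return html.unescape(expected) == html.unescape(submitted)

# Three spellings of hiragana "a" (U+3042): decimal, hex, and the literal character.
print(same_answer("&#12354;", "&#x3042;"))  # True
print(same_answer("&#12354;", "あ"))         # True
print(same_answer("&#12354;", "&#12356;"))  # False (that one is U+3044)
```

This only settles the comparison; it assumes the submitted text has already been read in the correct character encoding.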
09-27-2009, 10:35 AM
I would think that numeric entities would have the best shot at being preserved regardless of encoding. Both the decimal form (&#12354;) and the hex form (&#x3042;) refer to the character's Unicode code point, so they mean the same thing whatever encoding the page uses. Generally, if there is a named entity, that too will be treated the same in browsers regardless of encoding, though the named ones do not cover Japanese characters. I'd still stick with the numeric Unicode references, to be on the safe side. However, if the encoding is not the issue, then they might not be the solution. Still, Unicode is the standard, so unless it causes problems, it is what should be used.
09-27-2009, 06:42 PM
I will try to use Unicode, then, if possible.
I should do a bit more research about these different formats. It seems like just finding a number that works isn't enough, because it might just be a duplicate. (I get the impression that the lower numbers are more stable, and that for some reason groups of characters repeat at higher numbers.)
09-27-2009, 11:50 PM
Well, the numbers are not really higher or lower depending on the format. Decimal and hex entities are two ways of writing the same Unicode code point, and the hex form only looks shorter because base 16 needs fewer digits: &#12354; and &#x3042; are the same character. Hex references are often written with 4 digits, but that is a convention rather than a requirement. So if one character came back as a different entity, it is most likely the same code point written another way, not a duplicate at a higher number. Unicode is the standard, so it should be the best bet. This is not to say that your experience will not differ, only to state what should be the case, and what would most likely be the most future compatible and impervious to encoding. As I stated before, if encoding is not the issue, Unicode might not be the solution.
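For reference, here is a small Python sketch of how the decimal and hex forms of a numeric entity relate to a character's Unicode code point. numeric_refs is just an illustrative name; the only real calls used are the built-in ord() and string formatting.

```python
def numeric_refs(ch):
    """Return the decimal and hex numeric references for one character."""
    cp = ord(ch)  # the Unicode code point as an integer
    return f"&#{cp};", f"&#x{cp:X};"

print(numeric_refs("あ"))  # ('&#12354;', '&#x3042;')
print(numeric_refs("é"))   # ('&#233;', '&#xE9;')
```

Both strings in each pair name the same character; the hex one is not padded here, which is why é comes out with two hex digits rather than four.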