Thread: Standard URL/HTML entities for foreign characters

  1. #1
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    Standard URL/HTML entities for foreign characters

    I am creating a quiz for language learning, and I'm using HTML entities for the Japanese characters.

    The entities I found worked fine in the HTML, but once an answer was submitted through the URL (via a form), they came back as a different HTML entity: one that represents the same character but is not "equal" as far as PHP's string comparison is concerned.

    Is there a way to figure out the "official" HTML code for a given character? Does it vary by browser at all?
    Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  2. #2
    Join Date
    Sep 2006
    Location
    St. George, UT
    Posts
    2,769
    Thanks
    3
    Thanked 157 Times in 155 Posts

    I'm not 100% sure about this, but I would think it has to do with the character encoding used on both the HTML page and the PHP script (I'm not sure how you would set that, though).

    Hope this helps nonetheless.
    "Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd all be running around in darkened rooms, munching magic pills and listening to repetitive electronic music." - Kristian Wilson, Nintendo, Inc, 1989
    TheUnlimitedHost | The Testing Site | Southern Utah Web Hosting and Design

  3. The Following User Says Thank You to thetestingsite For This Useful Post:

    djr33 (09-28-2009)

  4. #3

    That's probably true, but I'm not sure how it applies when the value is sent through the URL as a GET variable.

    I got it working for the moment just by trial and error (sending it through, seeing what the output is, and copying it). It would be nicer if it worked a bit faster, though. I'll keep looking for a way to do this automatically.

    EDIT: OK, I think I figured out a workable solution.
    I set up a simple textarea in a form, with PHP that runs htmlentities() on the submitted text. Sending text through this way gives the desired results: it converts whatever I put in the box (including Japanese characters, which I can type but cannot put into the PHP document directly as-is) into HTML entities that work.

    I would still like to know why this is a problem, and whether there is any way to be sure it's the "right" HTML entity. But for now it works fine for my purposes.
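
    For illustration only, here is a minimal sketch of why two entity strings for the same character fail a plain string comparison, and of one way to settle on a single spelling. It is written in JavaScript rather than PHP and is not what htmlentities() does internally; the helper name toDecimalRefs is made up for this example.

```javascript
// Hypothetical helper: writes each non-ASCII character as a decimal
// numeric character reference, so every character has one spelling.
function toDecimalRefs(text) {
  let out = "";
  for (const ch of text) {              // iterates by code point, not UTF-16 unit
    const codePoint = ch.codePointAt(0);
    out += codePoint > 0x7f ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

console.log(toDecimalRefs("あい"));      // &#12354;&#12356;

// The same character written two ways is two different strings,
// which is why a direct comparison of the raw entity text fails.
console.log("&#12354;" === "&#x3042;"); // false
```

    If both the stored answer and the submitted answer are passed through the same conversion before comparing, the spelling of the entity no longer matters.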
    Last edited by djr33; 09-27-2009 at 02:37 AM.

  5. #4
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    30,495
    Thanks
    82
    Thanked 3,449 Times in 3,410 Posts
    Blog Entries
    12

    I would think that numeric character references (as opposed to named entities) would have the best shot at being preserved regardless of encoding, because both the decimal form (&#12354;) and the hex form (&#x3042;) always refer to a Unicode code point, whatever encoding the page uses. Generally, if there is a named entity, that too will be treated the same in browsers regardless of encoding. I'd still stick with the numeric Unicode references, though, to be on the safe side. However, if the encoding is not the issue, then they might not be the solution. Still, Unicode is the standard, so unless it causes problems, it is what should be used.

    Also, if you are sending entities in a query string and building it with JavaScript, you should run them through encodeURIComponent(). If necessary, they can be restored later with decodeURIComponent().
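
    A minimal sketch of that round trip follows. The parameter name "answer" and the helper buildQuery are assumptions made up for the example, not part of the quiz discussed in this thread.

```javascript
// Hypothetical helper: builds a query string with one "answer" parameter.
function buildQuery(answer) {
  return "?answer=" + encodeURIComponent(answer);
}

// A numeric character reference contains "&", "#" and ";", which have
// special meaning in a URL, so they are percent-encoded.
const entity = "&#12354;";
const query = buildQuery(entity);
console.log(query);                     // ?answer=%26%2312354%3B

// Decoding restores the exact original string.
const restored = decodeURIComponent(query.slice("?answer=".length));
console.log(restored === entity);       // true

// Raw non-ASCII characters are percent-encoded as their UTF-8 bytes.
console.log(encodeURIComponent("あ"));  // %E3%81%82
```

    Without the encoding step, the "&" would be read as a parameter separator and the "#" as the start of a fragment, so the value would be cut short before it reached the server.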
    Last edited by jscheuer1; 09-27-2009 at 10:40 AM. Reason: add info
    - John
    ________________________

    Show Additional Thanks: International Rescue Committee - Donate or: The Ocean Conservancy - Donate or: PayPal - Donate

  6. The Following User Says Thank You to jscheuer1 For This Useful Post:

    djr33 (09-28-2009)

  7. #5

    Thanks, John.
    At the moment I'm not using JavaScript (I'm hoping to avoid it; this is a fairly simple project and there's no need to make it more complex), unless it turns out I can't find a solution with PHP alone.

    I will try to use Unicode, then, if possible.

    I should do a bit more research about these different formats. It seems like just finding a number that works isn't enough, because it might just be a duplicate. (I get the impression that the lower numbers are more stable, and that for some reason groups of characters repeat at higher numbers.)

  8. #6

    Well, the number in a numeric entity is the character's Unicode code point, and it can be written in two ways. The decimal form (&#12354;) and the hex form (&#x3042;) name the same character; the hex form only looks like a lower number because base 16 needs fewer digits. Most Japanese characters take 4 hex digits, but a reference may have fewer or more. Unicode is the standard, so these references should be the best bet. This is not to say that your experience will not differ, only to state what should be the case, and what would most likely be the most future-compatible and impervious to encoding. As I stated before, if encoding is not the issue, Unicode might not be the solution.
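
    To make the decimal and hex relationship concrete, here is a small sketch that turns both kinds of numeric reference back into characters. The helper name decodeNumericRefs is made up for this example, and it deliberately ignores named entities.

```javascript
// Hypothetical helper: replaces decimal (&#12354;) and hex (&#x3042;)
// numeric character references with the characters they name.
function decodeNumericRefs(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === "x" || body[0] === "X";
    const codePoint = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    // Leave out-of-range references untouched rather than throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

const fromDecimal = decodeNumericRefs("&#12354;");
const fromHex = decodeNumericRefs("&#x3042;");
console.log(fromDecimal === fromHex);   // true
console.log(fromDecimal === "あ");       // true
```

    Comparing the decoded characters instead of the raw entity text sidesteps the question of which spelling is the "official" one.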
    - John
