
Thread: UTF-8 with HTML5

  1. #11
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    30,475
    Thanks
    82
    Thanked 3,449 Times in 3,410 Posts
    Blog Entries
    12


    So it's just shades of gray, or perhaps a bit more, as to the argument. We basically agree.

    I was under the impression though that the meaning of a word, unlike fiancée/fiancee, could change. My spellchecker BTW flags the latter as misspelled.

    As to the technical side, I don't think I know any more than you do. I just know that the people I was dealing with couldn't seem to get their characters on the screen in the browser without UTF-16.
    - John
    ________________________


  2. #12
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,156
    Thanks
    265
    Thanked 690 Times in 678 Posts


    I was under the impression though that the meaning of a word, unlike fiancée/fiancee, could change. My spellchecker BTW flags the latter as misspelled.
    Mine too. But in the case of diacritics (not Chinese characters), that's what it would be like: a misspelling because a mark is missing. And in the case of fiancée, it's a bit pretentious to spell it that way, borrowing directly from French. It's debatable whether that's "English" or not. I'd usually just write fiancee, probably out of laziness more than anything. Regardless, even in French, readers could read the word without that diacritic, but it would look a little off.
    I don't know much about Hebrew specifically, but Arabic has many special characters that are almost never used but represent very specific things, such as in the Qur'an. I'd suspect that you might have been working on a site related to the Hebrew Bible for that reason. I don't think Modern Hebrew requires much that is special. Like Arabic, it shouldn't be too hard to encode. (It's phonetic, nothing like Chinese characters.)


    Yeah, I think we've reached a point of clarity.

    This is interesting for me though. I'm very interested in encoding these languages. And I know a lot about the languages themselves, but I don't know too much about the details of encoding them except that Unicode (unlike basically everything else) just seems to work.
    I'm building a language learning website that could potentially have text in any language, so it's worth knowing if I need to move to UTF-16. I might.
    Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  3. #13
    Join Date
    Feb 2007
    Location
    🌎
    Posts
    528
    Thanks
    10
    Thanked 10 Times in 10 Posts
    Blog Entries
    2


    Quote Originally Posted by djr33 View Post
    it's worth knowing if I need to move to UTF-16.
    This, this, this, and this are all encoded in UTF-8, so I would assume that using UTF-8 wouldn't cause problems for Unicode support.

    Relevant:
    http://en.wikipedia.org/wiki/Compari...code_encodings
    http://en.wikipedia.org/wiki/UTF-8#A..._disadvantages
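
To make the point above concrete, here is a minimal Python sketch. The sample strings and the `round_trips` helper are made up for this example and are not taken from any of the linked pages. UTF-8 and UTF-16 both encode the same set of Unicode code points, so text that survives one should survive the other; only the byte counts differ.

```python
# Hypothetical sample strings, written with escapes so the file itself
# does not depend on any particular source encoding.
samples = {
    "French": "fianc\u00e9e",
    "Hebrew": "\u05e9\u05dc\u05d5\u05dd",
    "Chinese": "\u4e2d\u6587",
    "Glagolitic": "\u2c14\u2c0e\u2c11\u2c02\u2c21",
}


def round_trips(text, encoding):
    """Return True if text survives an encode/decode cycle unchanged."""
    return text.encode(encoding).decode(encoding) == text


for label, text in samples.items():
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte order mark
    ok = round_trips(text, "utf-8") and round_trips(text, "utf-16-le")
    print(f"{label}: round-trips={ok}, "
          f"{len(utf8)} bytes as UTF-8, {len(utf16)} bytes as UTF-16")
```

If both encodings round-trip every sample, then the choice between them is about size and tooling, not about which characters can be stored.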
    ....(o_ Penguins
    .---/(o_- techno_racing
    +(---//\-' in
    .+(_)--(_)' The McMurdo 500

  4. #14
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    30,475
    Thanks
    82
    Thanked 3,449 Times in 3,410 Posts
    Blog Entries
    12


    I'm seeing what look like garbage characters in the second 'this':

    https://meta.wikimedia.org/wiki/List_of_Wikipedias

    Словѣ́ньскъ / ⰔⰎⰑⰂⰡⰐⰠⰔⰍⰟ (cu) · Deutsch
    BTW, about what you were saying regarding this forum: it's encoded as ISO-8859-1 (windows-1252), so some characters are not supported.
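
A small Python sketch of what "not supported" means in practice. The sample text is the first three Glagolitic letters from the line quoted above, and the fallback shown (numeric character references) is only one way software can cope; whether this forum does exactly that is an assumption.

```python
import html

# First three letters of the Glagolitic sample, written as escapes.
text = "\u2c14\u2c0e\u2c11"

# ISO-8859-1 only covers U+0000 to U+00FF, so a strict encode fails.
try:
    text.encode("iso-8859-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# One possible fallback: replace each unencodable character with an
# HTML numeric character reference such as &#11284;.
fallback = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# A browser (or html.unescape here) turns the references back into text.
restored = html.unescape(fallback.decode("ascii"))

print(encodable)
print(fallback)
print(restored == text)
```

So a page in a legacy encoding can still carry such characters as references, but it cannot store them directly in its own bytes.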
    - John
    ________________________


  5. #15
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,156
    Thanks
    265
    Thanked 690 Times in 678 Posts


    John, you probably can't see those characters because you don't have any fonts that support them. Very few people have fonts covering all Unicode characters (there's no need).
    Over the past couple of years I've started accumulating fonts that support many of them, but I'm far from having all of them. Oriya (an Indic language) was one of the hardest to find, as I remember, and it wasn't included in any of the more general fonts. It's not very useful for most of the world, but of course it's important for anyone who wants to use Oriya.

  6. #16
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    30,475
    Thanks
    82
    Thanked 3,449 Times in 3,410 Posts
    Blog Entries
    12


    I think that might be part of the point. Perhaps in UTF-16 it could show the literal character. I've no idea of that. But if it's true, one could see the appeal from the page's author's point of view.
    - John
    ________________________


  7. #17
    Join Date
    Feb 2007
    Location
    🌎
    Posts
    528
    Thanks
    10
    Thanked 10 Times in 10 Posts
    Blog Entries
    2


    Quote Originally Posted by jscheuer1 View Post
    I'm seeing what look like garbage characters in the second 'this'
    This doesn't seem to be an encoding-related problem, but a browser issue.

    Though at first glance it seems to be a font problem, I had three fonts installed that support the Glagolitic Unicode block (MPH 2B Damase, TITUS Cyberbit Basic, and Dilyana), and it appeared the same way for me as well.

    Browsers (I tested in Firefox alpha and Chrome) don't seem to be automatically selecting a font for that block, as they do for other blocks.

    From a web development point of view, a solution for that would be embedding a font that supports that Unicode block via the @font-face CSS rule, and applying it to span tags around characters in that block.
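
The span-wrapping half of that idea could be sketched in Python roughly like this. The function name `wrap_glagolitic` and the class name `glagolitic` are made up for the example, and it only covers the main Glagolitic block (U+2C00 to U+2C5F), not the later supplement block.

```python
import html
import re

# Runs of one or more characters from the main Glagolitic block.
GLAGOLITIC_RUN = re.compile("[\u2c00-\u2c5f]+")


def wrap_glagolitic(text, css_class="glagolitic"):
    """Escape text for HTML and wrap each Glagolitic run in a span."""
    escaped = html.escape(text, quote=False)
    return GLAGOLITIC_RUN.sub(
        lambda match: f'<span class="{css_class}">{match.group(0)}</span>',
        escaped,
    )


print(wrap_glagolitic("Old Church Slavonic: \u2c14\u2c0e\u2c11 (cu)"))
```

The stylesheet would then need a matching rule that sets font-family on that class to the embedded font; that part is not shown here.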

    The Old Church Slavonic Wikipedia is also encoded in UTF-8, and I see this:


    The span containing that has a style="font-family: Vikidemia, TITUS Cyberbit Basic, Bukyvede, Ja, Unicode5;" on it*, explicitly telling the browser what fonts to look for that may support the Glagolitic block.

    Not that supporting Old Church Slavonic is all that useful, anyway...

    * Yes, I know that there should be single quotes around TITUS Cyberbit Basic, but that's how it was in the source.

    Quote Originally Posted by jscheuer1 View Post
    BTW, what you were saying about this forum, it's encoded as ISO-8859-1 (windows-1252). So some chars are not supported.
    I've come to the point where I automatically assume that everything supports Unicode.

    Windows-1252 and ISO 8859-1 are different, BTW; the confusion there arose from the ignorance of someone at the W3C.
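
A quick Python check of where the two differ, as a sketch using Python's built-in `cp1252` and `iso-8859-1` codecs. The byte values were picked for illustration. The encodings agree on 0xA0 to 0xFF and disagree in the 0x80 to 0x9F range, where Windows-1252 has printable characters and ISO 8859-1 has control codes.

```python
# 0x80 is the euro sign in Windows-1252 but a C1 control in ISO 8859-1.
euro_byte = b"\x80"
as_cp1252 = euro_byte.decode("cp1252")
as_latin1 = euro_byte.decode("iso-8859-1")

# 0x93 is a left curly quote in Windows-1252.
quote_cp1252 = b"\x93".decode("cp1252")

# Bytes 0xA0 to 0xFF decode identically in both encodings.
upper_range_same = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("iso-8859-1")
    for b in range(0xA0, 0x100)
)

# Going the other way, the euro sign cannot be encoded as ISO 8859-1.
try:
    "\u20ac".encode("iso-8859-1")
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False

print(repr(as_cp1252), repr(as_latin1), repr(quote_cp1252))
print(upper_range_same, euro_fits_latin1)
```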
