Thread: htmlentities behaving differently after php 5.4 upgrade

  1. #1
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    2,385
    Thanks
    100
    Thanked 113 Times in 111 Posts

    htmlentities behaving differently after php 5.4 upgrade

    After upgrading from php 5.3.24 to 5.4.19, I am having trouble with htmlentities and related functions.

    Below is the example code I am playing with, with no luck so far. Most characters come through fine, but when the string contains unusual characters such as non-standard single quotes, accented characters, or arrows, the output is empty. If I understand correctly, php returns an empty string by design in that case, for security reasons.

    What I want is to output HTML entities either as the actual character or as the HTML equivalent, and to output all other characters as is. Even partial solutions are fine, since I can play around with the code, but I am having trouble understanding the flags and encodings involved.

    Code:
    <?php
    $title="á";
    ##$title=htmlspecialchars($title);
    print $title;
    ?>
    <textarea name="summary" cols=75 rows=25><?php print htmlentities($title) ; ?></textarea>
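A minimal sketch of one way the empty output can arise, assuming the script file (and so the literal á) is saved as ISO-8859-1 or ISO-8859-15 while htmlentities() treats its input as UTF-8, which is the default encoding for that function from PHP 5.4 onward. The byte escapes and variable names are only for illustration.

```php
<?php
// "á" as a single Latin-1 byte (what an ISO-8859-1/15 source file contains).
$latin1 = "\xE1";
// "á" as the two-byte UTF-8 sequence.
$utf8 = "\xC3\xA1";

// Treated as UTF-8, the lone 0xE1 byte is an invalid sequence, so the
// function returns an empty string instead of guessing.
$fromLatin1 = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

// Valid UTF-8 input is converted to the named entity as expected.
$fromUtf8 = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

var_dump($fromLatin1); // string(0) ""
var_dump($fromUtf8);   // string(8) "&aacute;"
```

The flags and encoding are passed explicitly here because the defaults for both have changed between PHP versions.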
    Last edited by james438; 11-21-2014 at 02:34 PM.
    To choose the lesser of two evils is still to choose evil. My personal site

  2. #2
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    30,495
    Thanks
    82
    Thanked 3,449 Times in 3,410 Posts
    Blog Entries
    12

    Have you checked the man (php.net) page for that function to see if there have been any changes as to its usage?

    Are you familiar with basic usage of flags?

    I don't have the later version of PHP. What are you getting? I'm getting the entity, but of course it looks exactly like the character unless I 'view source'.
    Last edited by jscheuer1; 11-21-2014 at 03:59 AM. Reason: add info
    - John
    ________________________

    Show Additional Thanks: International Rescue Committee - Donate or: The Ocean Conservancy - Donate or: PayPal - Donate

  3. #3

    Thanks for looking into it. I did look at the main php site, paying particular attention to the changes introduced in php 5.4. I am learning about this kinda slowly, but it seems to have to do with the character set.

    I have discovered that the following works:

    Code:
    <?php
    $title="á";
    $title=htmlspecialchars($title, ENT_IGNORE, '');
    print $title;
    ?>
    <textarea name="summary" cols=75 rows=25><?php print htmlentities($title,ENT_IGNORE,'') ; ?></textarea>
    where the character set is unspecified.

    php.net has this to say about using an empty string, but I don't fully understand it:

    "An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended."

    I'm not fully sure what character set I am using, but I know it is either UTF-8 or ISO-8859-1. I think it matters, but I'm not sure how.

    Sadly, getting information on this has been slow, but I am making some progress.

    EDIT: I am definitely using the default which is UTF-8.
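One way to check rather than guess is to look at the actual bytes. The sketch below assumes the mbstring extension is available; is_valid_utf8() is a hypothetical helper name, and the sample value stands in for whatever the script really receives.

```php
<?php
// Hypothetical helper: report whether raw bytes form valid UTF-8.
function is_valid_utf8($bytes)
{
    return mb_check_encoding($bytes, 'UTF-8');
}

// The configured default charset. From PHP 5.6 on, htmlentities() and
// friends use this value when no encoding argument is passed.
echo 'default_charset: ', ini_get('default_charset'), "\n";

// bin2hex() shows the real bytes, which settles what the string contains.
$title = "\xE1"; // stand-in for a Latin-1 encoded "á"
echo bin2hex($title), ' valid UTF-8? ', var_export(is_valid_utf8($title), true), "\n";
```

A string that fails this check is not UTF-8, whatever the server configuration reports, so the file or data source is probably saved in a single-byte charset.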
    Last edited by james438; 11-21-2014 at 05:15 AM.

  4. #4

    Code:
    $title=htmlspecialchars($title, ENT_IGNORE, 'ISO-8859-15');
    I seem to be somewhat wrong about the character set used. It registers as UTF-8 when I try to detect the character set, but in phpinfo() the value listed under exif.encode_unicode is ISO-8859-15.

    Next up is to find out what that means, why it was used, and whether I should change it to UTF-8, which is the default encoding for these functions from php 5.4 onwards (before 5.4 the default was ISO-8859-1).

  5. #5

    My reading of that cryptic quote is that PHP will fall back to the server's default charset when you pass an empty string. If that is also the encoding the page is served in, or if the characters being converted are encoded the same way in both charsets, it will work out. Otherwise you must specify the encoding the page is served in to get the correct result.

    I know that's not much clearer, but I hope it is clear enough to be of some use.
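The point about charsets that do or do not overlap can be seen in a small sketch. It assumes single-byte input and uses byte 0xA4, which ISO-8859-1 and ISO-8859-15 define differently, alongside 0xE1, which they define identically. The variable names are illustrative only.

```php
<?php
$differs = "\xA4"; // currency sign in ISO-8859-1, euro sign in ISO-8859-15
$same    = "\xE1"; // "á" in both charsets

// The same byte yields different entities depending on the declared charset.
$a4AsLatin1 = htmlentities($differs, ENT_QUOTES, 'ISO-8859-1');
$a4AsLatin9 = htmlentities($differs, ENT_QUOTES, 'ISO-8859-15');

// Where the two charsets agree, either declaration gives the same result.
$e1AsLatin1 = htmlentities($same, ENT_QUOTES, 'ISO-8859-1');
$e1AsLatin9 = htmlentities($same, ENT_QUOTES, 'ISO-8859-15');

echo $a4AsLatin1, ' vs ', $a4AsLatin9, "\n"; // &curren; vs &euro;
echo $e1AsLatin1, ' vs ', $e1AsLatin9, "\n"; // &aacute; vs &aacute;
```

So a mismatched charset can appear to work for a long time and only show up when one of the few differing characters turns up in the data.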
    - John

  6. #6

    I'm glad it is not just me that found their quote somewhat cryptic. php.net has a lot of great documentation, but some of their pages are just not well written. Still, as far as documentation goes php.net is probably my favorite.

    "My reading of that cryptic quote is that PHP will fall back to the server's default charset when you pass an empty string. If that is also the encoding the page is served in, or if the characters being converted are encoded the same way in both charsets, it will work out. Otherwise you must specify the encoding the page is served in to get the correct result."
    That does help out a bit.

    It looks like my hosting service decided to use ISO-8859-15 over ISO-8859-1 or UTF-8 because according to the description:
    ISO-8859-1 Western European, Latin-1.
    ISO-8859-15 Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
    UTF-8 ASCII compatible multi-byte 8-bit Unicode.
    So I think I will keep using ISO-8859-15 and start specifying the encoding used. I'll try to remember that my hosting service may change this in the future, but I don't think that will happen again anytime soon if it ever does.

  7. #7
    Join Date
    Sep 2007
    Location
    The Netherlands
    Posts
    1,881
    Thanks
    49
    Thanked 266 Times in 258 Posts
    Blog Entries
    56

    Have you tried this:
    Code:
    <?php
    $title="é";
    ##$title=htmlspecialchars($title);
    print $title;
    ?>
    <textarea name="summary" cols=75 rows=25><?php print html_entity_decode($title);?></textarea>

  8. #8

    That will work, but only under limited circumstances. Encoding any non-standard character, such as é or the &mdash; (—), will return an empty string.

    I wish php.net had more to say on exactly what counts as non-standard. This is about all it says on the subject:

    "If the input string contains an invalid code unit sequence within the given encoding an empty string will be returned, unless either the ENT_IGNORE or ENT_SUBSTITUTE flags are set."
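A short sketch of what the quoted sentence means in practice, assuming the input is a Latin-1 encoded string that is being escaped as UTF-8. It uses htmlspecialchars() so that only the invalid-byte handling differs between the three calls.

```php
<?php
$bad = "caf\xE9"; // Latin-1 "café": the last byte is not valid UTF-8

// No special flag: an invalid sequence makes the whole result empty.
$strict = htmlspecialchars($bad, ENT_QUOTES, 'UTF-8');
// ENT_IGNORE silently drops the invalid bytes.
$ignored = htmlspecialchars($bad, ENT_QUOTES | ENT_IGNORE, 'UTF-8');
// ENT_SUBSTITUTE replaces them with U+FFFD, the replacement character.
$substitute = htmlspecialchars($bad, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict);              // string(0) ""
var_dump($ignored);             // string(3) "caf"
var_dump(bin2hex($substitute)); // string(12) "636166efbfbd"
```

The manual discourages ENT_IGNORE because silently discarding bytes may have security implications. ENT_SUBSTITUTE at least leaves a visible marker, though neither flag recovers the original character.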

  9. #9

    For maximum utility in all but the most demanding human (as opposed to programming) languages, UTF-8 is the way to go.

    That is to say, for English, most Arabic, all Romance languages, most Oriental ones, and many others, UTF-8 will work as long as the page is both encoded in and served as UTF-8. Once you have that, if you also pass UTF-8 as the encoding to htmlentities, everything should work out. The only exception I can think of is a string pulled from a database that is encoded in something other than UTF-8. There could be other exceptions.

    The bottom line is that, if at all possible, you should ensure that everything is encoded in, and told to use, the same charset. Where that's not possible, one can almost always convert, but it gets tricky because you might not always know which source and target encodings apply at each point in your operations. Again, that's why it's best to use a single encoding for everything. If UTF-8 is not adequate, then use another, but use it for everything.
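As a rough sketch of the "one encoding for everything" approach, assuming UTF-8 is the chosen encoding and the mbstring extension is available: convert each string to UTF-8 at the point where it enters the application, and escape as UTF-8 on output. The helper names to_utf8() and esc(), and the ISO-8859-15 database value, are hypothetical.

```php
<?php
// Hypothetical helper: bring a string of known encoding to UTF-8.
function to_utf8($value, $sourceEncoding)
{
    return mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
}

// Hypothetical helper: escape for HTML output, always as UTF-8.
function esc($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Example: a value read from a (hypothetical) ISO-8859-15 database column.
$fromDb = "Jos\xE9 <b>";
$safe   = esc(to_utf8($fromDb, 'ISO-8859-15'));

echo $safe, "\n";
```

This only holds together if the page itself is also saved and served as UTF-8, and it relies on knowing the source encoding at each boundary rather than detecting it.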

    I noticed earlier that you mentioned something about the euro and/or pound sign, I think. As far as I know, both are supported in UTF-8. However, as an example of how confusing things can become when more than one encoding is in play, the bytes used to represent these two common currency symbols vary depending on the encoding.
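To make the byte difference concrete, here is a small sketch that prints the bytes for both symbols in UTF-8 and in ISO-8859-15. It assumes the mbstring extension is available, and the symbols are written as byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// Euro and pound signs written as UTF-8 byte sequences.
$euro  = "\xE2\x82\xAC"; // U+20AC
$pound = "\xC2\xA3";     // U+00A3

$bytes = array();
foreach (array('UTF-8', 'ISO-8859-15') as $encoding) {
    $bytes[$encoding] = array(
        'euro'  => bin2hex(mb_convert_encoding($euro, $encoding, 'UTF-8')),
        'pound' => bin2hex(mb_convert_encoding($pound, $encoding, 'UTF-8')),
    );
    printf("%-11s euro=%s pound=%s\n", $encoding, $bytes[$encoding]['euro'], $bytes[$encoding]['pound']);
}
// UTF-8       euro=e282ac pound=c2a3
// ISO-8859-15 euro=a4 pound=a3
```

ISO-8859-1 is left out on purpose: it has no euro sign at all, which is one of the gaps ISO-8859-15 was designed to fill.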
    Last edited by jscheuer1; 11-22-2014 at 07:17 AM. Reason: typo
    - John
