
View Full Version : &nbsp; and strict html validation



james438
08-23-2007, 09:37 PM
Hi. In short, I am trying to validate my site. It is a long process, because I am going from transitional to strict. I have code that runs $summary through htmlentities(), uses preg_replace() to turn sets of two spaces into &nbsp; entities, and then uses html_entity_decode() to change it back.

The problem is that the W3C validator won't validate it because of the &nbsp;. PHP.net has this note about using &nbsp; that I don't quite understand:
'&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but ASCII code 160 (0xa0) in the default ISO 8859-1 characterset. html_entity_decode() (http://us2.php.net/manual/en/function.html-entity-decode.php)

The failure statement reads
Sorry, I am unable to validate this document because on line 1 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.

The error was: utf8 "\xA0" does not map to Unicode

Any ideas on what I can do? I would prefer to keep the &nbsp; if I can, so that the browser doesn't condense sets of spaces when I don't want it to.

I am using
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >

EDIT: Sorry, I got it. Instead of using &nbsp; I am going to use &#160;. Problem solved. Now I am off to revalidate the rest of the site. This will certainly take a while.
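A minimal sketch of why the validator complained, for anyone who finds this thread later. It assumes the page is served as UTF-8, as in the meta tag above. The helper name isValidUtf8 is made up for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// A lone 0xA0 byte is how ISO 8859-1 encodes the non-breaking space.
$latin1Nbsp = "\xA0";

// In UTF-8 the same character (U+00A0) takes two bytes.
$utf8Nbsp = "\xC2\xA0";

// The numeric character reference is made of ASCII bytes only,
// so it is valid in either encoding.
$reference = '&#160;';

// Hypothetical helper: preg_match() with the /u modifier returns false
// when the subject is not valid UTF-8, and 1 when the empty pattern matches.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

var_dump(isValidUtf8($latin1Nbsp)); // false: this is the byte the validator rejects
var_dump(isValidUtf8($utf8Nbsp));   // true
var_dump(isValidUtf8($reference));  // true
```

The entity only becomes a problem once something decodes it into a raw 0xA0 byte; left as &#160; or &nbsp; in the output, it is plain ASCII.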

boogyman
08-24-2007, 12:54 AM
Either way it would be stripped...
You can use

htmlspecialchars($string)
or
htmlentities($string)

james438
08-24-2007, 04:56 PM
Maybe so, but it won't validate, which is why I changed to &#160;.

james438
08-24-2007, 05:15 PM
Unfortunately, another problem crops up when I convert to PHP 5. Both &#160; and &nbsp; are converted into those black diamonds with question marks on them. Any ideas? My best guess is that I need to specify a different character set, but I am not sure how to figure that out.

This is almost something for a different thread, but it seemed close enough that I just tacked it onto the end of this thread.
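One plausible explanation, sketched below, is that html_entity_decode() is decoding the entity into the ISO 8859-1 byte mentioned in the PHP.net note, which a browser reading the page as UTF-8 shows as a replacement character. This is an assumption about the script, which is not shown in the thread. Passing the target character set explicitly avoids relying on the default.

```php
<?php
// Decoding with ISO 8859-1 as the target yields the single byte 0xA0,
// which is not valid on a page declared as UTF-8.
$latin1 = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');
echo bin2hex($latin1), "\n"; // a0

// Decoding with UTF-8 as the target yields the two-byte sequence for U+00A0.
$utf8 = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');
echo bin2hex($utf8), "\n"; // c2a0

// The numeric reference decodes to the same two bytes.
$fromNumeric = html_entity_decode('&#160;', ENT_QUOTES, 'UTF-8');
echo bin2hex($fromNumeric), "\n"; // c2a0
```

If the decoded text has to stay, matching the third argument to the charset in the Content-Type meta tag should keep the output valid.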

Twey
08-24-2007, 05:33 PM
In Unicode I believe it's U+00A0, so the hexadecimal reference would be &#x00A0;.

james438
08-24-2007, 10:31 PM
I hate to give you only a partial solution to this. I went back to my script and had some difficulty reproducing the error, but along the way I noticed that I could remove both htmlentities($text) and html_entity_decode() and the page would validate again. That is when I realized the error was coming from html_entity_decode(). I don't really know why, or what html_entity_decode() was decoding &nbsp; into, but the script works now.
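A rough sketch of the approach described above, with no encode/decode round trip: replace each pair of spaces directly with numeric references. The function name keepDoubleSpaces and the exact replacement string are assumptions for illustration, since the original script is not shown, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: replace every run of exactly two consecutive spaces
// with two non-breaking-space references, leaving single spaces untouched.
// The output contains only ASCII for these references, so it is safe
// regardless of the page's declared character set.
function keepDoubleSpaces(string $text): string
{
    return preg_replace('/ {2}/', '&#160;&#160;', $text);
}

$summary = 'one  two three';
echo keepDoubleSpaces($summary), "\n";
```

Because nothing is decoded afterwards, no raw 0xA0 byte ever reaches the output.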