Quick and dirty way to produce valid HTML markup?



jscheuer1
07-16-2016, 01:25 AM
What do folks think about this method of taking an invalid string of HTML and making it into a valid one?


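<?php
// Deliberately invalid markup: stray closing tags and an unclosed <p>.
$dirty = "<div>some content</div></div></p><img src='bob.jpg'>
<div>Hello!</div></p><p></div>";
$x = new DOMDocument;
@$x->loadHTML($dirty); // @ suppresses the warnings the bad markup triggers
$clean = @$x->saveHTML(); // full document: doctype, <html>, <body>, repaired markup
// Per line (m modifier): cut everything up to <body> and from </body> onward.
$str = preg_replace('/^.*<body>|<\/body>.*$/m', '', $clean);
// Drop the two leading lines left over from the doctype and the <html><body> wrapper.
echo implode("\n", array_slice(explode("\n", $str), 2));
?>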
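
One possible variation, offered as a sketch rather than as part of the method above: instead of silencing the parser with @, libxml can be asked to collect its complaints, so you can see what was wrong with the input. The function name repair_and_report is made up for this illustration; libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors() are standard PHP functions. The sketch assumes $dirty is a non-empty string.

```php
<?php
// Hypothetical helper (name invented for this sketch, not from the post above).
// Returns array(repaired full-document HTML, list of parser messages).
// Assumes $dirty is non-empty; loadHTML() rejects an empty string.
function repair_and_report($dirty)
{
    // Buffer libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument;
    $doc->loadHTML($dirty);
    $clean = $doc->saveHTML();

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError object carrying line, code and message.
        $messages[] = 'line ' . $error->line . ': ' . trim($error->message);
    }

    // Empty the buffer and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return array($clean, $messages);
}

list($clean, $messages) = repair_and_report(
    "<div>some content</div></div></p><img src='bob.jpg'>\n<div>Hello!</div></p><p></div>"
);
echo implode("\n", $messages), "\n";
```

The returned $clean is still the whole document (doctype and wrappers included), so the same stripping step as above would apply to it.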

molendijk
07-21-2016, 10:02 PM
John, should this work when I put it inside a PHP file? I can't get it to work.

jscheuer1
07-22-2016, 02:29 AM
PHP 5+ required. I've made some refinements** since, but yes, as long as PHP's DOMDocument class is supported, it should more or less work.

**said refinements (most of them at least*):


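<?php
header('Content-Type: text/html; charset=utf-8'); // most often wise - adjust if necessary
// Same invalid markup as before, now with non-ASCII characters (© and €).
$dirty = "<div>some content©</div></div></p><img src='bob.jpg'>
<div>Hello! €</div></p><p></div>";
$x = new DOMDocument;
@$x->loadHTML($dirty); // @ suppresses the warnings the bad markup triggers
$clean = $x->saveHTML();
// Drop the first line of the output (the doctype).
$str = implode("\n", array_slice(explode("\n", $clean), 1));
// Cut the <html><body> and </body></html> wrappers, then trim leftover whitespace.
echo trim(preg_replace('/^.*<body>|<\/body>.*$/m', '', $str));
?>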
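
A caveat worth checking in your own environment: as far as I know, when the markup carries no charset declaration, loadHTML() on many PHP/libxml builds reads the bytes as ISO-8859-1, so UTF-8 characters such as © and € may come back mangled. The sketch below is one common workaround, not part of the code above: it prepends a meta charset declaration and then serializes only the children of body, which also avoids the regex step. The function name fix_html_utf8 is invented for this example, and passing a node to saveHTML() needs PHP 5.3.6 or later.

```php
<?php
// Hypothetical helper; the name and the meta-prefix trick are not from the post above.
function fix_html_utf8($dirty)
{
    if (trim($dirty) === '') {
        return ''; // loadHTML() rejects empty input
    }

    // Declare the charset up front so the parser reads the bytes as UTF-8.
    $prefix = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

    $doc = new DOMDocument;
    @$doc->loadHTML($prefix . $dirty); // @ suppresses warnings about the bad markup

    // The meta tag lands in <head>; only the children of <body> are returned.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $clean = '';
    foreach ($body->childNodes as $node) {
        $clean .= $doc->saveHTML($node); // node argument: PHP 5.3.6+
    }
    return trim($clean);
}

echo fix_html_utf8("<div>some content©</div></div></p><img src='bob.jpg'>\n<div>Hello! €</div></p><p></div>"), "\n";
```

Depending on the build, the special characters may be emitted either as raw UTF-8 or as entities such as &copy; and browsers render both the same way.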

*Some refinement may be required for any particular purpose. If tidy (http://php.net/manual/en/class.tidy.php) is available, use that. I'm interested in this approach (the one without tidy) because I'm working on code that already requires PHP 5.2 and that often will not be run in environments where tidy is available.

Demo of the code in this post:

http://john.dynamicdrive.com/demos/tidbits/fix_html.php

Use the browser's view source and compare to the $dirty string.
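
For the tidy route mentioned above, a minimal sketch might look like the following. It assumes the tidy extension may or may not be loaded and simply returns null when it is not, so the caller can fall back to the DOMDocument approach. The function name tidy_fragment is invented for this example; tidy_repair_string() and the show-body-only option are part of the tidy extension.

```php
<?php
// Hypothetical helper; returns null when the tidy extension is not loaded.
function tidy_fragment($dirty)
{
    if (!extension_loaded('tidy')) {
        return null; // caller can fall back to the DOMDocument approach
    }

    $config = array(
        'show-body-only' => true, // emit only the body content, no doctype or wrappers
    );

    // Repair the markup, treating input and output as UTF-8.
    $clean = tidy_repair_string($dirty, $config, 'utf8');

    return $clean === false ? null : trim($clean);
}

$result = tidy_fragment("<div>Hello!</div></p><p></div>");
echo $result === null ? "tidy not available\n" : $result . "\n";
```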

molendijk
07-22-2016, 09:55 AM
"PHP 5+ required."
Thanks.