
Regex - Stripping content from comments in html



benslayton
07-19-2008, 10:10 AM
OK, I'm totally confused with this regex stuff. I'm not really finding a whole lot of places that teach you regex.

Anyway, here is what I am trying to accomplish.
I have over 100 .html pages that need content stripped from them and stored in variables so I can manipulate it with PHP.

There are comments in these .html files that look like this:

<!-- publish type="textbox" name="About Us Content" -->Stuff I need Stripped<!-- /publish -->

I need the stuff in between those two comments saved as a variable.

All the pages have the same number of comment tags, and these are the exact ones, in order:


<!-- publish type="textbox" name="About Us Content" -->
About Us Stuff That I Need
<!-- /publish -->

<!-- publish type="textbox" name="Calendar Content" -->
Calendar Stuff That I Need
<!-- /publish -->

<!-- publish type="textbox" name="News Content" -->
News Stuff That I Need
<!-- /publish -->


Here is the PHP code that I have right now. I don't know what to put in the $regex variable.

<?php
$url = "http://localhost/pathtofile/etc.htm";
$html = file_get_contents($url);

$regex = "";
preg_match_all($regex, $html, $matches);

foreach($matches[0] as $div) {
echo $div;
}
?>

It would be really cool if someone could help out..

benslayton
07-19-2008, 10:38 AM
OK, so I have solved it. Here is what I came up with.

<?php
$url = "http://www.localhost/html.html";
$html = file_get_contents($url);


$change1 = str_replace("<!-- publish", "<publish", $html);
$change2 = str_replace("<!-- /publish -->", "</publish>", $change1);
$change3 = strip_tags($change2, '<publish>');
$change4 = str_replace("-->", ">", $change3);

preg_match_all('|<publish[^>]*?>(.*?)</publish>|si',$change4,$matches);

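// $matches is an array, so print_r() turns it into a string first; $matches[1] holds the captured content
echo "<textarea name=\"\" cols=\"115\" rows=\"30\">" . htmlspecialchars(print_r($matches[1], true)) . "</textarea>";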

print_r($matches);
?>
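
For anyone finding this later: it should also be possible to match the comment markers directly and skip the str_replace/strip_tags steps, which also keeps any HTML inside each section (strip_tags removes it). This is only a rough sketch, assuming the markers always look exactly like the ones above. The sample string and the $sections array are made up for illustration and are not part of my real pages.

```php
<?php
// Hypothetical sample input standing in for the fetched page.
$html = <<<HTML
<!-- publish type="textbox" name="About Us Content" -->
About Us Stuff That I Need
<!-- /publish -->

<!-- publish type="textbox" name="Calendar Content" -->
Calendar Stuff That I Need
<!-- /publish -->
HTML;

// Group 1 captures the name attribute, group 2 captures everything
// up to the closing marker. "s" lets "." match newlines, "i" ignores case.
$regex = '|<!--\s*publish\b[^>]*?name="([^"]*)"[^>]*?-->(.*?)<!--\s*/publish\s*-->|si';
preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

// Build an array keyed by section name, e.g. $sections['About Us Content'].
$sections = array();
foreach ($matches as $m) {
    $sections[$m[1]] = trim($m[2]);
}

print_r($sections);
```

Keying the array by the name attribute means the code does not depend on the sections always appearing in the same order.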