PREC Modifers



bluewalrus
12-17-2010, 08:27 PM
So I've been using the modifiers (http://php.net/manual/en/reference.pcre.pattern.modifiers.php), correctly I thought, but I have run into a few problems recently.

I had this


$code_is = preg_replace('/<li><p.*?>(.*?)<\/p>.*?<\/li>/s', "<li>$1</li>", $code_is);

I then tried


$code_is = preg_replace('/<li><p.*?>(.*?)<\/p>.*?<\/li>/m', "<li>$1</li>", $code_is);

and


$code_is = preg_replace('/<li><p.*?>(.*?)<\/p>.*?<\/li>/sm', "<li>$1</li>", $code_is);

None of these worked as I intended. I'm trying to replace any list item that has a paragraph in it, and there are new lines and tabs in there. None of the above solutions work; the solution below somewhat works, but not for all occurrences, so I figure there must be a better way or modifier.


$code_is = preg_replace('/<li>[.|\n|\r|\t]*?<p.*?>(.*?)<\/p>[.|\n|\r|\t]*?<\/li>/s', "<li>$1</li>", $code_is);

james438
12-17-2010, 09:36 PM
Sometimes newlines are expressed as \r\n as well. When you say that it is not working, I am assuming you mean it is not matching. I have not worked with the multiline modifier much, but it only applies to the ^ and $ anchors, which you are not using, so I wouldn't worry about it in your situation.

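/s tells the dot to match newlines as well, so I would keep using it in your situation. Note that inside a character class the dot and the | are literal characters, so [.|\n|\r|\t] only matches a literal dot, a pipe, a newline, a carriage return or a tab; it does not mean "anything". If you want "any character" there, use a plain "." together with /s, and if you only want to skip whitespace, \s is simpler.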
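To make the difference between the modifiers concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; only preg_match() itself comes from PHP.

```php
<?php
// Hypothetical two-line sample string.
$text = "one\ntwo";

// Without /s the dot stops at a newline; with /s it matches it.
$dotPlain  = preg_match('/one.two/', $text);   // no match
$dotDotall = preg_match('/one.two/s', $text);  // match

// /m only changes where ^ and $ may match (at every line, not just the string ends).
$anchorPlain = preg_match('/^two$/', $text);   // no match
$anchorMulti = preg_match('/^two$/m', $text);  // match

// Inside [...] the dot is a literal dot, so an ordinary letter does not match.
$classDot = preg_match('/^[.|\n]$/', 'a');     // no match

var_dump($dotPlain, $dotDotall, $anchorPlain, $anchorMulti, $classDot);
```

Since none of the patterns in the first post use ^ or $, adding /m should not change their behaviour at all.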

Not promising anything, but could you post an example of what you want matched and what it should look like after it has been processed?

Just a typo, but your title says PREC when it should be PCRE ;)

later

bluewalrus
12-17-2010, 09:47 PM
Yes, it's not matching, I think because of the new lines/tabs. I'd like it to just completely ignore any whitespace (tabs, returns, enters, spaces, and if possible &nbsp; and &#160;).


<li>
<p>blah blah blah random text
</p></li>

Replaced as


<li>blah blah blah random text</li>

djr33
12-17-2010, 09:50 PM
You could pre-process the text by removing all of those with a simple str_replace(). Then you wouldn't need to worry about it. I'm not sure if that would apply to your project though.
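A rough sketch of that pre-processing idea, assuming it is acceptable to lose every newline and tab in the whole string (the sample input and the names $html and $stripped are only illustrative):

```php
<?php
// Hypothetical sample input.
$html = "<li>\n\t<p>text\n</p></li>";

// str_replace() accepts an array of search strings and removes each one everywhere.
$stripped = str_replace(array("\r\n", "\r", "\n", "\t"), '', $html);

// With the whitespace gone, the pattern from the first post can match.
$result = preg_replace('/<li><p.*?>(.*?)<\/p>.*?<\/li>/s', '<li>$1</li>', $stripped);

echo $stripped, "\n", $result, "\n";
```

The catch is that str_replace() has no notion of "inside an li", so the formatting of the rest of the markup is flattened too.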

bluewalrus
12-17-2010, 10:02 PM
How would I do it with str_replace? That example is 3 lines of a larger string. I'd want to keep tabs, new lines, etc. if they aren't inside an li.

I guess a better sample would be:


<p>Some stuff up here</p>

<ol>
<li>1</li>
<li>2</li>
<li>
<p>blah blah blah random text
</p></li>
</ol>

<img src="a.jpg" alt="a" />

<a href="link.html">link text</a>


The formatting of this would be


<p>Some stuff up here</p>

<ol>
<li>1</li>
<li>2</li>
<li>blah blah blah random text</li>
</ol>

<img src="a.jpg" alt="a" />

<a href="link.html">link text</a>
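One possible way to get from the first sample to the second in a single pass is to allow optional whitespace (\s*) between the tags instead of requiring <li> and <p> to be adjacent. This is only a sketch under the assumption that each such li holds exactly one paragraph; the $pattern and $result names are illustrative.

```php
<?php
// Shortened version of the sample above.
$code_is = "<p>Some stuff up here</p>\n\n<ol>\n<li>1</li>\n<li>2</li>\n"
         . "<li>\n<p>blah blah blah random text\n</p></li>\n</ol>";

// \s* skips newlines/tabs around the tags; <p\b[^>]*> also accepts attributes on the <p>.
$pattern = '/<li>\s*<p\b[^>]*>\s*(.*?)\s*<\/p>\s*<\/li>/s';
$result  = preg_replace($pattern, '<li>$1</li>', $code_is);

echo $result;
```

Whitespace outside the matched li elements is left alone. A regex like this is not an HTML parser, so nested lists or several paragraphs in one li may need something sturdier.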

djr33
12-17-2010, 10:07 PM
Won't work well then. My idea was more for a full HTML (or XML) parser: first strip all white space (since it is irrelevant) then process the tags. But if you are going to need the formatted source code, there's no way around that.

james438
12-18-2010, 12:05 AM
Why not just use this?


<?php
$code_is=" <p>Some stuff up here</p>

<ol>
<li>1</li>
<li>2</li>
<li>
<p>blah blah blah random text
</p></li>
</ol>

<img src=\"a.jpg\" alt=\"a\" />

<a href=\"link.html\">link text</a>";
$code_is=preg_replace('/<li>\s{1,}(<p>|\b)/s','<li>',$code_is);
$code_is=preg_replace('/(\s)+(<\/p>)?<\/li>/s',"</li>",$code_is);
?>
<textarea cols="100" rows="50"><?php print htmlentities($code_is);?></textarea>

"+" means one or more, which is the same as "{1,}".
"?" means zero or one, which is another way of saying that it may be present at this location and it might not.
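A small sketch of those two quantifiers in action (the sample strings and variable names are made up). It also shows that "?" only applies to the single item right before it unless you group with parentheses.

```php
<?php
// "+" and "{1,}" are the same quantifier written two ways.
$plus  = preg_replace('/a\s+b/', 'ab', "a \n b");
$brace = preg_replace('/a\s{1,}b/', 'ab', "a \n b");

// "?" after a group makes the whole group optional: both inputs match.
$withP    = preg_match('/^(<\/p>)?<\/li>$/', '</p></li>');
$withoutP = preg_match('/^(<\/p>)?<\/li>$/', '</li>');

// Without parentheses "?" only covers the ">" before it, so "</p" is still required.
$onlyBracket = preg_match('/^<\/p>?<\/li>$/', '</li>');

var_dump($plus, $brace, $withP, $withoutP, $onlyBracket);
```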

james438
12-18-2010, 12:15 AM
Actually, this is better:


$code_is=preg_replace('/<li>\s+(<p>|\b)/s','<li>',$code_is);
$code_is=preg_replace('/\s*?(?:<\/p>)?<\/li>/s',"</li>",$code_is);

EDIT: even better:


$code_is=preg_replace('/<li>\s+(<p>|\b)/s','<li>',$code_is);
$code_is=preg_replace('/\s*?(?:<\/p>)?\s*?<\/li>/s','</li>',$code_is);

james438
12-18-2010, 06:11 AM
If this answers your question, please mark this thread as resolved.

bluewalrus
12-20-2010, 12:42 AM
Been busy haven't had a chance to try it quite yet. What does the \b do?

james438
12-20-2010, 01:33 AM
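Good question. \b is a bit harder to understand; at least for me it was. \b matches the boundary between a word character and a non-word character (the start or end of the string counts as a boundary too when a word character sits next to it). For example: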

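$test='####tttt';
// \b matches twice here: between '#' and 't', and after the last 't' at the end of the string
$test=preg_replace('/\b/','correct',$test);
echo $test;

// this produces: ####correctttttcorrect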

I wasn't sure how to do what you wanted without \b, because it would have failed under certain circumstances.
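To see where \b actually matches, and what it does in the <li> pattern, here is a small sketch. The inputs are made up for illustration; PREG_OFFSET_CAPTURE is a standard preg_match_all() flag that reports the offset of every match.

```php
<?php
// Collect the offsets at which \b matches in '####tttt'.
preg_match_all('/\b/', '####tttt', $m, PREG_OFFSET_CAPTURE);
$offsets = array_map(function ($hit) { return $hit[1]; }, $m[0]);

// In the <li> pattern, \b lets the match end right before plain text that has no <p>.
$withText = preg_replace('/<li>\s+(<p>|\b)/s', '<li>', "<li>\n  text</li>");

// If the whitespace is followed by another tag, neither <p> nor \b matches,
// so the string is left unchanged.
$withTag = preg_replace('/<li>\s+(<p>|\b)/s', '<li>', "<li>\n<img /></li>");

var_dump($offsets, $withText, $withTag);
```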

bluewalrus
12-28-2010, 08:32 PM
Thanks, yes, that code did work. I finally got around to finishing my Word/OpenOffice/CKEditor converter.

I don't have a mark-as-resolved option anymore though, so I guess maybe an admin will mark it for me.

bluewalrus
12-29-2010, 07:18 PM
I've returned with a different question (this one is about removing duplicated tags). I've got this code now:


$code_is = preg_replace('/<([a-z,A-Z].*?)>\s+<\1>/s', " <$1>", $code_is);
//initially had $code_is = preg_replace('/<(.*?)>\s+<\1>/s', " <$1>", $code_is);
$code_is = preg_replace('/<(\/.*?)><\1>/s', "<$1> ", $code_is);


which is supposed to filter something like


<em> <em>et al.</em></em>

to


<em>et al.</em>

While it works there, it is also deleting <td> tags entered like this


<td>&nbsp;</td>
<td>&nbsp;</td>


which becomes


<td>&nbsp;</td>

james438
12-29-2010, 10:16 PM
We should probably try and get a moderator to split this thread.

Anyway


<?php
$code_is="<em> <em>et al.</em></em>
<td>&nbsp;</td>
<td>&nbsp;</td>";
$code_is = preg_replace('/<([a-zA-Z].*?)>\s*<\1>/s', " <$1>", $code_is);
$code_is = preg_replace('/<(\/.*?)><\1>/s', "<$1> ", $code_is);
echo $code_is;
?>

I didn't look too closely into why it wasn't working. All I did was fix a few of the more obvious errors and then tried it out again and it seems to be working.

EDIT: This is a little better though:


$code_is = preg_replace('/<([a-zA-Z].*?)>\s*<\1>/si', " <$1>", $code_is);
$code_is = preg_replace('/<(\/.*?)><\1>/si', "<$1> ", $code_is);
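One thing worth checking with a group like ([a-zA-Z].*?): the lazy .*? is still allowed to run past the first ">" if that is the only way to make the \1 backreference match, which is how two identical <td>&nbsp;</td> rows can collapse into one. A hedged sketch, assuming tag names and attributes never contain ">", keeps the group inside a single tag with [^>]* instead. The $rows, $loose and $strict names are illustrative, and the leading space in the replacement is dropped here as a simplification.

```php
<?php
$rows = "<td>&nbsp;</td>\n<td>&nbsp;</td>";

// Loose group: .*? can grow to "td>&nbsp;</td", so the second row is swallowed.
$loose = preg_replace('/<([a-zA-Z].*?)>\s*<\1>/s', '<$1>', $rows);

// Strict group: [^>]* cannot cross a ">", so group 1 is always one tag's name and attributes.
$strict = preg_replace('/<([a-zA-Z][^>]*)>\s*<\1>/i', '<$1>', $rows);

// The same restriction applied to the duplicated <em> example, opening then closing tags.
$code_is = "<em> <em>et al.</em></em>\n" . $rows;
$code_is = preg_replace('/<([a-zA-Z][^>]*)>\s*<\1>/i', '<$1>', $code_is);
$code_is = preg_replace('/<(\/[^>]*)>\s*<\1>/i', '<$1>', $code_is);

echo $loose, "\n---\n", $strict, "\n---\n", $code_is, "\n";
```

Each call removes one level of duplication per pass, so tags nested three deep would need the replacements run again.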