a slightly better PCRE question
james438
08-13-2007, 09:37 PM
My goal is to get a modicum of skill with PCRE. For now, though, I am taking a step back from it, as PHP has many built-in functions that seem designed to handle the common, simple jobs PCRE is often used for.
Let's say you have the following:
$string="44,.:,:-,::227,229";
where you don't know what is between two numbers, but only commas and dashes are accepted. Actually, I already have a pattern for that, '/[^0-9\-,]/'; removing everything it matches leaves $string="44,,-,227,229"
The plan is to reduce the ",,-," (or maybe ",-,,,,,:sd.") to a single "-" whenever there is more than one character between two digits (ungreedy), leaving only digits at the beginning and end of the string.
preg_replace(array('/[^\d,-]/', '/[^\d-]*(-)[^\d-]*/', '/^\D/', '/\D$/'), '$1', '44,.:,:-,::227,229');
Oh, to stray from the topic slightly, I was looking at the preg_replace() documentation and you were right: converting a match to upper-case is possible. There's a /e modifier to PHP PCRE which causes the replacement string to be evaluated as PHP code once the backreferences have been substituted, so you could use strtoupper() or some such thing in there.
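For anyone trying the upper-case trick on a current PHP: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7, and preg_replace_callback() is the usual way to get the same effect. A minimal sketch, where the pattern and the sample string are made up purely for illustration:

```php
<?php
// Upper-case every run of lowercase letters with a callback
// instead of the /e modifier.
$input = 'abc-123-def';   // made-up sample string

$result = preg_replace_callback(
    '/[a-z]+/',
    function (array $m): string {
        // $m[0] holds the text matched by the whole pattern
        return strtoupper($m[0]);
    },
    $input
);

echo $result, "\n";   // ABC-123-DEF
```

The callback receives the same array of matches that preg_match() would fill in, so captured groups are available as $m[1], $m[2] and so on.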
james438
08-14-2007, 12:41 AM
Very interesting script you posted. It works great too. I hate to leave you hanging, but I am afraid I have to study now :( A few things in the script that I found interesting were that you integrated an array into a preg statement, used a $1, and did not put the string into the preg statement, but I imagine that you can. I could be wrong though. Other than that, thanks for the PCRE command, snippet, code or whatever it is called. Not because it solves a problem I am working on, but because it looks like a lot of fun to play with. I am seeing a lot of new PCRE possibilities opening up for me with that script :)
I'll try to read up more on it later though as I really must start to study for tonight. Thanks for the /e modifier tip though!
EDIT: I figured out how to add a string to the PCRE command. I was adding quotes to the string, but when I removed the quotes it behaved fine. The $1 seems to be a weird command to apply the 4 patterns you put into it to the string. You might have to explain how the $1 feature works though. This is kinda exciting, because I didn't know you could add multiple commands in one PCRE statement.
A few things in the script that I found interesting were that you integrated an array into a preg statement, used a $1
Information on both can be found in the documentation (http://www.php.net/preg-replace).
and did not put the string into the preg statement
Look again -- it's the third argument.
EDIT: I figured out how to add a string to the PCRE command. I was adding quotes to the string, but when I removed the quotes it behaved fine.
Hmm? I don't understand you. PHP strings require quotes (generally).
The $1 seems to be a weird command to apply the 4 patterns you put into it to the string.
I used it as a shortcut so that one replacement string could serve all four patterns. The second pattern captures a -, so $1 puts that - back. The other patterns contain no captures, so $1 evaluates to an empty string for them and their matches are simply deleted.
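The two behaviours of $1 can be seen side by side in a small sketch. The second sample string (with a trailing comma) is made up for illustration:

```php
<?php
// Pattern with a capture group: $1 is the captured hyphen.
$withGroup = preg_replace('/[^\d-]*(-)[^\d-]*/', '$1', '44,,-,227,229');

// Pattern without a capture group: $1 refers to nothing, so the
// match is replaced with an empty string, i.e. deleted.
$withoutGroup = preg_replace('/\D$/', '$1', '227,229,');

echo $withGroup, "\n";      // 44-227,229
echo $withoutGroup, "\n";   // 227,229
```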
james438
08-14-2007, 03:40 AM
EDIT: I figured out how to add a string to the PCRE command. I was adding quotes to the string, but when I removed the quotes it behaved fine.
Hmm? I don't understand you. PHP strings require quotes (generally).
I find the things interesting, because I am curious to learn the new tricks you showed me. I didn't think to look at the preg replace documentation. I was looking at the PCRE syntax page at php.net.
The following is the code I was talking about. heh, I am not sure what you mean about the string requiring quotes ;) I notice that sometimes they do and sometimes they don't. In the following example I could not get it to work unless I left the single and double quotes out. Don't even know what made me think to try it without quotes, but it worked.
$text="44,.:,:-,::227,229";
$text=preg_replace(array('/[^\d,-]/', '/[^\d-]*(-)[^\d-]*/', '/^\D/', '/\D$/'), '$1', $text);
echo "$text";
I want to play around with it a bit more and read up on the preg_replace() function in greater detail as well as the /e documentation :)
later
Oh, I see, you mean around the variables? Variables should never have quotes around them unless they're being interpolated -- "$text" will work but is a waste of resources, and '$text' is the literal string '$text' since no interpolation occurs.
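A short sketch of the three cases, using a made-up value for $text:

```php
<?php
$text = '44-227,229';

$direct       = $text;     // the variable itself, no quotes needed
$interpolated = "$text";   // double quotes: PHP substitutes the value
$literal      = '$text';   // single quotes: the five characters $text

var_dump($direct === $interpolated);  // bool(true)
var_dump($literal);                   // string(5) "$text"
```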
james438
08-15-2007, 01:52 AM
I have a lot of stuff to go through as I research your expression and some of the concepts are a little confusing, but here is what I have thus far:
You added $1 so that the '/[^\d-]*(-)[^\d-]*/' would take effect.
You are using an array so that you can apply multiple patterns in one call. Every match is deleted except in the case of '/[^\d-]*(-)[^\d-]*/', which says: where there is a dash between two digits, delete everything between them except the dash. This is where the $1 takes effect. (My description is probably close, but not complete.) '/[^\d-]*(-)[^\d-]*/' is also the only one of the 4 patterns in the array that contains a capture. You can also replace the $1 with a 't' (just as an example) and it will replace all matches from all of the patterns with the 't'.
Then there are of course the other patterns, which clean up the ends so that the string starts and ends with a digit, and which strip the whole string down to digits, commas, and dashes.
With arrays that are used in a preg_replace() the patterns are executed from left to right.
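One way to watch that ordering is to apply the patterns one at a time and keep each intermediate result. This is only a sketch; the variable names are made up:

```php
<?php
$patterns = array('/[^\d,-]/', '/[^\d-]*(-)[^\d-]*/', '/^\D/', '/\D$/');
$text = '44,.:,:-,::227,229';

// Apply the patterns one at a time to see each intermediate result.
$steps = array();
$current = $text;
foreach ($patterns as $pattern) {
    $current = preg_replace($pattern, '$1', $current);
    $steps[] = $current;
}

// The single array call gives the same final string.
$allAtOnce = preg_replace($patterns, '$1', $text);

print_r($steps);     // 44,,-,227,229 then 44-227,229 three times
echo $allAtOnce, "\n";   // 44-227,229
```

The last two patterns find nothing to remove in this particular string, which is why the result stops changing after the second step.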
"Interpolated" seems to mean expressed or calculated. heh, basic English, I know, but I am having trouble with it. ;)
My guess is that preg in preg_replace means perl regular expression.
What does 'Perl' stand for? What does PHP stand for?
What would it look like to try and match a pattern where you want to replace the characters between two digits with a -, but only if there were two or more as in the case of "2,--,,-4" or "2,,,4" or "2,-5" but not "2,4"? True, I could use another line:
$text=preg_replace('/[,]{2,}/','-',$text);
but wouldn't it be better to have one preg_replace command as opposed to two?
Could you explain this in more detail? '/[^\d-]*(-)[^\d-]*/'. Currently I am reading up on the definition of captures and the correct usage of 'variables' and 'strings'.
As always, thanks for helping me to understand this difficult yet fun aspect of PHP. I do have a submit form where I am using this very code with maybe a few alterations here or there. If you don't mind I would like to add this to a tutorial/reference page (mostly for myself) where I explain many of the PCRE and/or string tricks that I have learned.
My guess is that preg in preg_replace means perl regular expression.
I believe so. PCRE: Perl-Compatible Regular Expressions.
What does 'Perl' stand for? What does PHP stand for?
Perl is another scripting language with a very powerful built-in regular expression syntax. The name is usually expanded as Practical Extraction and Report Language, but it's been used for so much more for so long that practically everyone's forgotten the name's meaning. PHP is a recursive acronym for PHP: Hypertext Preprocessor.
What would it look like to try and match a pattern where you want to replace the characters between two digits with a -, but only if there were two or more as in the case of "2,--,,-4" or "2,,,4" or "2,-5" but not "2,4"? True, you can use another line:
$text=preg_replace('/[,]{2,}/','-',$text);
but wouldn't it be better to have one preg_replace command as opposed to two?
Slightly neater, I guess, but since you can't guarantee that there will be a - in there to capture I can't see any way but to use another statement. It's not a huge problem.
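That said, a single call does seem possible if the $1 shortcut is dropped and preg_replace() is given a parallel array of replacements, one per pattern. The following is only a sketch: the function name collapse_separators is hypothetical, and the second pattern, '/[,-]{2,}/', is a swapped-in replacement for the capture-based one rather than anything from the posts above.

```php
<?php
// Hypothetical helper: clean the string, then turn any run of two or
// more commas/hyphens into a single hyphen, all in one preg_replace call.
function collapse_separators(string $text): string
{
    return preg_replace(
        array('/[^\d,-]/', '/[,-]{2,}/', '/^\D/', '/\D$/'),
        array('',          '-',          '',      ''),
        $text
    );
}

echo collapse_separators('2,--,,-4'), "\n";   // 2-4
echo collapse_separators('2,,,4'), "\n";      // 2-4
echo collapse_separators('2,-5'), "\n";       // 2-5
echo collapse_separators('2,4'), "\n";        // 2,4
```

Each pattern is paired with the replacement at the same position, so only the run of separators becomes a hyphen and everything else is deleted.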
Could you explain this in more detail? '/[^\d-]*(-)[^\d-]*/'. Currently I am reading up on the definition of captures and the correct usage of 'variables' and 'strings'.
\d is any digit (0-9). The - is used literally. [^xyz] means "anything except x, y or z," so [^\d-] means "anything except a digit or a hyphen." The * repeats the preceding item zero or more times. The parentheses around the - capture it for later use in my shortcut, as $1.
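The same breakdown can be written into the pattern itself with the x modifier, which ignores whitespace outside character classes and allows # comments. A sketch, using preg_match() to show what ends up in the capture:

```php
<?php
// The same pattern written with the x modifier, which lets the
// pattern be spread over several lines with # comments.
$pattern = '/
    [^\d-]*   # zero or more characters that are neither digit nor hyphen
    (-)       # one literal hyphen, captured as group 1
    [^\d-]*   # zero or more characters that are neither digit nor hyphen
/x';

preg_match($pattern, '44,,-,227,229', $m);

var_dump($m[0]);   // string(4) ",,-,"  (the whole match)
var_dump($m[1]);   // string(1) "-"     (what $1 refers to)
```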
james438
08-15-2007, 03:55 PM
Just wanted to say thanks for helping me advance my knowledge of PHP's PCRE feature. Now to work on my understanding of CSS. I still plan on playing with string functions and PCRE commands to iron out some of the details of what I have learned.
james438
08-18-2007, 05:35 AM
1. How is it that '/[^\d-]*(-)[^\d-]*/' is not greedy?
2. In the following 3-,-,,-,,,-,,,,-,,,,,-4,5 what is the last match? I guess I am still trying to wrap my mind around the 0 or more quantifier.
heh, sorry, I thought I was about done with this expression as well, but I found as I was writing up a description up for it on my site that I had a bit left to learn.
1. How is it that '/[^\d-]*(-)[^\d-]*/' is not greedy?
It is greedy, but [^\d-]* can't run across a hyphen because hyphens are excluded by the negated character class.
2. In the following 3-,-,,-,,,-,,,,-,,,,,-4,5 what is the last match? I guess I am still trying to wrap my mind around the 0 or more quantifier.
For which pattern?
james438
08-18-2007, 08:06 AM
'/[^\d-]*(-)[^\d-]*/' for this one as applied to 3-,-,,-,,,-,,,,-,,,,,-4,5.
It is greedy, but it can't match hyphens because they're included in the negative character class.
Sorry, dumb question, but if it is ungreedy, wouldn't it replace the 4 as well? The way I understand the above expression it will match everything between the first [^\d-]* and the last [^\d-]*.
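The last match will be the single "-" immediately before the 4. The ",,,,," ahead of it has already been consumed by the previous match, "-,,,,,".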
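To list the matches rather than reason them out, preg_match_all() can be pointed at the same string. A sketch:

```php
<?php
$subject = '3-,-,,-,,,-,,,,-,,,,,-4,5';

preg_match_all('/[^\d-]*(-)[^\d-]*/', $subject, $matches);

// $matches[0] holds the full text of every match, in order:
// "-,", "-,,", "-,,,", "-,,,,", "-,,,,,", "-"
print_r($matches[0]);

$replaced = preg_replace('/[^\d-]*(-)[^\d-]*/', '$1', $subject);
echo $replaced, "\n";   // 3------4,5
```

Every match starts at a hyphen here because the scan reaches a hyphen before it reaches any commas, and the leading [^\d-]* is allowed to match nothing.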
Sorry, dumb question, but if it is ungreedy, wouldn't it replace the 4 as well?
If it were ungreedy it definitely wouldn't. As it stands, it can't: 4 is a digit, and the negated character class [^\d-] excludes digits.
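The difference between greedy and ungreedy shows up when both versions are run on the same input. A sketch with a made-up sample string, using *? for the ungreedy form:

```php
<?php
$subject = '3,,-,,4';

// Greedy (the default): each [^\d-]* takes as many characters as it can.
$greedy = preg_replace('/[^\d-]*(-)[^\d-]*/', '$1', $subject);

// Lazy: *? takes as few characters as it can while still matching.
$lazy = preg_replace('/[^\d-]*?(-)[^\d-]*?/', '$1', $subject);

echo $greedy, "\n";   // 3-4
echo $lazy, "\n";     // 3-,,4
```

In neither case is the 3 or the 4 touched. The lazy version still has to take the leading commas to reach the hyphen, but it stops right after the hyphen and leaves the trailing commas in place.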
james438
08-18-2007, 08:29 AM
I misspoke (sp?) I meant greedy. Getting sleepy I am afraid. If it is greedy wouldn't it match 3-,-,,-,,,-,,,,-,,,,,-4,-5 and everything in between? or does the negative class automatically apply to that which is between the two matches.
I changed the example slightly.
EDIT: wait. Let me try something. I don't think I thought it through before speaking.
I am currently studying the difference between [].*[] and [].*?[] and [].*?[]? and [].*[]? Once I understand that I will know the answer to my own question.
does the negative class automatically apply to that which is between the two matches.
You're looking at it the wrong way. * means repeat: everything contained therein is part of the match, and has to match the repeated expression. If you have /ba*b/ it's not the same as /ba.*ab/. It can't match ",-,,-,,,-,,,,-,,,,,-4,-" because that contains more than one hyphen (the hyphen isn't repeated in the expression; the hyphen in the negative character class exists for this purpose) and a digit, which it cannot match under any circumstances.
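The /ba*b/ versus /ba.*ab/ comparison can be tried directly. A sketch with made-up test strings:

```php
<?php
// /ba*b/: a "b", then zero or more "a" characters, then a "b".
$onlyAs  = preg_match('/ba*b/', 'baaab');    // 1: only "a" between the b's
$hasX    = preg_match('/ba*b/', 'baxab');    // 0: "x" is not an "a"

// /ba.*ab/: the dot lets anything sit between "ba" and "ab".
$withDot = preg_match('/ba.*ab/', 'baxab');  // 1

var_dump($onlyAs, $hasX, $withDot);
```

The same reasoning applies to [^\d-]*: every character it consumes has to be a non-digit, non-hyphen, so the match cannot stretch across a digit or a second hyphen.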
james438
08-18-2007, 09:22 AM
yeah... I know. I am kinda embarrassed at my questions really. I am going back to some of the basics of regexp again and practicing some basic examples for a bit till I can do this a little faster in my head. :o
I really just need to work on this on my own for a bit.