Thread: a slightly better PCRE question

  1. #1
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    2,385
    Thanks
    100
    Thanked 113 Times in 111 Posts

    My goal is to gain a modicum of skill with PCRE. However, I am now taking a step back from it, as PHP has many built-in functions that seem designed to handle the common, simple jobs that would otherwise call for a PCRE pattern.

    Let's say you have the following:

    Code:
    $string="44,.:,:-,::227,229";
    where you don't know what is between two different numbers, but only commas and dashes are accepted. Actually, I already have a pattern for that: replacing every match of '/[^0-9\-,]/' with an empty string makes $string="44,,-,227,229"

    The plan is to reduce a run such as ",,-," or maybe ",-,,,,,:sd." to a single "-" whenever the run is longer than one character and sits between two digits (matched ungreedily), and to leave only digits at the beginning and end of the string.
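For anyone following along, here is a minimal sketch of the cleanup step described above. It assumes the pattern is used with preg_replace() and an empty replacement string, which the post implies but does not show; the variable name $cleaned is only for illustration.

```php
<?php
// Input string from the post above.
$string = "44,.:,:-,::227,229";

// Delete every character that is not a digit, a hyphen, or a comma.
// Assumption: the pattern is paired with an empty replacement string.
$cleaned = preg_replace('/[^0-9\-,]/', '', $string);

echo $cleaned, "\n"; // 44,,-,227,229
```

The run ",,-," between 44 and 227 is what still needs collapsing to a single "-".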

  2. #2
    Join Date
    Jun 2005
    Location
    United Kingdom
    Posts
    11,876
    Thanks
    1
    Thanked 180 Times in 172 Posts
    Blog Entries
    2

    Code:
    preg_replace(array('/[^\d,-]/', '/[^\d-]*(-)[^\d-]*/', '/^\D/', '/\D$/'), '$1', '44,.:,:-,::227,229');
    Oh, to stray from the topic slightly: I was looking at the preg_replace() documentation and you were right, converting a match to upper case is possible. There's a /e modifier in PHP's PCRE functions which causes the replacement string to be evaluated as PHP code, so you could use strtoupper() or some such thing in there. (Note that /e was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() does the same job there.)
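Since the /e modifier is not available in PHP 7 and later, here is a rough sketch of the upper-casing idea using preg_replace_callback() instead. The input string and the '/[a-z]+/' pattern are made up for illustration and do not come from the thread.

```php
<?php
// Hypothetical input: upper-case every run of lowercase letters.
$input = "abc 123 def";

$result = preg_replace_callback(
    '/[a-z]+/',
    function ($m) {
        // $m[0] holds the text of the current match.
        return strtoupper($m[0]);
    },
    $input
);

echo $result, "\n"; // ABC 123 DEF
```

The callback receives the match array for each hit and returns the replacement text, so no code has to be evaluated from a string.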
    Twey | I understand English | 日本語が分かります | mi jimpe fi le jbobau | mi esperanton komprenas | je comprends français | entiendo español | tôi ít hiểu tiếng Việt | ich verstehe ein bisschen Deutsch | beware XHTML | common coding mistakes | tutorials | various stuff | argh PHP!

  3. #3
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    2,385
    Thanks
    100
    Thanked 113 Times in 111 Posts

    Very interesting script you posted, and it works great too. I hate to leave you hanging, but I am afraid I have to study now. A few things in the script that I found interesting were that you integrated an array into a preg statement, used a $1, and did not put the string into the preg statement, though I imagine that you can. I could be wrong. Other than that, thanks for the PCRE command, snippet, code, or whatever it is called. Not because it solves a problem I am working on, but because it looks like a lot of fun to play with. I am seeing a lot of new PCRE possibilities opening up for me with that script.

    I'll try to read up more on it later, though, as I really must start studying for tonight. Thanks for the /e modifier tip!

    EDIT: I figured out how to add a string to the PCRE command. I was adding quotes to the string, but when I removed the quotes it behaved fine. The $1 seems to be a weird command to apply the 4 patterns you put into it to the string. You might have to explain how the $1 feature works though. This is kinda exciting, because I didn't know you could add multiple commands in one PCRE statement.
    Last edited by james438; 08-14-2007 at 03:27 AM. Reason: for grammar

  4. #4
    Join Date
    Jun 2005
    Location
    United Kingdom
    Posts
    11,876
    Thanks
    1
    Thanked 180 Times in 172 Posts
    Blog Entries
    2

    Quote:
    A few things in the script that I found interesting were that you integrated an array into a preg statement, used a $1
    Information on both can be found in the preg_replace() documentation.
    Quote:
    and did not put the string into the preg statement
    Look again -- it's the third argument.
    Quote:
    EDIT: I figured out how to add a string to the PCRE command. I was adding quotes to the string, but when I removed the quotes it behaved fine.
    Hmm? I don't understand you. PHP string literals require quotes (generally).
    Quote:
    The $1 seems to be a weird command to apply the 4 patterns you put into it to the string.
    I used it as a shortcut so that the matches of one of the patterns are replaced with "-". Since the other patterns contain no capture groups, $1 evaluates to an empty string for them, so their matches are simply deleted.
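To see the shortcut at work, here is a small sketch that applies the four patterns one at a time and records the string after each step. The foreach loop and the $steps array are my own unrolling for illustration; as far as I understand the documentation, passing the patterns as an array does the same thing in a single call.

```php
<?php
$patterns = array('/[^\d,-]/', '/[^\d-]*(-)[^\d-]*/', '/^\D/', '/\D$/');
$text = '44,.:,:-,::227,229';

// Apply each pattern in turn and keep the intermediate results.
$steps = array();
foreach ($patterns as $pattern) {
    // '$1' is empty for patterns without a capture group,
    // so those matches are simply deleted.
    $text = preg_replace($pattern, '$1', $text);
    $steps[] = $text;
}

print_r($steps);
// Step 1: 44,,-,227,229  (stray characters removed)
// Step 2: 44-227,229     (",,-," collapsed to the captured "-")
// Steps 3 and 4 change nothing here, since both ends are already digits.
```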

  5. #5
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    2,385
    Thanks
    100
    Thanked 113 Times in 111 Posts

    Quote:
    EDIT: I figured out how to add a string to the PCRE command. I was adding quotes to the string, but when I removed the quotes it behaved fine.

    Quote:
    Hmm? I don't understand you. PHP strings require quotes (generally).
    I find these things interesting because I am curious to learn the new tricks you showed me. I didn't think to look at the preg_replace() documentation; I was looking at the PCRE syntax page at php.net.

    The following is the code I was talking about. Heh, I am not sure what you mean about strings requiring quotes; I notice that sometimes they do and sometimes they don't. In the following example I could not get it to work unless I left the single and double quotes out. I don't even know what made me think to try it without quotes, but it worked.

    Code:
    $text="44,.:,:-,::227,229";
    $text=preg_replace(array('/[^\d,-]/', '/[^\d-]*(-)[^\d-]*/', '/^\D/', '/\D$/'), '$1', $text);
    echo "$text";
    I want to play around with it a bit more and read up on the preg_replace() function in greater detail, as well as the /e documentation.

    later
    Last edited by james438; 08-15-2007 at 01:47 AM. Reason: clarity

  6. #6
    Join Date
    Jun 2005
    Location
    United Kingdom
    Posts
    11,876
    Thanks
    1
    Thanked 180 Times in 172 Posts
    Blog Entries
    2

    Oh, I see, you mean around the variables? Variables should never have quotes around them unless they're being interpolated -- "$text" will work but is a waste of resources, and '$text' is the literal string '$text' since no interpolation occurs.
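A quick sketch of the three cases just described, with made-up variable names for illustration:

```php
<?php
$text = "44-227,229";

$plain        = $text;    // no quotes: the variable's value is used directly
$interpolated = "$text";  // double quotes: the value is copied into a new string
$literal      = '$text';  // single quotes: the five characters $, t, e, x, t

var_dump($plain === $interpolated); // bool(true)
var_dump($literal === $text);       // bool(false)
```

So "$text" gives the same result as $text with an extra step, while '$text' never looks at the variable at all.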

  7. #7
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    2,385
    Thanks
    100
    Thanked 113 Times in 111 Posts

    I have a lot of stuff to go through as I research your expression and some of the concepts are a little confusing, but here is what I have thus far:

    You added $1 so that the '/[^\d-]*(-)[^\d-]*/' would take effect.

    You are using an array so that you can apply multiple patterns. The matches of every pattern are deleted, except in the case of '/[^\d-]*(-)[^\d-]*/', which says to find two digits with a dash somewhere between them and delete everything between them except the dash. This is where the $1 takes effect. (My description is probably close, but not complete.) '/[^\d-]*(-)[^\d-]*/' is also the only one of the 4 patterns in the array that contains a capture. You can also replace the $1 with a 't' (just as an example) and it will replace all matches from all of the patterns with the 't'.

    There are of course the other patterns, which clean up the ends so that they are digits only and clean up the whole string so that only digits, commas, and dashes are left in it.

    With arrays that are used in a preg_replace() the patterns are executed from left to right.

    "Interpolated" seems to mean expressed or calculated. Heh, basic English, I know, but I am having trouble with it.
    My guess is that preg in preg_replace means Perl regular expression.

    What does 'Perl' stand for? What does PHP stand for?

    What would it look like to try and match a pattern where you want to replace the characters between two digits with a -, but only if there were two or more as in the case of "2,--,,-4" or "2,,,4" or "2,-5" but not "2,4"? True, I could use another line:
    Code:
    $text=preg_replace('/[,]{2,}/','-',$text);
    but wouldn't it be better to have one preg_replace command as opposed to two?

    Could you explain this in more detail? '/[^\d-]*(-)[^\d-]*/'. Currently I am reading up on the definition of captures and the correct usage of 'variables' and 'strings'.

    As always, thanks for helping me to understand this difficult yet fun aspect of PHP. I do have a submit form where I am using this very code with maybe a few alterations here or there. If you don't mind I would like to add this to a tutorial/reference page (mostly for myself) where I explain many of the PCRE and/or string tricks that I have learned.
    Last edited by james438; 08-15-2007 at 03:24 PM. Reason: typos and for clarity

  8. #8
    Join Date
    Jun 2005
    Location
    United Kingdom
    Posts
    11,876
    Thanks
    1
    Thanked 180 Times in 172 Posts
    Blog Entries
    2

    Quote:
    My guess is that preg in preg_replace means perl regular expression.
    I believe so. PCRE: Perl-Compatible Regular Expressions.
    Quote:
    What does 'Perl' stand for? What does PHP stand for?
    Perl is another scripting language with a very powerful built-in regular expression syntax. The name is often expanded as "Practical Extraction and Report Language", but that is a backronym coined after the fact; Perl is not officially an acronym. PHP stands for "PHP: Hypertext Preprocessor", a recursive acronym (it originally meant "Personal Home Page").
    Quote:
    What would it look like to try and match a pattern where you want to replace the characters between two digits with a -, but only if there were two or more as in the case of "2,--,,-4" or "2,,,4" or "2,-5" but not "2,4"? True, I could use another line:
    Code:
    $text=preg_replace('/[,]{2,}/','-',$text);
    but wouldn't it be better to have one preg_replace command as opposed to two?
    Slightly neater, I guess, but since you can't guarantee that there will be a - in there to capture, the $1 shortcut doesn't cover that case, and I can't see any way but to use another statement. It's not a huge problem.
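If a single call is still preferred, one possible approach (not the one used in this thread) is to give preg_replace() an array of replacements alongside the array of patterns, and to use lookbehind and lookahead assertions so that only runs sitting between two digits are collapsed. The function name collapseSeparators is hypothetical, and the end-trimming patterns use \D+ rather than the single \D from the earlier snippet, which is my own change.

```php
<?php
// Hypothetical helper: a sketch, not the thread's original code.
function collapseSeparators($text)
{
    return preg_replace(
        array(
            '/[^\d,-]/',               // drop anything but digits, commas, hyphens
            '/(?<=\d)[,-]{2,}(?=\d)/', // two or more separators between two digits
            '/^\D+/',                  // leading non-digits
            '/\D+$/',                  // trailing non-digits
        ),
        array('', '-', '', ''),        // one replacement per pattern, in order
        $text
    );
}

echo collapseSeparators('2,--,,-4'), "\n"; // 2-4
echo collapseSeparators('2,,,4'), "\n";    // 2-4
echo collapseSeparators('2,-5'), "\n";     // 2-5
echo collapseSeparators('2,4'), "\n";      // 2,4
```

Because the second pattern requires at least two separator characters, a lone comma such as the one in "2,4" is left alone.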
    Quote:
    Could you explain this in more detail? '/[^\d-]*(-)[^\d-]*/'. Currently I am reading up on the definition of captures and the correct usage of 'variables' and 'strings'.
    \d is any digit (0-9). The - is used literally. [^xyz] means "anything except x, y or z," so [^\d-] means "anything except a digit or a hyphen." The * repeats the preceding item zero or more times. The parentheses around the - capture it for later use as $1 in my shortcut.
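To make the pieces concrete, here is a small sketch that runs the pattern once with preg_match() on the already cleaned string from earlier in the thread and looks at what was matched and what was captured. The variable name $m is just for illustration.

```php
<?php
preg_match('/[^\d-]*(-)[^\d-]*/', '44,,-,227,229', $m);

// $m[0] is the whole match, $m[1] is the first capture group.
echo $m[0], "\n"; // ,,-,
echo $m[1], "\n"; // -
```

The whole match ",,-," is what gets replaced, and the captured "-" is what $1 puts back in its place.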

  9. #9
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    2,385
    Thanks
    100
    Thanked 113 Times in 111 Posts

    Just wanted to say thanks for helping me advance my knowledge of PHP's PCRE feature. Now to work on my understanding of CSS. I still plan on playing with string functions and PCRE commands to iron out some of the details of what I have learned.

  10. #10
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    2,385
    Thanks
    100
    Thanked 113 Times in 111 Posts

    1. How is it that '/[^\d-]*(-)[^\d-]*/' is not greedy?
    2. In the following 3-,-,,-,,,-,,,,-,,,,,-4,5 what is the last match? I guess I am still trying to wrap my mind around the 0 or more quantifier.
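One way to explore both questions is to list every match with preg_match_all(). This is only a sketch, with $subject and $whole as illustrative names. As far as I can tell, the * quantifiers in the pattern are in fact greedy; each match stays short only because [^\d-] cannot step over a digit or a hyphen, so the run stops at the next one.

```php
<?php
$subject = '3-,-,,-,,,-,,,,-,,,,,-4,5';
$pattern = '/[^\d-]*(-)[^\d-]*/';

// With the default flags, $matches[0] lists the whole matches in order.
preg_match_all($pattern, $subject, $matches);
$whole = $matches[0];
print_r($whole);
// "-,", "-,,", "-,,,", "-,,,,", "-,,,,,", and finally "-"

$replaced = preg_replace($pattern, '$1', $subject);
echo $replaced, "\n"; // 3------4,5
```

The last match is the lone "-" just before the 4: both [^\d-]* parts match zero characters there. The comma in "4,5" is never matched because no hyphen follows it.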

    Heh, sorry, I thought I was about done with this expression as well, but as I was writing up a description for it on my site I found that I had a bit left to learn.
    Last edited by james438; 08-18-2007 at 06:05 AM.
