
Thread: Recursive regex pattern replace

  1. #11
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    Default

    PHP Code:
    <?php

    function textrepl($matches) {
      if (strtolower($matches[0]) == '[/label]') {
        if ($GLOBALS['level'] > 0) { //valid level?
          $GLOBALS['level']--; //going down one level
          return '</span>';
        }
        else {
          return $matches[0]; //default
        }
      }
      else {
        $GLOBALS['level']++; //going up one level
        return '<span title="'.$matches[1].'">';
      }
    }

    $s = '[label=A long sentence.]Una [label=sentence]frase[/label] lunga.[/label] [label=Goodbye.]Ciao.[/label]';

    $level = 0;
    $s = preg_replace_callback('/\[label=(.+)\]|\[\/label\]/Uiu', 'textrepl', $s);
    for (; $level > 0; $level--) {
      $s .= '</span>';
    }

    echo $s;

    ?>
    The generated HTML is:
    Code:
    <span title="A long sentence.">Una <span title="sentence">frase</span> lunga.</span> <span title="Goodbye.">Ciao.</span>
    This works.

    Note that my example uses translations: layered titles on spans let you see what something means. But the same approach could generate any type of HTML while keeping the output valid, that is, with every open tag matched by a close tag.

    The logic is this: use a regex that matches an opening OR a closing tag, then use a callback function to handle the replacements. The function tracks the current nesting level. If a close tag would take the level below 0, it must be a mistake, so the function ignores that close tag and outputs it as text. After the replacement, the code checks the final level and, if it is not 0, appends close tags until it is.
    This way the HTML output is valid and there's some very basic error correction: ignore extra close tags, and close unclosed open tags.
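To see the two error-correction cases in isolation, here is a minimal sketch run against deliberately malformed input. The helper name `labels_to_spans` is hypothetical (it does not appear in the thread), and it keeps the counter in a local variable captured by reference instead of `$GLOBALS`; the regex and the counting logic are otherwise the same as in the code above.

```php
<?php

// Hypothetical helper (not from the thread): same level-counting logic,
// with the counter held in a local variable captured by reference.
function labels_to_spans(string $s): string {
    $level = 0; // number of currently open [label=...] tags

    $s = preg_replace_callback(
        '/\[label=(.+)\]|\[\/label\]/Uiu',
        function (array $m) use (&$level): string {
            if (strtolower($m[0]) === '[/label]') {
                if ($level > 0) {
                    $level--;
                    return '</span>';
                }
                return $m[0]; // stray close tag: leave it as text
            }
            $level++;
            return '<span title="' . $m[1] . '">';
        },
        $s
    );

    // Close whatever is still open at the end of the string.
    return $s . str_repeat('</span>', $level);
}

// One stray close tag at the start, one open tag that is never closed.
echo labels_to_spans('[/label]Una [label=sentence]frase'), "\n";
// [/label]Una <span title="sentence">frase</span>
```

Note that the title text is inserted unescaped, as in the original; if the labels can come from users, passing `$m[1]` through `htmlspecialchars()` would probably be wise.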

    The code is somewhat ugly and requires a lot of setup, so I might look at rewriting it with an anonymous function, or perhaps creating one function that can be used for any such replacement and that also takes care of the for loop at the end.

    Then it could be called just like parsetohtml($text,$tagname,$htmltagname), or something like that.
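A rough sketch of what such a `parsetohtml($text,$tagname,$htmltagname)` wrapper might look like follows. Only the name and parameter list come from the post; the body is an assumption. In particular, it only handles plain tags without attributes (such as `[b]...[/b]`), so it would not cover the `[label=...]` case as-is.

```php
<?php

// Hypothetical sketch of the wrapper suggested above. Assumption: tags are
// plain, without attributes, e.g. [b]...[/b] mapped to <strong>...</strong>.
function parsetohtml(string $text, string $tagname, string $htmltagname): string {
    $level = 0;                         // current nesting depth
    $tag = preg_quote($tagname, '/');   // make the tag name safe inside the regex

    $text = preg_replace_callback(
        '/\[(\/?)' . $tag . '\]/i',
        function (array $m) use (&$level, $htmltagname): string {
            if (($m[1] ?? '') === '') { // no slash: opening tag
                $level++;
                return '<' . $htmltagname . '>';
            }
            if ($level > 0) {           // closing tag with a matching open
                $level--;
                return '</' . $htmltagname . '>';
            }
            return $m[0];               // stray close tag stays as text
        },
        $text
    );

    // The loop at the end now lives inside the function.
    return $text . str_repeat('</' . $htmltagname . '>', $level);
}

echo parsetohtml('[b]bold [b]nested[/b][/b][/b] [b]open', 'b', 'strong'), "\n";
// <strong>bold <strong>nested</strong></strong>[/b] <strong>open</strong>
```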


    But this works. The trick was figuring out how to count, and using "or" in the pattern is what allows for this.
    Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  2. #12

    Hmm, here it is rewritten with an anonymous function. I'm not sure if this is simpler or more complex, but at least it puts everything in one place:
    PHP Code:
    <?php

    $s = '[label=A long sentence.]Una [label=sentence]frase[/label] lunga.[/label] [label=Goodbye.]Ciao.[/label]';

    $level = 0;
    $s = preg_replace_callback('/\[label=(.+)\]|\[\/label\]/Uiu',
      function ($matches) {
        if (strtolower($matches[0]) == '[/label]') {
          if ($GLOBALS['level'] > 0) { //valid level?
            $GLOBALS['level']--; //going down one level
            return '</span>';
          }
          else {
            return $matches[0]; //default
          }
        }
        else {
          $GLOBALS['level']++; //going up one level
          return '<span title="'.$matches[1].'">';
        }
      },
      $s);
    for (; $level > 0; $level--) {
      $s .= '</span>';
    }

    echo $s;

    ?>
    That does exactly the same thing as above.

  3. #13

    I've now rewritten this as a function that takes six parameters: two search/replace pairs in "preg" style (four parameters), the string of course, and a "flags" parameter so you can add "caseless" etc.

    This works very well, has a lot of options and seems organized, at least compared to the versions in my previous posts. However, it is still very specific and may have problems if someone's needs differ too greatly from mine: basically, if it's anything other than standard "tags" where there's an open and a close tag, and the close tag doesn't have any variable parts. It actually would work fine, I think, but it could get complex because of the for loop at the end, which appends the close replacement as-is. That is the only part that would (I think) cause problems.

    PHP Code:
    <?php

    //replace recursively requiring matching pairs, such as html tags
    function preg_replace_recursivepair($po,$pc,$ro,$rc,$s,$flags='') {
      //pattern open, pattern close, replace open, replace close, string, preg flags
      $level = 0;
      $s = preg_replace_callback('/'.$po.'|'.$pc.'/'.$flags,
        //$flags must be listed in use() too, or it is undefined inside the closure
        function ($matches) use (&$level,$po,$pc,$ro,$rc,$flags) {
          if (preg_match('/'.$po.'/'.$flags,$matches[0])==1) {
            $level++; //going up one level
            return preg_replace('/'.$po.'/'.$flags,$ro,$matches[0]);
          }
          else {
            if ($level>0) { //valid level?
              $level--; //going down one level
              return preg_replace('/'.$pc.'/'.$flags,$rc,$matches[0]);
            }
            else {
              return $matches[0]; //no changes, invalid
            }
          }
        },
        $s);
      for (; $level>0; $level--) {
        $s .= $rc; //close any tags still open
      }
      return $s;
    }

    $s = '[label=A long sentence.]Una [label=sentence]frase[/label] grande.[/label][/label] [label=Goodbye.]Adiós.';
    echo preg_replace_recursivepair('\[label=(.+)\]','\[\/label\]','<span title="$1">','</span>',$s,'Uiu');

    ?>
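The limitation mentioned before the code (a close tag with variable parts) can be seen in a small run. The sketch below uses a condensed copy of the function, with `$flags` added to the closure's `use` list so it is visible inside the callback. The `[c=...]...[/c=...]` syntax is made up purely for illustration. Inside the callback, `preg_replace` expands `$1`, but the final loop appends the close replacement verbatim, so the backreference leaks into the output.

```php
<?php

// Condensed copy of the function from this post, with $flags added to the
// closure's use list so that it is defined inside the callback.
function preg_replace_recursivepair($po, $pc, $ro, $rc, $s, $flags = '') {
    $level = 0;
    $s = preg_replace_callback('/'.$po.'|'.$pc.'/'.$flags,
        function ($matches) use (&$level, $po, $pc, $ro, $rc, $flags) {
            if (preg_match('/'.$po.'/'.$flags, $matches[0]) == 1) {
                $level++; // opening tag
                return preg_replace('/'.$po.'/'.$flags, $ro, $matches[0]);
            }
            if ($level > 0) {
                $level--; // closing tag with a matching open
                return preg_replace('/'.$pc.'/'.$flags, $rc, $matches[0]);
            }
            return $matches[0]; // stray close tag, left as text
        },
        $s);
    for (; $level > 0; $level--) {
        $s .= $rc; // appended verbatim: backreferences are NOT expanded here
    }
    return $s;
}

// Hypothetical syntax whose close tag carries a variable part.
echo preg_replace_recursivepair(
    '\[c=(\w+)\]', '\[\/c=(\w+)\]',
    '<span class="$1">', '</span><!-- $1 -->',
    '[c=a]one[/c=a] [c=b]two', 'i'
), "\n";
// <span class="a">one</span><!-- a --> <span class="b">two</span><!-- $1 -->
```

One possible way around this, if variable close tags were ever needed, would be a separate "auto-close" string parameter for the final loop; that is only a suggestion, not something the thread settles.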
    Note: I changed my translated text from Italian to Spanish now that my file is saved as UTF-8, and that works.


    I'd like some feedback on this and perhaps some ideas on standardizing it.


    And I still know there must be a simpler/normal way to do this...
