Thread: html tags

  1. #1

    I'm trying to use preg_replace to find greater-than and less-than symbols that aren't part of an HTML tag. This worked when I tested it in Dreamweaver, but it doesn't work on my page. Any ideas? Thanks.

    PHP Code:
    // Find less-than and greater-than symbols that aren't part of an HTML tag
    $patterns[] = '/[^ol,li,p,ul,img,/p,/ol,/li,/ul,em,/em,strong,/strong,br,table,tbody,tr,td,/table,/td,/tbody,/tr,sup,sub,/sup,/sub,a,/a,class=",start="]>/i';
    $replacements[] = '&gt;';

    $patterns[] = '/<[^ol,li,p,ul,img,/p,/ol,/li,/ul,em,/em,strong,/strong,br,table,tbody,tr,td,/table,/td,/tbody,/tr,sup,sub,/sup,/sub,a,/a]/i';
    $replacements[] = '&lt;';

    $contents = preg_replace($patterns, $replacements, $input);
    echo htmlspecialchars($contents);
    Warning: preg_replace() [function.preg-replace]: Unknown modifier 'p' in /home/bluewalr/public_html/christophermacdonald.net/protocol.php on line 182
    Line 182 is the $contents = preg_replace line. I'm swapping multiple things, but this is the only one that causes the error. When I say this works in Dreamweaver, I mean that if I put
    <[^ol,li,p,ul,img,/p,/ol,/li,/ul,em,/em,strong,/strong,br,table,tbody,tr,td,/table,/td,/tbody,/tr,sup,sub,/sup,/sub,a,/a]
    in as the search value, it will find the less-than symbol.
    Corrections to my coding/thoughts welcome.

  2. #2

    Dunno how I missed this post. I enjoy these questions even though I am certainly no expert on the subject.

    Problem 1: The answer is in the error message. The delimiter, which in this case is the "/" symbol, marks where the pattern starts and where it ends, and modifiers such as "i" come right after the closing delimiter. If you want to use the delimiter character inside your pattern, you need to escape it by putting a backslash right before it. (A backslash in front of any character that is not a letter or a number makes that character literal.) Since you did not escape the forward slash in "/p", the PCRE engine took it as the closing delimiter and read the "p" after it as a modifier. Since "p" is not a valid modifier, it gave you the error message. Insert a backslash before each forward slash inside the pattern.
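
    As a rough illustration of Problem 1, here is a minimal sketch using two shortened patterns that are hypothetical and not taken from the original code. It assumes a PHP version where an unknown modifier raises a warning and makes preg_match() return false.

```php
<?php
// Hypothetical shortened patterns, for illustration only.
$unescaped = '/<[^/p]/i';   // the "/" before "p" ends the pattern early
$escaped   = '/<[^\/p]/i';  // "\/" is a literal slash inside the pattern

// @ hides the "Unknown modifier 'p'" warning; preg_match() then returns false.
$unescapedResult = @preg_match($unescaped, '<b>');

// The escaped pattern compiles: "<" followed by a character other than "/" or "p".
$escapedResult = preg_match($escaped, '<b>');

var_dump($unescapedResult, $escapedResult);
```

    The only difference between the two patterns is the backslash in front of the inner forward slash.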

    Problem 2: You are trying to NOT match the HTML tags. I can see that, but you placed them in a character class. To use an abbreviated example, [^ol,li,p,ul,img,/p,/ol,/li,/ul] matches any single character except a comma, a forward slash, or one of the letters o, l, i, p, u, m, g. Instead you need to put the tag names into a subpattern (parentheses) as alternatives separated by "|", as opposed to a character class (square brackets). You also do not want the tag itself to be part of the match, so use a negative lookahead and a negative lookbehind. What we now have is '/<(?!ol|li|p|ul|img|\/p|\/ol|\/li|\/ul)/i' and '/(?<!ol|li|p|ul|img|\/p|\/ol|\/li|\/ul)>/i'. Lookahead and lookbehind are also known as assertions. Assertions do not consume any characters, so basically you are saying "match the greater-than symbol unless it is part of an HTML tag". The signal for a negative lookahead is ?! and for a negative lookbehind is ?<!, which state that the symbol you are looking for must not be followed by this pattern or preceded by this pattern, respectively.
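
    To make the difference in Problem 2 concrete, here is a small sketch that runs a character class and the two assertions against the same text. The sample string and the one-tag patterns are assumptions chosen to keep the example short. They are not the full tag list from the question.

```php
<?php
// Hypothetical sample text: two bare symbols plus one real <li> element.
$input = 'x < y > z and <li>item</li>';

// Character class: [^li] is "one character that is neither l nor i".
// It also consumes that character, so the space after "<" is lost
// and the "</" of the closing tag is replaced too.
$withClass = preg_replace('/<[^li]/', '&lt;', $input);

// Negative lookahead: "<" not followed by "li" or "/li". Nothing extra is consumed.
$step1 = preg_replace('/<(?!li|\/li)/i', '&lt;', $input);

// Negative lookbehind: ">" not immediately preceded by "li".
$withAssertions = preg_replace('/(?<!li)>/i', '&gt;', $step1);

echo $withClass, "\n", $withAssertions, "\n";
```

    Note that the lookbehind only checks the characters directly before the ">", so ordinary text that happens to end in one of the listed names would be skipped as well.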

    Problem 3: I suspect that you are unintentionally leaving out some tags, such as <hr>, and the end of an opening hyperlink tag, where the > follows a single quote or a double quote. There are probably others, but you may be doing this intentionally. I'm not sure what you are trying to do exactly.

    Anyway, try playing around with the following:
    PHP Code:
    <?php
    // $string is assumed to already hold the text to process.

    // Match a ">" that is not immediately preceded by one of the listed tag endings.
    $patterns1 = '/(?<!ol|li|p|ul|img|\/p|\/ol|\/li|\/ul|em|\/em|strong|\/strong|br|table|tbody|tr|td|\/table|\/td|\/tbody|\/tr|sup|sub|\/sup|\/sub|a|\/a|class=\"|start=\")>/i';
    $replacements1 = '&gt;';

    // Match a "<" that is not immediately followed by one of the listed tag names.
    $patterns2 = '/<(?!ol|li|p|ul|img|\/p|\/ol|\/li|\/ul|em|\/em|strong|\/strong|br|table|tbody|tr|td|\/table|\/td|\/tbody|\/tr|sup|sub|\/sup|\/sub|a|\/a)/i';
    $replacements2 = '&lt;';

    $string = preg_replace($patterns1, $replacements1, $string);
    $string = preg_replace($patterns2, $replacements2, $string);
    echo htmlspecialchars($string);
    ?>
    Sadly, I never got proficient in using arrays with preg_replace, which is why I am not using arrays in my finished product.
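
    For anyone who does want the array form, here is a minimal sketch. preg_replace() accepts arrays for the pattern and replacement arguments and applies each pair in order. The one-tag patterns and the sample string are assumptions made to keep it short.

```php
<?php
// Hypothetical one-tag patterns; extend the alternatives as needed.
$patterns = [
    '/(?<!li)>/i',       // ">" not immediately preceded by "li"
    '/<(?!li|\/li)/i',   // "<" not followed by "li" or "/li"
];
$replacements = ['&gt;', '&lt;'];

// Each pattern is applied in turn to the result of the previous one.
$result = preg_replace($patterns, $replacements, 'x < y > z and <li>item</li>');

echo $result, "\n";
```

    The first replacement pairs with the first pattern and the second with the second, so the two arrays need to stay in the same order.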

    I kinda skimmed over most everything, so if you want me to elaborate on anything I said just let me know.

    Side note: remember how "/" is used as a delimiter? You can use pretty much any character as a delimiter as long as it is not a letter, a number, a backslash, or whitespace. For example, if you use the octothorpe (#) you won't have to worry about adding backslashes to your forward slashes.
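
    A quick sketch of the side note, again with a hypothetical one-tag pattern and sample string. Both patterns below are meant to behave the same, and only the delimiter differs.

```php
<?php
$input = 'x < y </li>';

// "/" as delimiter: the slash inside the pattern must be escaped.
$withSlash = preg_replace('/<(?!\/li|li)/i', '&lt;', $input);

// "#" as delimiter: the slash can be written as-is.
$withHash = preg_replace('#<(?!/li|li)#i', '&lt;', $input);

echo $withSlash, "\n", $withHash, "\n";
```

    The trade-off is that a literal "#" inside the pattern would then need the backslash instead.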

    Important: What you are trying to do will match the greater-than and less-than symbols and replace them with their HTML entities, which the browser renders exactly the same, so even though the code now works you won't notice a difference on the page.
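
    The point about entities can be checked with a short sketch. The sample string is an assumption. html_entity_decode() stands in here for what a browser shows, and the last call shows what happens when text that already contains an entity is passed through htmlspecialchars() with its default settings, as the snippets in this thread do.

```php
<?php
$encoded = '1 &lt; 2';

// Decoding the entity gives back the plain symbol, which is what a browser displays.
$asDisplayed = html_entity_decode($encoded);

// By default htmlspecialchars() also escapes the "&" of an existing entity.
$doubleEncoded = htmlspecialchars($encoded);

echo $asDisplayed, "\n", $doubleEncoded, "\n";
```
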
    Last edited by james438; 03-14-2010 at 06:29 PM.
