Thread: Removing extra <tr><td><table> and </tr></td></table> with preg_replace

  1. #1
    Join Date
    Oct 2012
    Posts
    130
    Thanks
    15
    Thanked 1 Time in 1 Post

    Hi there!

    Sometimes people post ads with HTML tables in them, and more often than not they contain extra <tr><td><table> and </tr></td></table> tags, which interfere with my site's layout.

    Is it possible to strip all EXTRA <tr><td><table> and </tr></td></table> with preg_replace?


    Thank you for any input.

  2. #2
    Join Date
    Apr 2008
    Location
    So.Cal
    Posts
    3,629
    Thanks
    63
    Thanked 516 Times in 502 Posts
    Blog Entries
    5

    You might look at HTMLpurifier. It's a good idea "anyway" if you're going to allow users to post html.
    We Only Torture the Folks We Don't Like (You're Probably Gonna Be Okay)
    It's a Party in the CIA

  3. #3
    Join Date
    Oct 2012
    Posts
    130
    Thanks
    15
    Thanked 1 Time in 1 Post

    Ok, I downloaded it, but it's all too complex for me. I just need a simple preg_replace for this one job: stripping extra <tr> <td> <table> and </tr> </td> </table> tags. Any ideas? The thing is, I don't have issues with anything else, just tables.

  4. #4
    Join Date
    Oct 2012
    Posts
    130
    Thanks
    15
    Thanked 1 Time in 1 Post

    The function should work like this: first it counts all opening and closing tags, and if there are extra tags, they are removed.

    For instance:

    Code:
    <table border="0">
    <tr>
    <td>
    
    Some text
    
    </td>
    <td>
    
    Some text
    
    </td>
    </tr>
    </table> <tr> <td> <table>

    The last <tr> <td> <table> are extra and should be removed, since they have no matching closing </table> </td> </tr> tags. I think it's feasible. No?
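
    The counting step described above can be sketched roughly as follows. This is only an illustration: the helper name count_tag_imbalance is made up for this example, and it only counts tags. It says nothing about whether they appear in the right order, so it cannot tell which tag is the extra one.

```php
<?php
// Hypothetical helper (not from the thread): reports, per tag name,
// how many more opening tags than closing tags a fragment contains.
function count_tag_imbalance(string $html, array $tags = ['table', 'tr', 'td']): array
{
    $surplus = [];
    foreach ($tags as $tag) {
        $name = preg_quote($tag, '/');
        // \b keeps "tr" from also matching longer names such as <track>.
        $opened = (int) preg_match_all('/<' . $name . '\b[^>]*>/i', $html);
        $closed = (int) preg_match_all('/<\/' . $name . '\s*>/i', $html);
        $surplus[$tag] = $opened - $closed;
    }
    return $surplus;
}

// The example from the post, collapsed onto one line.
$sample = '<table border="0"><tr><td>Some text</td><td>Some text</td></tr></table> <tr> <td> <table>';
print_r(count_tag_imbalance($sample)); // table => 1, tr => 1, td => 1
```

    A positive number means unclosed opening tags, a negative number means stray closing tags. Deciding which occurrence to drop needs the order of the tags as well, which is where it gets harder.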

  5. #5
    Join Date
    Apr 2008
    Location
    So.Cal
    Posts
    3,629
    Thanks
    63
    Thanked 516 Times in 502 Posts
    Blog Entries
    5

    It's more complicated than you think.

    The best way to do this is by parsing/tokenizing it. That means DOMDocument or similar, and HTMLpurifier is far easier than that. The preg_match function is easier to get started with, but the regex you'd need would get very complicated, and you'd still run the risk of making things worse by accident.
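
    For reference, a minimal sketch of the DOMDocument route, assuming PHP's dom extension is available. The function name balance_with_dom is hypothetical. Note the limits: the parser balances the markup rather than deleting extras, so a stray <tr> <td> <table> may come back as empty, properly closed elements instead of disappearing, and the exact repair depends on the underlying libxml version.

```php
<?php
// Hypothetical helper: round-trips a fragment through DOMDocument so the
// serialized result has a closing tag for every opening tag.
function balance_with_dom(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings about malformed markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize only the children of <body>, i.e. the repaired fragment.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$repaired = balance_with_dom('<table><tr><td>Some text</td></tr></table></table> <tr> <td> <table>');
```

    This only fixes the structure. It does nothing about scripts, event-handler attributes and so on, which is the part HTMLpurifier adds.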

  6. #6
    Join Date
    Oct 2012
    Posts
    130
    Thanks
    15
    Thanked 1 Time in 1 Post

    I see. Thanks for explaining.

  7. #7
    Join Date
    Oct 2012
    Posts
    130
    Thanks
    15
    Thanked 1 Time in 1 Post

    Quote Originally Posted by traq View Post
    It's more complicated than you think.

    The best way to do this is by parsing/tokenizing it. That means DOMDocument or similar, and HTMLpurifier is far easier than that. The preg_match function is easier to get started with, but the regex you'd need would get very complicated, and you'd still run the risk of making things worse by accident.
    I've looked into this issue again, and what I actually need is to strip all extra </table> tags. Just those. Once the extra closing table tags are gone, everything seems to be formed ok. Would that make writing a preg_match or regex script easier?
    Last edited by qwikad.com; 05-24-2013 at 04:45 AM.

  8. #8
    Join Date
    Apr 2008
    Location
    So.Cal
    Posts
    3,629
    Thanks
    63
    Thanked 516 Times in 502 Posts
    Blog Entries
    5

    You'd still have to count them (both opening and closing tags) and make sure they're in the right order. That means lookaheads and sub-pattern matching - no, it's not any less complicated. In fact, once you implement it for one kind of tag, it's not really much more work to do it for all of them.
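
    To illustrate what "count them in the right order" involves, here is a rough sketch that walks the <table> and </table> tags in document order and drops any closing tag that has no open table to close. The function name is made up, and this is one of the imperfect approaches: it knows nothing about comments, scripts, or attribute values that happen to contain ">", and it leaves unclosed opening tags alone.

```php
<?php
// Hypothetical helper: removes </table> tags that do not close an open <table>.
function strip_unmatched_table_closers(string $html): string
{
    $depth = 0; // number of currently open <table> elements
    $result = preg_replace_callback(
        '/<(\/?)table\b[^>]*>/i',
        function (array $m) use (&$depth): string {
            if ($m[1] === '') {   // opening tag: one level deeper
                $depth++;
                return $m[0];
            }
            if ($depth > 0) {     // closing tag that matches an open table
                $depth--;
                return $m[0];
            }
            return '';            // closing tag with nothing to close: drop it
        },
        $html
    );
    // preg_replace_callback returns null on a regex engine error; fall back to the input.
    return $result ?? $html;
}

$cleaned = strip_unmatched_table_closers('<table><tr><td>a</td></tr></table></table> text </table>');
// $cleaned is '<table><tr><td>a</td></tr></table> text '
```

    Extending the same loop to <tr> and <td> means tracking a stack of open tags rather than a single counter, which is roughly the point at which a real parser becomes the simpler option.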

    Also consider that extra <table> tags aren't the only thing that can ruin your markup; and a ruined layout isn't the only risk of allowing users to input HTML. I *highly* recommend using HTMLpurifier if you allow user-submitted HTML, if only for the security benefits.

    You might ask at RegexAdvice if you really want to pursue a preg_match solution.

  9. #9
    Join Date
    Oct 2012
    Posts
    130
    Thanks
    15
    Thanked 1 Time in 1 Post

    Quote Originally Posted by traq View Post
    I *highly* recommend using HTMLpurifier if you allow user-submitted HTML, if only for the security benefits.
    I AM putting security in place. I am "training" my markdown to filter out anything that can launch an attack or anything that can be used to take advantage of the site. I never thought I'd have an issue with tables. I am seriously considering just stripping them all if I can't resolve this thing. Why do you think it is so hard to resolve something so simple? I've seen preg_match or preg_replace scripts that do AMAZING and complicated things. And here, all I need is for a script to remove extra closing </table> tags, and yet it is such a hassle? I posted this same question on two other forums and everybody seems to have a "just forget it!" type of attitude.... It's kinda frustrating.
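
    For comparison, the "just strip them all" fallback mentioned here really is a one-liner with preg_replace, because no counting or ordering is involved. A sketch, with a hypothetical function name and an assumed list of table-related tags:

```php
<?php
// Hypothetical helper: removes every table-related tag, opening or closing,
// and keeps the text that was inside the cells.
function strip_all_table_tags(string $html): string
{
    $pattern = '/<\/?(?:table|thead|tbody|tfoot|tr|th|td)\b[^>]*>/i';
    // Replace each tag with a space so neighbouring cell texts do not run together.
    $result = preg_replace($pattern, ' ', $html);
    // preg_replace returns null on a regex engine error; fall back to the input.
    return $result ?? $html;
}

$flat = strip_all_table_tags('<table border="0"><tr><td>Some text</td></tr></table> <tr> <td> <table>');
```

    The layout of the ad is lost, but nothing the poster typed can unbalance the surrounding page's tables any more.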

  10. #10
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,156
    Thanks
    262
    Thanked 690 Times in 678 Posts

    Quote Originally Posted by traq
    You'd still have to count them (both opening and closing tags) and make sure they're in the right order.
    That's what I was going to say in response to the latest post here. It's not an easy task. It's possible; browsers manage to do this. But you'd need to fully parse the HTML of the page. One option would be to limit the scope of what you're doing to something like single-level (or two-level) tables, so that you only have one table (or two) nested at most and don't need to worry so much about subpatterns, but even that really is complicated.

    The issue isn't that preg_match can't do this relatively well. It's that a perfect script (with zero exceptions) would be incredibly complicated. As I said, you'd have to parse all of the HTML on the page to be certain nothing conflicts.

    So your options:
    1. Do nothing.
    2. Fully parse all of the HTML.
    3. Simplify the parameters (such as not allowing nested tables).
    4. Settle for an imperfect (but generally working) solution that covers maybe 75-95% of the possible problems, depending on how you write it.
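
    As a rough illustration of option 3, a check like the one below could be used to reject (or fall back to stripping) any post that nests tables. The name max_table_depth is hypothetical, and the same caveats as any regex-based tag scan apply.

```php
<?php
// Hypothetical helper: returns the deepest level of <table> nesting found.
// 0 = no tables, 1 = flat tables only, 2 or more = nested tables.
function max_table_depth(string $html): int
{
    // $m[1] holds "/" for each closing tag and "" for each opening tag, in document order.
    preg_match_all('/<(\/?)table\b[^>]*>/i', $html, $m);

    $depth = 0;
    $max = 0;
    foreach ($m[1] as $slash) {
        if ($slash === '') {
            $depth++;
            $max = max($max, $depth);
        } elseif ($depth > 0) {
            $depth--; // stray closing tags at depth 0 are ignored
        }
    }
    return $max;
}

$isNested = max_table_depth('<table><tr><td><table></table></td></tr></table>') > 1; // true
```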

    Quote Originally Posted by qwikad.com
    I am "training" my markdown to filter out anything that can launch an attack or anything that can be used to take advantage of the site.
    The problem here is the difference between a whitelist and a blacklist. A whitelist allows only things that are approved and known to cause no problems, and blocks everything else, harmful or harmless. A blacklist, which is what you are describing, blocks all known bad things and lets everything else through, good or bad. The problem with that is that if you simply don't know about something (or some new hacking technique is invented), you have no defense against it at all. There *are* ways to create a working blacklist by over-denying possibly good code, such as removing all HTML, but it doesn't sound like that's what you want either.
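
    A very small example of the whitelist idea, using PHP's built-in strip_tags() with a list of allowed tags. The wrapper name and the particular tag list are assumptions for illustration. Note that strip_tags() keeps the attributes of allowed tags (so something like an onclick attribute survives) and keeps the text inside removed tags, which is why this is not a substitute for a real sanitizer.

```php
<?php
// Hypothetical helper: tag-level whitelist. Anything not listed is removed,
// including tags nobody thought to put on a blacklist.
function keep_only_allowed_tags(string $html, array $allowed = ['p', 'b', 'i', 'table', 'tr', 'td']): string
{
    // strip_tags() accepts the allowed tags as a string such as "<p><b><i>".
    $allowedList = '<' . implode('><', $allowed) . '>';
    return strip_tags($html, $allowedList);
}

$safe = keep_only_allowed_tags('<p>Hi</p><script>alert(1)</script><table><tr><td>x</td></tr></table>');
// The <script> tags are gone, but their text content "alert(1)" remains as plain text.
```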
    Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum
