
Thread: regex - extract data from web page

  1. #1


    Hi,
    My HTML looks like this:

    HTML Code:
    <meta name="description" content="New info! Code: [url]http://www.example/index.html[/url] Code: http://testing.com/fil" />
    <!-- message -->
          <div id="post_message_510223" class="vb_postbit"><font color="green"><font size="3">Temp</font></font><br />
    <br />
    <br />
    <img src="http://sample/test.jpg" border="0" alt="" onload="NcodeImageResizer.createOn(this);" /><br />
    <br />
    <br />
    info!<br />
    <br />
    
    <div style="margin:20px; margin-top:5px">
       <div class="smallfont" style="margin-bottom:2px">Code:</div>
       <pre class="alt2" dir="ltr" style="
          margin: 0px;
          padding: 6px;
          border: 1px inset;
          width: 470px;
          height: 34px;
          text-align: left;
          overflow: auto">[url]http://www.sample1.com/part1.html[/url]
          [url]http://www.sample1.com/part1.html[/url]
          http://www.sample1.com/part1.html</pre>
    </div><br />
    
    <div class="smallfont" style="margin-bottom:2px">Code:</div>
       <pre class="alt2" dir="ltr" style="
          margin: 0px;
          padding: 6px;
          border: 1px inset;
          width: 470px;
          height: 1490px;
          text-align: left;
          overflow: auto">[url]http://www.sample1.com/part1/sample_code.part01.rar[/url]
    http://www.sample1.com/part1/sample_code.part01.rar</pre>
    
    </div></div>

    I want all the values that come after Code:</div> and sit between the <pre> tags, e.g.
    http://www.sample1.com/part1.html
    http://www.sample1.com/part1.html
    http://www.sample1.com/part1.html
    and
    http://www.sample1.com/part1/sample_code.part01.rar
    http://www.sample1.com/part1/sample_code.part01.rar

    Please note that the meta tag at the top also contains the string Code:, and I don't want the value from it.
    Thanks in advance
    Regards
    Last edited by Snookerman; 06-12-2009 at 03:54 PM.

  2. #2


    Please try before asking next time. I can't speak for everyone else, but I hate the feeling that I'm operating like a vending machine or a printer. It might not be so bad if I were paid...
    http://www.regular-expressions.info/tutorial.html
    http://www.regular-expressions.info/php.html

    Try this. Note that I'm using % instead of / as the delimiter because slashes are in the regex. Also, I'm sacrificing a bit of efficiency so HTML can be inside the <pre> element. If you're sure it won't be, replace the .*? with [^<]*
    Code:
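    $matches = array();
    // The s modifier lets . match newlines, since the <pre> contents span several lines.
    preg_match_all('%Code:</div>[^<]*<pre[^>]*>(.*?)</pre>%s', $html, $matches);
    foreach($matches[1] as $match){
        // $match holds the raw text of one <pre> block
        //...
    }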
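
    To get from the captured <pre> blocks to individual URLs, something along these lines might work. It's only a minimal sketch: $html is a cut-down stand-in for the page in post #1, extract_code_urls is a hypothetical helper name, and the s modifier is there so that . can also match the newlines inside <pre>. It assumes each URL sits on its own line and contains no whitespace.

```php
<?php
// Cut-down stand-in for the page from post #1 (test data, not the real page).
$html = '
<meta name="description" content="New info! Code: [url]http://www.example/index.html[/url]" />
<div class="smallfont" style="margin-bottom:2px">Code:</div>
   <pre class="alt2" dir="ltr" style="margin: 0px">[url]http://www.sample1.com/part1.html[/url]
      http://www.sample1.com/part1.html</pre>
<div class="smallfont" style="margin-bottom:2px">Code:</div>
   <pre class="alt2" dir="ltr">[url]http://www.sample1.com/part1/sample_code.part01.rar[/url]</pre>
';

// Hypothetical helper: returns every URL found in a <pre> that follows "Code:</div>".
function extract_code_urls($html)
{
    $urls = array();
    $matches = array();
    // s modifier: . also matches newlines, so multi-line <pre> contents are captured.
    preg_match_all('%Code:</div>[^<]*<pre[^>]*>(.*?)</pre>%s', $html, $matches);
    foreach ($matches[1] as $block) {
        // Drop the [url] and [/url] wrappers, keeping the address inside.
        $block = preg_replace('%\[/?url\]%', '', $block);
        // Split on whitespace; PREG_SPLIT_NO_EMPTY discards empty pieces.
        foreach (preg_split('%\s+%', $block, -1, PREG_SPLIT_NO_EMPTY) as $url) {
            $urls[] = $url;
        }
    }
    return $urls;
}

print_r(extract_code_urls($html));
```

    The Code: inside the meta tag is skipped because it isn't followed by </div>, so the pattern never starts matching there.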
    -- Chris
    informal JavaScript student of Douglas Crockford
    I like wikis - a lot.
