Thread: Links RegExp

  1. #1
    Join Date
    Jul 2006
    Location
    Canada
    Posts
    2,581
    Thanks
    13
    Thanked 28 Times in 28 Posts

    Links RegExp

    I'm trying to write a regular expression that extracts the links on a page. I've tried to figure this out, but can't get it to work:
    Code:
    var content = "";
    onload = function() {
      // Collect the markup of every link found in the page source.
      var reg = /(<a ).*(<\/a>)/gi;
      var str = document.body.innerHTML;
      var match = str.match(reg); // null when nothing matches
      if (match) {
        for (var i = 0; i < match.length; i++) {
          content += "\n<br>" + match[i];
        }
      }
    };
    Suppose I had:
    Code:
    <div id="links"></div>
    <br><input type="button" value="grablinks" onclick="links.innerHTML=content;alert(content)">
    <div>
    <br><a href="#">LInk 1</a>
    <a href="#" onclick="alert(content)">my link</a>
    </div>
    <ul>
    <li><a href="http://www.google.ca">google</a></li>
    <li><a href="#">LInk 4</a></li>
    <li><a href="#">LInk 3</a></li>
    </ul>
    <a href="#">m ylin ek</a>
    on my page. Sometimes the match would pick up the </li> as well.
    - Mike
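
    A likely cause, offered as a guess rather than a diagnosis: `.*` is greedy and `.` does not match line breaks, so whenever innerHTML comes back with two links on one line, the match runs from the first `<a ` to the last `</a>` on that line and swallows the `</li>` in between. A minimal sketch on a hard-coded string shows the difference a lazy quantifier makes. The sample string and the variable names are made up for illustration.

```javascript
// Hypothetical one-line serialization of part of the list.
var sample = '<li><a href="#">LInk 4</a></li><li><a href="#">LInk 3</a></li>';

// The pattern from the post: .* grabs as much of the line as it can.
var greedy = /(<a ).*(<\/a>)/gi;
// Lazy alternative: stop at the first closing </a> after each opening tag.
var lazy = /<a\s[^>]*>[\s\S]*?<\/a>/gi;

var greedyMatches = sample.match(greedy);
var lazyMatches = sample.match(lazy);

console.log(greedyMatches.length); // 1 (one long match spanning both links)
console.log(lazyMatches.length);   // 2 (one match per link)
```

    This only accounts for the symptom described; nested or malformed markup can still defeat a pattern like the lazy one.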

  2. #2
    Join Date
    Aug 2005
    Location
    Other Side of My Monitor
    Posts
    3,494
    Thanks
    5
    Thanked 105 Times in 104 Posts
    Blog Entries
    1

    What if you add a space between the </a> and the </li>? Either a literal space or its HTML entity equivalent?

  3. #3
    Join Date
    Jun 2005
    Location
    英国
    Posts
    11,876
    Thanks
    1
    Thanked 180 Times in 172 Posts
    Blog Entries
    2

    Never parse HTML with regex; HTML is too complex for that. You've a full-featured parser at your fingertips in the browser's DOM; use it.
    Code:
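    // Define map only where the browser lacks a native one.
    if (!Array.prototype.map) {
      Array.prototype.map = function(f) {
        var r = [];
        for(var i = 0; i < this.length; ++i)
          r[i] = f(this[i]);
        return r;
      };
    }
    
    // document.links is array-like, so borrow map via call().
    var content = Array.prototype.map.call(
      document.links,
      function(a) {
        return a.href;
      }
    ).join("\n<br>");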
    Twey | I understand English | 日本語が分かります | mi jimpe fi le jbobau | mi esperanton komprenas | je comprends français | entiendo español | tôi ít hiểu tiếng Việt | ich verstehe ein bisschen Deutsch | beware XHTML | common coding mistakes | tutorials | various stuff | argh PHP!
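
    The same idea can be wrapped in a function that accepts any array-like collection, which makes it easy to try outside a browser. This is a sketch, not Twey's code: `hrefList` and the stand-in `fakeLinks` object are hypothetical names, and in a page you would pass `document.links` instead.

```javascript
// Hypothetical helper: join the href of every item in an array-like collection.
function hrefList(links, separator) {
  // Borrow Array.prototype.map because collections such as document.links
  // are array-like but are not real arrays.
  return Array.prototype.map.call(links, function (a) {
    return a.href;
  }).join(separator);
}

// Stand-in for document.links so the sketch runs without a DOM.
var fakeLinks = {
  length: 2,
  0: { href: "http://www.google.ca/" },
  1: { href: "http://example.com/#top" }
};

var content = hrefList(fakeLinks, "\n<br>");
console.log(content);
```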

  4. #4

    Ah... I forgot about document.links. Thanks, Twey. Silly me.
    - Mike

  5. #5

    document.getElementsByTagName("a") was available too.
    Twey
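
    One difference worth knowing: `document.links` only contains `a` and `area` elements that have an `href` attribute, while `getElementsByTagName("a")` returns every `a` element, including named anchors without one. A rough sketch of filtering the latter down to real links follows. `anchorHrefs` and the stub `fakeDoc` are hypothetical; in a page you would pass `document`.

```javascript
// Hypothetical helper: hrefs of all <a> elements in doc that actually have one.
function anchorHrefs(doc) {
  var anchors = doc.getElementsByTagName("a");
  var hrefs = [];
  for (var i = 0; i < anchors.length; i++) {
    // Named anchors such as <a name="top"> have an empty href; skip them.
    if (anchors[i].href) {
      hrefs.push(anchors[i].href);
    }
  }
  return hrefs;
}

// Stub document so the sketch runs without a browser.
var fakeDoc = {
  getElementsByTagName: function (tag) {
    return tag === "a"
      ? [{ href: "http://www.google.ca/" }, { href: "" }, { href: "#" }]
      : [];
  }
};

var hrefs = anchorHrefs(fakeDoc);
console.log(hrefs.length); // 2
```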

  6. #6

    Oh yeah... I forgot again. I was also hoping to implement it in PHP afterwards, so that I could generate a list of the links on a particular page.
    - Mike

  7. #7

    For PHP, use XML_HTMLSax.
    Twey
