
Links RegExp



mburt
02-20-2007, 10:26 PM
I'm trying to make a regular expression which will filter links on a page. I've tried to figure this out, but can't:

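var content = "";
window.onload = function() {
  // Note: .* is greedy, so one match can run across several links on a line.
  var reg = /(<a ).*(<\/a>)/gi;
  var str = document.body.innerHTML;
  var match = str.match(reg); // null when nothing matches
  if (match) {
    for (var i = 0; i < match.length; i++) {
      content += "\n<br>" + match[i];
    }
  }
};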
Say I had the following on my page:

<div id="links"></div>
<br><input type="button" value="grablinks" onclick="links.innerHTML=content;alert(content)">
<div>
<br><a href="#">LInk 1</a>
<a href="#" onclick="alert(content)">my link</a>
</div>
<ul>
<li><a href="http://www.google.ca">google</a></li>
<li><a href="#">LInk 4</a></li>
<li><a href="#">LInk 3</a></li>
</ul>
<a href="#">m ylin ek</a>
Sometimes the match would pick up the </li> as well.
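
For illustration, the likely cause of the stray </li> is the greedy `.*` in the pattern: when two links end up on the same line of the serialized HTML, one match runs from the first `<a ` to the last `</a>` on that line. Since `.` does not match line breaks, this only shows up sometimes. The sketch below uses a made-up one-line sample string and a lazy variant of the pattern; both the sample and the lazy pattern are assumptions, not part of the original post.

```javascript
// Hypothetical sample: two list items serialized on a single line.
var oneLine = '<li><a href="#">LInk 4</a></li><li><a href="#">LInk 3</a></li>';

// Pattern from the post: .* is greedy and runs to the LAST </a> on the line.
var greedy = /(<a ).*(<\/a>)/gi;

// Assumed alternative: .*? is lazy and stops at the FIRST </a> it finds.
var lazy = /<a\s[^>]*>.*?<\/a>/gi;

var greedyMatches = oneLine.match(greedy);
var lazyMatches = oneLine.match(lazy);

// One long match that swallows "</li><li>" between the two links.
console.log(greedyMatches.length, greedyMatches[0]);
// Two separate matches, one per link.
console.log(lazyMatches.length, lazyMatches);
```

The lazy pattern is still only a sketch: it does not cope with links whose text spans several lines or with other unusual markup.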

BLiZZaRD
02-20-2007, 11:31 PM
What if you add a space between the </a> and the </li>? Either a real space or the HTML code equivalent?

Twey
02-20-2007, 11:39 PM
Never parse HTML with regex. It's too complex for that. You've a full-featured parser at your fingertips; use it.
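// Define map only where the browser lacks it, so a native version is kept.
if (!Array.prototype.map) {
  Array.prototype.map = function(f) {
    var r = [];
    for (var i = 0; i < this.length; ++i)
      r[i] = f(this[i]);
    return r;
  };
}

// document.links is array-like, so map is borrowed with call().
var content = Array.prototype.map.call(
  document.links,
  function(a) {
    return a.href;
  }
).join("\n<br>");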
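
The same idea can be sketched without touching Array.prototype at all, since document.links only needs a length and numeric indexes to be looped over. The helper name collectHrefs and the stand-in object below are hypothetical, used here so the snippet also runs outside a browser; in a page one would pass document.links instead.

```javascript
// collectHrefs is a hypothetical helper name, not something from the thread.
// It accepts any array-like collection of objects that have an href property.
function collectHrefs(links) {
  var hrefs = [];
  for (var i = 0; i < links.length; ++i) {
    hrefs.push(links[i].href);
  }
  return hrefs.join("\n<br>");
}

// Stand-in for document.links (an assumption for illustration only).
var fakeLinks = {
  0: { href: "http://www.google.ca/" },
  1: { href: "http://example.com/#" },
  length: 2
};

var listing = collectHrefs(fakeLinks);
console.log(listing);
```

In a browser, the call would be collectHrefs(document.links).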

mburt
02-20-2007, 11:44 PM
Ah... I forgot about document.links. Thanks Twey, silly me :p

Twey
02-20-2007, 11:57 PM
document.getElementsByTagName("a") was available too :)
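
One difference worth keeping in mind: document.links holds only the a and area elements that have an href, while getElementsByTagName("a") returns every a element, including anchors without one. A rough sketch of filtering the latter down follows; the function name anchorsWithHref and the stub document are hypothetical, included so the snippet can run outside a browser.

```javascript
// anchorsWithHref is a hypothetical helper name, not from the thread.
// doc is anything with getElementsByTagName, such as the browser's document.
function anchorsWithHref(doc) {
  var anchors = doc.getElementsByTagName("a");
  var result = [];
  for (var i = 0; i < anchors.length; ++i) {
    // Skip anchors that carry no href attribute.
    if (anchors[i].getAttribute("href") !== null) {
      result.push(anchors[i]);
    }
  }
  return result;
}

// Minimal stub standing in for a real document (assumption for illustration).
function makeAnchor(href) {
  return { getAttribute: function(name) { return name === "href" ? href : null; } };
}
var fakeDoc = {
  getElementsByTagName: function(name) {
    return name === "a" ? [makeAnchor("#"), makeAnchor(null)] : [];
  }
};

var found = anchorsWithHref(fakeDoc);
console.log(found.length);
```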

mburt
02-20-2007, 11:59 PM
Oh yeah... I forgot again. I was hoping to implement it in PHP afterwards too, so that I could generate a list of the links on a particular page.

Twey
02-21-2007, 12:16 AM
For PHP, use XML_HTMLSax (http://pear.php.net/package/XML_HTMLSax).