View Full Version : scan a page for a piece of text



lankinator
06-03-2007, 03:43 PM
I tried using getElementById and innerHTML to do this, but they wouldn't work :(

How would I make a script that scans the page you're on for a word, then shows a div with the number of times that word appears on the page? :confused: :confused:

Thanks for any help :(

Trinithis
06-03-2007, 05:19 PM
This is probably not the best way of counting words, but it is a solution.


function wordCount(word) {
    var HTML = document.getElementsByTagName("body")[0].innerHTML;
    // Drop STYLE/SCRIPT blocks entirely, then strip the remaining tags.
    var text = HTML.replace(/<(style|script).*?>(.|\r?\n)*?<\/\1>/gi, "").replace(/<.*?>/gi, "");
    var count = 0;
    var regex = new RegExp("(\\r?\\n| |\\.)" + word + "(\\r?\\n| |\\.)", "i");
    // The regex above needs more checking, such as for quotes, question marks, etc.
    while (regex.test(text)) {
        count++;
        // Remove the matched word but keep its delimiters so the next match can reuse them.
        text = text.replace(regex, "$1$2");
    }
    return count;
}

Then you could make some other code to make the div that displays the word and the returned count.
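
A minimal sketch of that display step, in case it helps. The helper name showWordCount, the "word-count-result" id, and the optional doc parameter are all made up for illustration; none of them come from the code above.

```javascript
// Hypothetical helper: builds a div reporting the count and appends it to the page.
// `doc` defaults to the global document; passing it in only makes testing easier.
function showWordCount(word, count, doc) {
    doc = doc || document;
    var div = doc.createElement("div");
    div.id = "word-count-result"; // assumed id, use whatever suits the page
    var message = '"' + word + '" appears ' + count +
        " time" + (count === 1 ? "" : "s") + " on this page.";
    // A text node avoids re-parsing the word as HTML.
    div.appendChild(doc.createTextNode(message));
    doc.body.appendChild(div);
    return div;
}
```

On a real page it would be called as showWordCount("ham", wordCount("ham")), once the body has loaded.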

Twey
06-03-2007, 05:29 PM
With innerHTML, it's easy:
function wordCount(word) {
    // match() returns null when nothing matches, so fall back to an empty array.
    return (document.body.innerHTML.match(new RegExp(word, "gi")) || []).length;
}
It might be more difficult to do it properly with DOM methods, though.

Trinithis
06-03-2007, 07:26 PM
Twey, your code would return extra word counts when a word is within another word, such as "ham" in "hammer". Also, it would look within tags that don't display text within them, such as SCRIPT or STYLE. Those are the only ones I can think of at the moment, but more could easily be added.
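
To make the "ham" in "hammer" problem concrete, here is a small comparison on a plain string. The sample sentence is invented, and \b is just one possible way to require a whole word.

```javascript
var sample = "The ham is next to the hammer. Ham!";

// Substring match: also counts the "ham" inside "hammer".
var loose = (sample.match(/ham/gi) || []).length;    // 3

// Whole-word match: \b only matches at a word boundary.
var strict = (sample.match(/\bham\b/gi) || []).length; // 2
```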

My idea to compensate for getting the exact word is to set delimiters at both ends of the word, and now that I think about it, the best way of doing that is to do /([^a-zA-Z]|^)word([^a-zA-Z]|$)/gi

Perhaps:


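function wordCount(word) {
    var HTML = document.getElementsByTagName("body")[0].innerHTML;
    var text = HTML.replace(/<(style|script).*?>(.|\r?\n)*?<\/\1>/gi, "").replace(/<.*?>/gi, "");
    // The trailing delimiter is a lookahead so it is not consumed;
    // otherwise "ham ham" would only count once.
    var regex = new RegExp("([^a-zA-Z]|^)" + word + "(?=[^a-zA-Z]|$)", "gi");
    // match() returns null when there are no matches.
    return (text.match(regex) || []).length;
}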
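
One more thing to watch for when the word is pasted straight into a RegExp: characters such as . or + are treated as regex syntax. A possible sketch of escaping them first is below; escapeRegExp and countInText are hypothetical names, and countInText works on plain text only (no HTML stripping).

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Counts whole-word occurrences of `word` in plain text, using letter-only delimiters.
function countInText(text, word) {
    var regex = new RegExp("(^|[^a-zA-Z])" + escapeRegExp(word) + "(?=[^a-zA-Z]|$)", "gi");
    var matches = text.match(regex);
    return matches ? matches.length : 0;
}
```

Without the escaping, searching for "a.b" would also match "axb", because the dot would match any character.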

Twey
06-03-2007, 07:36 PM
Hmm... probably better to do it with DOM methods. We're venturing into the realms of parsing HTML with regex here, which is never a good idea (e.g. there could validly be > or < characters within event handlers).
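if(typeof Node === "undefined")
    var Node = {
        'TEXT_NODE' : 3
    };

function wordCount(word, caseSens, el) {
    el = el || document.body;

    var total = 0,
        regex = new RegExp('\\b' + word + '\\b', "g" + (caseSens ? "" : "i"));

    for(var i = 0, e = el.childNodes; i < e.length; ++i)
        if(e[i].nodeType === Node.TEXT_NODE)
            // match() returns null when nothing matches, so fall back to an empty array.
            total += (e[i].nodeValue.match(regex) || []).length;
        else if(!/^(script|style)$/i.test(e[i].nodeName))
            // Recurse into children, skipping SCRIPT and STYLE, whose text is not displayed.
            total += wordCount(word, caseSens, e[i]);

    return total;
}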
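
To illustrate the point about > characters inside event handlers, here is a rough demonstration of the tag-stripping regex from earlier in the thread going wrong. The stripTags name and the sample markup are invented for the example.

```javascript
// Same tag-stripping pattern used earlier in the thread.
function stripTags(html) {
    return html.replace(/<.*?>/gi, "");
}

var html = '<a href="#" onclick="if (a > b) go()">ham</a>';

// The regex treats the ">" inside the onclick attribute as the end of the tag,
// so part of the handler leaks into the "text".
var stripped = stripTags(html); // ' b) go()">ham'
```

Walking the text nodes, as in the function above, sidesteps this because the browser has already parsed the markup.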