Thread: scan a page for a piece of text

  1. #1
    Join Date
    Dec 2006
    Posts
    74
    Thanks
    1
    Thanked 0 Times in 0 Posts

    scan a page for a piece of text

    I tried using getElementById and innerHTML to do this, but they wouldn't work.

    How would I make a script that scans the page you're on for a word, then shows a div with the number of times that word appears on that page?

    Thanks for any help.

  2. #2
    Join Date
    May 2007
    Location
    USA
    Posts
    373
    Thanks
    2
    Thanked 4 Times in 4 Posts

    This is probably not the best way of counting words, but it is a solution.
    Code:
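    function wordCount(word) {
    	// Grab the page markup, drop <style>/<script> blocks entirely, then strip the remaining tags.
    	var HTML = document.getElementsByTagName("body")[0].innerHTML;
    	var text = HTML.replace(/<(style|script).*?>(.|\r?\n)*?<\/\1>/gi, "").replace(/<.*?>/gi, "");
    	var count = 0;
    	// Match the word only when it has a newline, space, or period on each side.
    	var regex = new RegExp("(\\r?\\n| |\\.)" + word + "(\\r?\\n| |\\.)", "i");
    	//The regex above needs more checking, such as for quotes, question marks, etc.
    	while(regex.test(text)) {
    		count++;
    		// Remove the matched word but keep its delimiters, so the loop ends and neighbouring words still match.
    		text = text.replace(regex, "$1$2");
    	}
    	return count;
    }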
    You could then write a little more code to create the div that displays the word and the returned count.
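As a rough sketch of that display step: the helper below creates a div, fills it with a message, and appends it to the page. The name `showWordCount`, the `word-count` id, and the optional `doc` parameter are all assumptions made for illustration; none of them come from the thread.

```javascript
// Hypothetical helper: builds the message and appends it to the page in a new div.
// `doc` defaults to the global document; passing it in makes the function easy to test.
function showWordCount(word, count, doc) {
  doc = doc || document;
  var div = doc.createElement("div");
  div.id = "word-count"; // assumed id, not from the thread
  var times = count === 1 ? " time" : " times";
  // A text node (rather than innerHTML) keeps the word from being parsed as markup.
  var message = '"' + word + '" appears ' + count + times + " on this page.";
  div.appendChild(doc.createTextNode(message));
  doc.body.appendChild(div);
  return div;
}
```

In a page it would be called as `showWordCount("ham", wordCount("ham"));`. Note that the new div contains the word itself, so counting again after displaying it would report one extra occurrence.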

  3. #3
    Join Date
    Jun 2005
    Location
    UK
    Posts
    11,913
    Thanks
    1
    Thanked 180 Times in 172 Posts
    Blog Entries
    2

    With innerHTML, it's easy:
    Code:
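    function wordCount(word) {
      // document itself has no innerHTML, so read it from the body; match() returns null when nothing matches.
      var matches = document.body.innerHTML.match(new RegExp(word, "gi"));
      return matches ? matches.length : 0;
    }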
    It might be more difficult to do it properly with DOM methods, though.
    Twey | I understand English | 日本語が分かります | mi jimpe fi le jbobau | mi esperanton komprenas | je comprends français | entiendo español | tôi ít hiểu tiếng Việt | ich verstehe ein bisschen Deutsch | beware XHTML | common coding mistakes | tutorials | various stuff | argh PHP!

  4. #4

    Twey, your code would return extra word counts when a word is within another word, such as "ham" in "hammer". Also, it would look within tags that don't display text within them, such as SCRIPT or STYLE. Those are the only ones I can think of at the moment, but more could easily be added.

    My idea to compensate for getting the exact word is to set delimiters at both ends of the word, and now that I think about it, the best way of doing that is to do /([^a-zA-Z]|^)word([^a-zA-Z]|$)/gi

    Perhaps:
    Code:
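    function wordCount(word) {
    	var HTML = document.getElementsByTagName("body")[0].innerHTML;
    	var text = HTML.replace(/<(style|script).*?>(.|\r?\n)*?<\/\1>/gi, "").replace(/<.*?>/gi, "");
    	// The trailing delimiter is a lookahead so it is not consumed; otherwise the second "ham" in "ham ham" would be missed.
    	var regex = new RegExp("([^a-zA-Z]|^)" + word + "(?=[^a-zA-Z]|$)", "gi");
    	// match() returns null when there are no matches.
    	var matches = text.match(regex);
    	return matches ? matches.length : 0;
    }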
    Last edited by Trinithis; 06-03-2007 at 07:35 PM. Reason: To add the option that the word starts or ends the string
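The delimiter idea can be tried out on a plain string, without any page markup. The sketch below is one possible variant, not the poster's exact code: `escapeRegExp` and `countWholeWord` are hypothetical names, and the escaping step is an addition so that a search term such as "a.b" is treated literally instead of as a pattern.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count occurrences of `word` that are not surrounded by other letters.
function countWholeWord(text, word) {
  // The leading delimiter is consumed by the match; the trailing one is a
  // lookahead, so it stays available as the leading delimiter of the next match.
  var regex = new RegExp("(^|[^a-zA-Z])" + escapeRegExp(word) + "(?=[^a-zA-Z]|$)", "gi");
  var matches = text.match(regex);
  return matches ? matches.length : 0; // match() returns null when nothing matches
}

var sample = "ham ham hammer Ham.";
var hamCount = countWholeWord(sample, "ham"); // "hammer" is not counted
```

Because only ASCII letters count as word characters here, digits and accented letters act as delimiters; whether that is desirable depends on the page being scanned.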

  5. #5

    Hmm... probably better to do it with DOM methods. We're venturing into the realms of parsing HTML with regex here, which is never a good idea (e.g. there could validly be > or < characters within event handlers).
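To make the event-handler problem concrete, here is a small sketch that runs the tag-stripping pattern from the earlier posts over a single link whose onclick attribute contains a > character. The `stripTags` name and the sample markup are made up for illustration.

```javascript
// The same tag-stripping pattern used earlier in the thread.
function stripTags(html) {
  return html.replace(/<.*?>/gi, "");
}

var html = '<a href="#" onclick="if (a > b) go()">ham</a>';
var stripped = stripTags(html);

// The pattern treats the > inside onclick as the end of the tag,
// so part of the handler code is left behind as if it were page text.
var leaked = /\bgo\b/.test(stripped);
```

The only visible text in that markup is "ham", yet a search for "go" in the stripped string would still find a match.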
    Code:
    if(typeof Node === "undefined")
      var Node = {
        'TEXT_NODE' : 3
      };
    
    function wordCount(word, caseSens, el) {
      el = el || document.body;
    
      // Whole-word pattern; case-insensitive unless caseSens is set.
      var regex = new RegExp('\\b' + word + '\\b', "g" + (caseSens ? "" : "i"));
      var total = 0;
    
      for(var i = 0, e = el.childNodes; i < e.length; ++i)
        if(e[i].nodeType === Node.TEXT_NODE)
          // match() returns null for a text node without the word, so fall back to an empty array.
          total += (e[i].nodeValue.match(regex) || []).length;
        else
          total += wordCount(word, caseSens, e[i]);
    
      return total;
    }
    Last edited by Twey; 06-03-2007 at 07:47 PM.
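The recursive walk can also be made to skip SCRIPT and STYLE elements, which was the objection raised earlier in the thread. The sketch below is an assumption-laden variant rather than the code above: `countInTree`, `el`, and `txt` are hypothetical names, and the tree is built from plain objects that only mimic the `nodeType`, `nodeName`, `nodeValue`, and `childNodes` properties of real DOM nodes, so it can run outside a browser.

```javascript
var ELEMENT_NODE = 1, TEXT_NODE = 3;

// Recursively count regex matches in text nodes, ignoring script and style content.
function countInTree(node, regex) {
  if (node.nodeType === TEXT_NODE) {
    var m = node.nodeValue.match(regex);
    return m ? m.length : 0;
  }
  if (node.nodeType === ELEMENT_NODE && /^(script|style)$/i.test(node.nodeName)) {
    return 0; // this text is never rendered on the page
  }
  var total = 0, kids = node.childNodes || [];
  for (var i = 0; i < kids.length; ++i) {
    total += countInTree(kids[i], regex);
  }
  return total;
}

// Hand-built stand-ins for DOM nodes (illustration only).
function el(name, kids) { return { nodeType: ELEMENT_NODE, nodeName: name, childNodes: kids }; }
function txt(s) { return { nodeType: TEXT_NODE, nodeValue: s, childNodes: [] }; }

var tree = el("BODY", [
  el("P", [txt("ham and "), el("B", [txt("Ham")]), txt(" in a hammer")]),
  el("SCRIPT", [txt("var ham = 1;")])
]);
var found = countInTree(tree, /\bham\b/gi);
```

In a browser the same function should accept `document.body` as the starting node. A word split across two elements, such as `h<b>am</b>`, would not be counted, since each text node is searched on its own.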
