Thread: Creating an Individual Word Counter

  1. #1
    Join Date
    Dec 2005
    Location
    Canada
    Posts
    82
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default Creating an Individual Word Counter

    Does anyone know how to begin to develop a word counter that can access an input file, read the individual words within the file, and print them along with the number of times they appeared within the document? As in:

    Input:
    Word sense disambiguation is the process of selecting
    the most appropriate meaning for a word, based on
    the context in which it occurs. For our purposes it is
    assumed that the set of possible meanings, i.e., the
    sense inventory, has already been determined. For
    example, suppose bill has the following set of possible
    meanings: a piece of currency, pending legislation,
    or a bird jaw. When used in the context of The
    Senate bill is under consideration, a human reader
    immediately understands that bill is being used in
    the legislative sense. However, a computer program
    attempting to perform the same task faces a difficult
    problem since it does not have the benefit of innate
    common-sense or linguistic knowledge.
    Output:
    Word: 1
    sense: 4
    disambiguation: 1
    is: 4
    the: 10
    process: 1
    of: 6
    selecting: 1
    most: 1
    appropriate: 1
    meaning: 1
    for: 1
    a: 6
    word: 1
    based: 1
    on: 1
    context: 2
    in: 4
    which: 1
    it: 3
    occurs: 1
    For: 2
    our: 1
    purposes: 1
    assumed: 1
    that: 2
    set: 2
    possible: 2
    meanings: 2
    other: 1
    words: 1
    inventory: 1
    has: 2
    already: 1
    been: 1
    determined: 1
    example: 1
    suppose: 1
    bill: 3
    following: 1
    piece: 1
    currency: 1
    pending: 1
    legislation: 1
    or: 2
    bird: 1
    jaw: 1
    When: 1
    used: 2
    The: 1
    Senate: 1
    under: 1
    consideration: 1
    human: 1
    reader: 1
    immediately: 1
    understands: 1
    being: 1
    legislative: 1
    However: 1
    computer: 1
    program: 1
    attempting: 1
    to: 1
    perform: 1
    same: 1
    task: 1
    faces: 1
    difficult: 1
    problem: 1
    since: 1
    does: 1
    not: 1
    have: 1
    benefit: 1
    innate: 1
    common: 1
    linguistic: 1
    knowledge: 1
    To do this, I was thinking of using the Scanner class to read the data, but I'm not sure how to split the paragraph into words or how to associate a counter with each word afterward.

    Any help would be greatly appreciated!

  2. #2
    Join Date
    Feb 2010
    Posts
    3
    Thanks
    1
    Thanked 1 Time in 1 Post

    Default

    You could try creating a String array, then look through the input (possibly from a JTextField or JTextArea) and copy each word into the array, leaving out symbols and spaces.

    Then you could search through the array for matching strings, count them, and print out the count.
    On its own, this would print a word once for every time it occurs, which is more often than needed. To correct this, I would put each collected word into a new array and ignore any word that is already in it.

    Another way would be to sort the words into alphabetical order first. Identical words would then sit next to each other, which makes the searching much easier.
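
    The sorting idea could be sketched in Java roughly as follows. This is only an illustration: the class name SortedRunCounter and the method countSorted are made up for this example, and it assumes the words have already been split out of the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedRunCounter {
    // Hypothetical helper: returns "word: count" lines in alphabetical order.
    static List<String> countSorted(String[] words) {
        String[] sorted = words.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);               // equal words are now adjacent
        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // Advance j past the run of entries equal to sorted[i].
            while (j < sorted.length && sorted[j].equals(sorted[i])) {
                j++;
            }
            lines.add(sorted[i] + ": " + (j - i));
            i = j;                         // jump to the start of the next run
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] words = {"the", "bill", "is", "the", "bill", "the"};
        for (String line : countSorted(words)) {
            System.out.println(line);      // bill: 2, is: 1, the: 3 (one per line)
        }
    }
}
```

    Because each run of equal words is counted in one pass, no second "already seen" array is needed.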

    Just some ideas of how I would go about this...still learning too.

  3. The Following User Says Thank You to TheAlfreds For This Useful Post:

    Aragoth (02-22-2010)

  4. #3
    Join Date
    Dec 2005
    Location
    Canada
    Posts
    82
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Default

    That seems like a legitimate start. Thanks for the idea.

  5. #4
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    Default

    Remember that you must first normalize the text so that variations of the same word match: make everything lowercase, split at word boundaries, and remove punctuation.
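
    A minimal Java sketch of that normalization step might look like the following. The names WordNormalizer and toWords are hypothetical, and treating every run of non-letter, non-digit characters as a word boundary is an assumption (it splits "common-sense" into two words). Note that lowercasing merges "For" and "for", which the sample output in the first post counts separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordNormalizer {
    // Hypothetical helper: lowercases the text and returns its words
    // with punctuation and whitespace removed.
    static List<String> toWords(String text) {
        List<String> words = new ArrayList<>();
        String lower = text.toLowerCase(Locale.ROOT);
        // Assumption: any run of characters that are neither letters
        // nor digits separates two words.
        for (String token : lower.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {        // split() can yield a leading ""
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(toWords("The Senate bill, the bill."));
    }
}
```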

    The ideas above will work fine.

    I'd probably approach it like this (I use PHP, not Java, but it should carry over just fine):
    1. Split the words into an array.
    2. Loop through each element of that array.
    3. For each element, increment a counter stored under that word as the key: $array[$WORD] = $array[$WORD]+1;

    Then you will have a "list" whose array keys are the words and whose values are the number of times each one appears.
    Daniel - Freelance Web Design | <?php?> | <html> | español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum
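
    In Java, the closest equivalent of a PHP array keyed by word is a Map. Here is a small sketch, assuming the words are already split; MapWordCounter and count are made-up names, and LinkedHashMap is chosen only so the words print in the order they first appear.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapWordCounter {
    // Hypothetical helper: maps each word to the number of times it occurs.
    static Map<String, Integer> count(String[] words) {
        // LinkedHashMap remembers insertion order, so iteration follows
        // the order in which words were first seen.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            // Stores 1 for a new word, otherwise adds 1 to the old count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"the", "bill", "is", "the", "bill", "the"};
        for (Map.Entry<String, Integer> e : count(words).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

    Map.merge needs Java 8 or later; on older versions the same step can be written with containsKey, get, and put.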

  6. #5
    Join Date
    Mar 2010
    Posts
    14
    Thanks
    1
    Thanked 5 Times in 5 Posts

    Default

    In Java, StringTokenizer or String.split() would give you a lot of that functionality. Prefer split(): StringTokenizer is not formally deprecated, but it is a legacy class whose use is discouraged in new code.

    If I had to do this, I'd run each of the strings through a SortedSet<String> to get all the unique words, then make two parallel ArrayLists, an ArrayList<String> and an ArrayList<Integer>. I'd copy the SortedSet<String> into the ArrayList<String>, then iterate through the original strings again and increment the matching Integer count for each one.
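
    One way to read that description is the sketch below. The names ParallelListCounter and count are hypothetical, and using Collections.binarySearch to find each word's position in the sorted list is an assumption about how the lookup would be done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class ParallelListCounter {
    // Hypothetical helper: returns "word: count" lines in sorted order.
    static List<String> count(List<String> words) {
        SortedSet<String> unique = new TreeSet<>(words);   // sorted, no duplicates
        List<String> keys = new ArrayList<>(unique);
        // One zero-initialized counter per unique word, at the same index.
        List<Integer> counts = new ArrayList<>(Collections.nCopies(keys.size(), 0));
        for (String word : words) {
            // keys is sorted and contains every word, so the search succeeds.
            int i = Collections.binarySearch(keys, word);
            counts.set(i, counts.get(i) + 1);
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            lines.add(keys.get(i) + ": " + counts.get(i));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : count(Arrays.asList("the", "bill", "is", "the"))) {
            System.out.println(line);
        }
    }
}
```

    A TreeMap<String, Integer> would keep the words and counts together in a single sorted structure, which may be simpler than two parallel lists.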

    It's a thought anyways. Hope that this helps.

  7. #6
    Join Date
    Dec 2010
    Location
    US
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default Creating an Individual Word Counter

    I've been scouring the internet for a word frequency counter similar to Hermetic Word Counter. Unfortunately, that's only for Windows. Does anyone know of something like it for the Mac?

  8. #7
    Join Date
    May 2009
    Posts
    62
    Thanks
    19
    Thanked 3 Times in 3 Posts

    Default

    I have this code in C, but it's still not complete because I didn't consider the punctuation marks. Hope this will help:


    Code:
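    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef char Word[50];
    typedef struct temp{
      Word w;
      int cnt;
      struct temp *next;
    }WordCount;

    typedef WordCount *WordCPointer;

    void printWordCount(WordCPointer wcp);

    /* Portable case-insensitive compare; stricmp() is not standard C. */
    static int compareNoCase(const char *a, const char *b)
    {
       while(*a != '\0' && tolower((unsigned char)*a) == tolower((unsigned char)*b))
       {
          a++;
          b++;
       }
       return tolower((unsigned char)*a) - tolower((unsigned char)*b);
    }

    int main(void)
    {
       FILE *fp;
       WordCPointer temp, wcp = NULL;
       WordCPointer *wcpP;
       Word str;

       if((fp = fopen("data.txt","r")) == NULL)
       {
          perror("data.txt");
          return 1;
       }

       /* Test the fscanf result instead of feof() so the last word is not
          counted twice; %49s keeps a long token from overflowing str. */
       while(fscanf(fp,"%49s",str) == 1)
       {
          /* Walk the sorted list to the first node that is >= str. */
          wcpP = &wcp;
          while(*wcpP != NULL && compareNoCase((*wcpP)->w,str) < 0)
             wcpP = &(*wcpP)->next;

          if(*wcpP == NULL || compareNoCase((*wcpP)->w,str) > 0)
          {
             /* New word: insert a node here to keep the list sorted. */
             temp = malloc(sizeof(WordCount));
             if(temp == NULL)
             {
                fclose(fp);
                return 1;
             }
             strcpy(temp->w,str);
             temp->cnt = 1;
             temp->next = *wcpP;
             *wcpP = temp;
          }
          else
             (*wcpP)->cnt++;
       }
       fclose(fp);

       printWordCount(wcp);
       return 0;
    }

    void printWordCount(WordCPointer wcp)
    {
       WordCPointer p;

       for(p = wcp; p != NULL; p = p->next)
          printf("%s: %d\n",p->w,p->cnt);
    }
    If you have questions, feel free to ask... =)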
    Last edited by heavensgate15; 12-14-2010 at 03:50 PM.

  9. #8
    Join Date
    Aug 2012
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Default

    You can have the program read the whole content of the input file and save the words in a String array. The String methods will help you with this.
    You will need several temporary variables. First write a routine that separates the words and returns them along with the total number of words.
    Then loop over the words: assign the first word to a temp variable and give it a counter variable. As the loop goes on, check whether the next word matches any previous word. If it does, increase that word's counter; otherwise assign it to another temp variable with its own counter, and repeat. Hope this will help you. Thanks
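
    As a rough Java sketch of that loop, the version below reads whitespace-separated words with a Scanner and uses two lists in place of the separate temp variables. LinearScanCounter and count are made-up names, and punctuation is not stripped here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LinearScanCounter {
    // Hypothetical helper: returns "word: count" lines in first-seen order.
    static List<String> count(Scanner in) {
        List<String> seen = new ArrayList<>();      // each distinct word once
        List<Integer> counts = new ArrayList<>();   // counter at the same index
        while (in.hasNext()) {
            String word = in.next();                // next whitespace-delimited token
            int i = seen.indexOf(word);             // linear search of earlier words
            if (i >= 0) {
                counts.set(i, counts.get(i) + 1);   // seen before: bump its counter
            } else {
                seen.add(word);                     // first occurrence
                counts.add(1);
            }
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < seen.size(); i++) {
            lines.add(seen.get(i) + ": " + counts.get(i));
        }
        return lines;
    }

    public static void main(String[] args) {
        // To read a file instead, pass new Scanner(new java.io.File("data.txt")),
        // which throws FileNotFoundException if the file is missing.
        try (Scanner in = new Scanner("the bill is the bill")) {
            for (String line : count(in)) {
                System.out.println(line);
            }
        }
    }
}
```

    The indexOf search makes this quadratic in the number of distinct words, which is fine for a paragraph but slow for large files.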
