Creating an Individual Word Counter



Aragoth
02-08-2010, 04:21 AM
Does anyone know how to begin to develop a word counter that can access an input file, read the individual words within the file, and print them along with the number of times they appeared within the document? As in:

Input:


Word sense disambiguation is the process of selecting
the most appropriate meaning for a word, based on
the context in which it occurs. For our purposes it is
assumed that the set of possible meanings, i.e., the
sense inventory, has already been determined. For
example, suppose bill has the following set of possible
meanings: a piece of currency, pending legislation,
or a bird jaw. When used in the context of The
Senate bill is under consideration, a human reader
immediately understands that bill is being used in
the legislative sense. However, a computer program
attempting to perform the same task faces a difficult
problem since it does not have the benefit of innate
common-sense or linguistic knowledge.

Output:

Word: 1
sense: 4
disambiguation: 1
is: 4
the: 10
process: 1
of: 6
selecting: 1
most: 1
appropriate: 1
meaning: 1
for: 1
a: 6
word: 1
based: 1
on: 1
context: 2
in: 4
which: 1
it: 3
occurs: 1
For: 2
our: 1
purposes: 1
assumed: 1
that: 2
set: 2
possible: 2
meanings: 2
other: 1
words: 1
inventory: 1
has: 2
already: 1
been: 1
determined: 1
example: 1
suppose: 1
bill: 3
following: 1
piece: 1
currency: 1
pending: 1
legislation: 1
or: 2
bird: 1
jaw: 1
When: 1
used: 2
The: 1
Senate: 1
under: 1
consideration: 1
human: 1
reader: 1
immediately: 1
understands: 1
being: 1
legislative: 1
However: 1
computer: 1
program: 1
attempting: 1
to: 1
perform: 1
same: 1
task: 1
faces: 1
difficult: 1
problem: 1
since: 1
does: 1
not: 1
have: 1
benefit: 1
innate: 1
common: 1
linguistic: 1
knowledge: 1

To do this, I was thinking of using the Scanner class to read the data, but I'm not sure how to split the paragraph into words or how to associate a counter with each word afterwards.

Any help would be greatly appreciated!

TheAlfreds
02-16-2010, 01:01 AM
You could try creating a String array, then reading through the input (possibly from a JTextField or JTextArea) and storing each word in the array while leaving out symbols and spaces.

Then you could search through the array for matching strings, count them, and print out the count.
This would probably print each word more times than needed in the output. To correct this, I would put each collected word into a new array and skip any word that is already in it.

Another way would be to sort the words into alphabetical order first. Identical words then sit next to each other, which makes the searching and counting much easier.
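
A minimal Java sketch of the sort-first idea, assuming the words have already been split out into an array. The class and method names (SortedRunCounter, countRuns) are made up for illustration:

```java
import java.util.Arrays;

class SortedRunCounter {
    // Sorts a copy of the words, then counts each run of equal neighbours.
    static String countRuns(String[] words) {
        String[] sorted = words.clone();
        Arrays.sort(sorted);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // Advance j past every copy of sorted[i].
            while (j < sorted.length && sorted[j].equals(sorted[i])) {
                j++;
            }
            out.append(sorted[i]).append(": ").append(j - i).append('\n');
            i = j; // continue with the next distinct word
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] words = {"the", "bill", "is", "the", "senate", "bill"};
        System.out.print(countRuns(words));
    }
}
```

Because of the sort, the words come out in alphabetical order rather than in the order they first appear in the text.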

Just some ideas of how I would go about this...still learning too.

Aragoth
02-22-2010, 06:50 AM
That seems like a legitimate start. Thanks for the idea.

djr33
02-22-2010, 09:14 AM
Remember that you must first strip all variations: make everything lowercase, split at word boundaries and remove punctuation.

The ideas above will work fine.
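
As a rough Java sketch of the normalization step (lowercase everything, split at word boundaries, drop punctuation): the class and method names here are hypothetical, and treating every run of non-letter, non-digit characters as a separator is an assumption. Under that rule "common-sense" becomes two words and "i.e." becomes "i" and "e".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordNormalizer {
    // Lowercases the text and splits it wherever a run of
    // non-letter, non-digit characters occurs.
    static List<String> normalize(String text) {
        List<String> words = new ArrayList<>();
        String lower = text.toLowerCase(Locale.ROOT);
        for (String token : lower.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield one empty leading token if the text
            // starts with punctuation or whitespace, so skip it.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(normalize("The Senate bill, i.e. the Bill."));
    }
}
```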

I'd probably approach it like this (I use PHP, not Java, but the idea should carry over just fine):
1. split the words into an array.
2. loop through each element of that array and:
3. save it in this format $array[$WORD] = $array[$WORD]+1;

Then you will have a "list" whose array keys are the words and whose values are the number of times each word appears.
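
In Java, the closest equivalent of that keyed array is a Map. Here is a hedged sketch that ties it to the Scanner class from the original question. The class name WordFrequency and the delimiter pattern are assumptions, and lowercasing merges "For" and "for", which the sample output in the first post keeps separate:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

class WordFrequency {
    // Counts every token the scanner yields. LinkedHashMap keeps the
    // words in the order they were first seen.
    static Map<String, Integer> count(Scanner in) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Any run of non-letter, non-digit characters separates two words.
        in.useDelimiter("[^\\p{L}\\p{N}]+");
        while (in.hasNext()) {
            String word = in.next().toLowerCase(Locale.ROOT);
            // Same idea as $array[$WORD] = $array[$WORD] + 1
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // With a file path argument, count that file;
        // otherwise fall back to a short built-in sample.
        try (Scanner in = args.length > 0
                ? new Scanner(new File(args[0]))
                : new Scanner("The Senate bill is under consideration. The bill is pending.")) {
            for (Map.Entry<String, Integer> entry : count(in).entrySet()) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        }
    }
}
```

Swapping LinkedHashMap for TreeMap would print the words alphabetically instead of in first-seen order.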

icywindow
03-05-2010, 07:53 PM
In Java, the StringTokenizer class or the String.split() method would give you a lot of that functionality. Prefer split(): StringTokenizer is not formally deprecated, but it is a legacy class whose use is discouraged in new code.

If I had to do this, I'd run each of the strings through a SortedSet<String> to get all the unique words, then make two ArrayLists, one of <Integer>s and one of <String>s, copy the SortedSet<String> into the ArrayList<String>, and then iterate through the original strings again to increment the matching Integer counts.
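
One possible reading of that approach as a Java sketch, assuming the tokens are already split and normalized. The names ParallelListCounter and tally are hypothetical, and using a binary search to find each word's slot is a choice made here, not something spelled out above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class ParallelListCounter {
    static String tally(List<String> tokens) {
        // 1. The sorted set drops duplicates and orders the words.
        SortedSet<String> unique = new TreeSet<>(tokens);
        // 2. Two parallel lists: counts.get(i) belongs to words.get(i).
        List<String> words = new ArrayList<>(unique);
        List<Integer> counts = new ArrayList<>(Collections.nCopies(words.size(), 0));
        // 3. Second pass over the tokens to increment the matching count.
        //    words is sorted, so a binary search finds the slot.
        for (String token : tokens) {
            int i = Collections.binarySearch(words, token);
            counts.set(i, counts.get(i) + 1);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            out.append(words.get(i)).append(": ").append(counts.get(i)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(tally(Arrays.asList("the", "bill", "is", "the", "senate", "bill")));
    }
}
```

A single TreeMap<String, Integer> would do the same job with less bookkeeping, since it keeps each word and its count together.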

It's a thought anyways. Hope that this helps.

Mak_hich
12-09-2010, 05:16 PM
I've been scouring the internet for a word frequency counter similar to Hermetic Word Counter. Unfortunately, that's only for Windows. Does anyone know of something like it for the Mac?

heavensgate15
12-14-2010, 04:43 PM
I have this code in C, but it's still not complete because I didn't consider the punctuation marks. Hope this will help:



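#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef char Word[50];

/* One node of a linked list that is kept sorted by word. */
typedef struct temp {
    Word w;
    int cnt;
    struct temp *next;
} WordCount;

typedef WordCount *WordCPointer;

int compareIgnoreCase(const char *a, const char *b);
void printWordCount(WordCPointer wcp);

int main(void)
{
    FILE *fp;
    WordCPointer temp, wcp = NULL;
    WordCPointer *wcpP;
    Word str;

    if ((fp = fopen("data.txt", "r")) == NULL) {
        perror("data.txt");
        return 1;
    }

    /* Read one whitespace-delimited token at a time. %49s leaves room
       for the terminator, and testing the return value of fscanf avoids
       processing the last word twice, as a feof() loop would. */
    while (fscanf(fp, "%49s", str) == 1) {
        /* Walk the sorted list to the first node that is not less than str. */
        wcpP = &wcp;
        while (*wcpP != NULL && compareIgnoreCase((*wcpP)->w, str) < 0)
            wcpP = &(*wcpP)->next;

        if (*wcpP == NULL || compareIgnoreCase((*wcpP)->w, str) > 0) {
            /* New word: insert a node in front of *wcpP. */
            temp = malloc(sizeof(WordCount));
            if (temp == NULL) {
                fclose(fp);
                return 1;
            }
            strcpy(temp->w, str);
            temp->cnt = 1;
            temp->next = *wcpP;
            *wcpP = temp;
        } else {
            /* Word already in the list: bump its counter. */
            (*wcpP)->cnt++;
        }
    }

    fclose(fp);
    printWordCount(wcp);
    return 0;
}

/* Portable stand-in for the non-standard stricmp(). */
int compareIgnoreCase(const char *a, const char *b)
{
    while (*a != '\0' && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

void printWordCount(WordCPointer wcp)
{
    WordCPointer p;

    for (p = wcp; p != NULL; p = p->next)
        printf("%s: %d\n", p->w, p->cnt);
}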

If you have questions, feel free to ask... =)

johndavid312
08-23-2012, 08:25 AM
You can have the program open the input file and read its whole content, then save the words in a String array. The standard String methods will help you with this.
You will need several temporary variables. First make the program split the text into separate words and return the total number of words.
Then start a loop: store the first word in a temporary variable together with a counter variable. As the loop goes on, check whether the next word matches any previous word. If it does, increase that word's counter; otherwise store it in another temporary variable with its own counter, and repeat for each following word. Hope this will help you. Thanks
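
The loop described above could look roughly like this in Java, with two parallel arrays standing in for the "temp variables". The names LinearScanCounter, seen, and counts are made up for illustration, and the words are assumed to be split already:

```java
class LinearScanCounter {
    static String count(String[] words) {
        // seen[i] holds the i-th distinct word, counts[i] its counter.
        String[] seen = new String[words.length];
        int[] counts = new int[words.length];
        int distinct = 0;

        for (String word : words) {
            // Check whether this word matches any previously stored word.
            int pos = -1;
            for (int i = 0; i < distinct; i++) {
                if (seen[i].equals(word)) {
                    pos = i;
                    break;
                }
            }
            if (pos >= 0) {
                counts[pos]++;          // seen before: increase its counter
            } else {
                seen[distinct] = word;  // first occurrence: store it
                counts[distinct] = 1;
                distinct++;
            }
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < distinct; i++) {
            out.append(seen[i]).append(": ").append(counts[i]).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(count(new String[]{"the", "bill", "is", "the", "senate", "bill"}));
    }
}
```

Each word is compared against every distinct word stored so far, so this gets slow for large files. It does, however, need nothing beyond arrays and loops.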