PHP/Text File - Most Commonly Occurring Words
DigiplayStudios
03-01-2010, 08:14 PM
Hi There,
I am running a small search engine, and every time someone enters a search the term is placed on a line in a text file. I know I should probably be using a MySQL database table, but for simplicity's sake I am using a text file.
How can I grab the 40 most popular words that appear in the text file and print them onto a PHP page?
james438
03-01-2010, 10:32 PM
<pre>
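<?php
// Read the whole search log (replace the URL with your own file).
$text = file_get_contents("http://www.mysite.com/test.txt");
// Lowercase, then strip everything except letters and whitespace.
$text = strtolower($text);
$text = preg_replace('/[^a-z\s]/', '', $text);
// Split on runs of whitespace; PREG_SPLIT_NO_EMPTY keeps empty strings out of the count.
$text = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
// Count each word, sort by count (highest first), and keep the top 40.
$out = array_count_values($text);
arsort($out);
$out = array_slice($out, 0, 40);
print_r($out);
?>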
</pre>
Let me know if you need any of this explained.
I don't use MySQL to list the most common words because, while it works, it does not work the way I want. To get that I would need to upgrade my account, and the cost is just not worth it to me.
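In case a worked example helps, here is a minimal sketch of the same steps wrapped in a function and run on a short sample string instead of a file. The function name `top_words` and the sample text are made up for illustration; only the counting steps come from the script above.

```php
<?php
// Hypothetical helper (not part of the original script): returns the
// $limit most frequent words in $text as word => count.
function top_words(string $text, int $limit = 40): array
{
    // Lowercase, then strip everything except letters and whitespace.
    $text = strtolower($text);
    $text = preg_replace('/[^a-z\s]/', '', $text);
    // Split on runs of whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    // Count each word and sort by count, highest first.
    $counts = array_count_values($words);
    arsort($counts);
    return array_slice($counts, 0, $limit, true);
}

// Made-up sample input: "cats" appears 3 times, "and" twice.
$sample = "Cats and dogs.\r\nCats, cats and birds!";
print_r(top_words($sample, 2)); // cats => 3, and => 2
```

Words with the same count can come out in either order, so only the ranking between different counts should be relied on.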
DigiplayStudios
03-02-2010, 06:50 PM
Thanks so much, it's really helpful but would you be able to help me convert the following code into something similar? The code below is what I am currently using, it prints the most recent words copied to the text file, instead of the top 40... I'm quite lost! =S
<?php
$f1le = 'textfile.txt';
if(@filesize($f1le) > 1) {
$fp0 = fopen($f1le, "r");
$howmuch = 55;
$pos = -2;
$hitung = 0; ?>
<p><font color="orange" style="text-transform:lowercase;">
<?php
$crotz = ""; // start with an empty output string
do {
fseek($fp0, $pos, SEEK_END);
$pos--;
//$crotz .= fgetc($fp0);
if(fgetc($fp0) == "\n") {
//$crotz .= "<li>";
$crotz .= '<a class="latest" href="';
$t3mp = fgets($fp0);
$t3mp = str_replace('[keywords]:','',$t3mp);
$t3mp = str_replace("\n","",$t3mp);
$crotz .= "http://www.MySite.com/index.php?search=".urlencode($t3mp)."&source=All";
$crotz .= "\">".htmlspecialchars($t3mp)."</a>";
if($hitung < ($howmuch-1))
$crotz .= ", ";
//$crotz .- '</li>';
$hitung++;
}
} while($hitung < $howmuch && !feof($fp0));
fclose($fp0);
//$crotz .= "</ul>";
echo $crotz;
}
?>
</font>
</p>
james438
03-02-2010, 11:25 PM
This first script takes the 40 most common individual words in your file and hyperlinks them. I am guessing this is not what you want, so let's skip to the next script.
<?php
// Load the search log and drop the "[keywords]:" prefix from every entry.
$text = file_get_contents("http://www.mysite.com/the-includes/search.txt");
$text = str_replace('[keywords]:', '', $text);
// Lowercase, then strip everything except letters and whitespace.
$text = strtolower($text);
$text = preg_replace('/[^a-z\s]/', '', $text);
// Split into words (no empty strings), count them, and keep the 40 most frequent.
$text = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
$out = array_count_values($text);
arsort($out);
$out = array_keys(array_slice($out, 0, 40));
// Build one search link per word.
$crotz = "";
foreach ($out as $value) {
    $crotz .= '<a class="latest" href="';
    $crotz .= "http://www.mysite.com/index.php?search=".urlencode($value)."&source=All";
    $crotz .= "\">".htmlspecialchars($value)."</a>";
    $crotz .= ", ";
}
echo $crotz;
?>
The following is similar, but will take each set of terms like "black eyed peas" and hyperlink the grouped words.
<?php
$text = file_get_contents("http://www.mysite.com/the-includes/search.txt");
$text=str_replace('[keywords]:','<br>',$text);
$text=strtolower($text);
$text=explode("<br>",$text);
$out=array_count_values($text);
arsort($out);
$out=array_slice($out,0,40);
$crotz="";
$out=array_keys($out);
foreach ($out as $value) {
$crotz .= '<a class="latest" href="';
$crotz .= "http://www.mysite.com/index.php?search=".urlencode($value)."&source=All";
$crotz .= "\">".htmlspecialchars($value)."</a>";
$crotz .= ", ";
}
echo"$crotz";
?>
This second script uses [keywords]: as the delimiter. In retrospect I can see one or two ways the second script could be improved a bit more, but it should work well enough as is.
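As a rough sketch of the delimiter idea on its own, the snippet below counts whole search phrases from a small in-memory log. The `[keywords]:` prefix is the one from the file discussed here; the function name `top_phrases` and the sample log are assumptions made up for the example.

```php
<?php
// Hypothetical helper: returns the $limit most frequent search phrases
// from a log in which every entry starts with "[keywords]:".
function top_phrases(string $log, int $limit = 40): array
{
    // One array element per logged search.
    $phrases = explode('[keywords]:', strtolower($log));
    // Trim spaces and newlines, then drop empty entries.
    $phrases = array_filter(array_map('trim', $phrases), 'strlen');
    // Count each phrase and sort by count, highest first.
    $counts = array_count_values($phrases);
    arsort($counts);
    return array_keys(array_slice($counts, 0, $limit, true));
}

// Made-up sample log: "black eyed peas" was searched twice.
$log = "[keywords]:black eyed peas\n[keywords]:Lady Gaga\n[keywords]:Black Eyed Peas\n";
echo implode(', ', top_phrases($log, 2)); // black eyed peas, lady gaga
```

Trimming before counting matters here: without it, "black eyed peas" and "black eyed peas\n" would be counted as two different phrases.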
DigiplayStudios
03-03-2010, 06:40 AM
This is perfect, one last thing... the code leaves a space on each side of the ',' separator. How can I change it so there is only a space on the right side of each word group separator? You are a legend, thank you so much for all your help mate. :)
james438
03-03-2010, 10:27 AM
This update pertains to the second script, which is the one I assume you are talking about. It fixes the spacing bug, trims the whitespace from the terms, removes strange characters from the URL, and cleans up the code here and there. You may actually want some of those characters: for example, the "?" in a search such as "where am i?" is encoded as "where+am+i%3F". If you do want that, remove '%3F' from the array in the fourth line from the end.
<?php
// Load the search log, lowercase it, and split it into one entry per search.
$text = file_get_contents("http://www.mysite.com/the-includes/search.txt");
$text = strtolower($text);
$text = explode('[keywords]:', $text);
// Trim whitespace and newlines from each term and drop empty entries.
$text = array_filter(array_map('trim', $text), 'strlen');
// Count each term, sort by count (highest first), and keep the top 40.
$out = array_count_values($text);
arsort($out);
$out = array_keys(array_slice($out, 0, 40, true));
// Build one search link per term, each followed by ", ".
$crotz = "";
foreach ($out as $value) {
    $crotz .= '<a class="latest" href="';
    $crotz .= "http://www.mysite.com/index.php?search=".urlencode($value)."&source=All";
    $crotz .= "\">".htmlspecialchars($value)."</a>";
    $crotz .= ", ";
}
$crotz=str_replace(array('%0A','%3F',"\n"),'',$crotz);
$crotz=substr($crotz,0,-2);
echo"$crotz";
?>
It should look and work a bit better now.
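For anyone curious what the encoding step does, here is a small sketch of building one link per term and joining them with `implode()`, which puts ", " only between items so there is no trailing separator to cut off. The helper name `search_link` is made up; the URL and parameters mirror the ones used in this thread. Unlike the script above, this version leaves `%3F` in the URL and also escapes the `&` inside the `href`, so treat it as one possible variant rather than a drop-in replacement.

```php
<?php
// Hypothetical helper: builds one search link for a term.
function search_link(string $term): string
{
    // urlencode() makes the term safe to use inside a query string.
    $url = 'http://www.mysite.com/index.php?search=' . urlencode($term) . '&source=All';
    // htmlspecialchars() escapes the URL and the label for use in HTML,
    // so the "&" in the URL becomes "&amp;" inside the href attribute.
    return '<a class="latest" href="' . htmlspecialchars($url) . '">'
        . htmlspecialchars($term) . '</a>';
}

echo urlencode('where am i?'), "\n"; // where+am+i%3F

// implode() inserts the separator between items only.
$links = array_map('search_link', ['black eyed peas', 'where am i?']);
echo implode(', ', $links), "\n";
```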
DigiplayStudios
03-03-2010, 06:48 PM
This is all perfect... I will remove this post in a day or so, for security reasons.. but my thanks are so sincere and I really appreciate everything! ;)
james438
03-03-2010, 07:23 PM
My pleasure.
If you want you can just edit your post to remove the private data so that other people who are looking for a solution to a similar problem can still find it. I can do the same in my posts.