Thread: PHP/Text File - Most Commonly Occurring Words

  1. #1
    Join Date
    Mar 2010
    Posts
    14
    Thanks
    4
    Thanked 0 Times in 0 Posts

    PHP/Text File - Most Commonly Occurring Words

    Hi There,

    I am running a small search engine, and every time someone enters a search, the term is placed on a line in a text file. I know I should probably be using a MySQL database table, but for simplicity's sake I am using a text file.

    How can I grab the 40 most popular words that appear in the text file and print them onto a PHP page?

  2. #2
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    1,720
    Thanks
    82
    Thanked 90 Times in 88 Posts

    PHP Code:
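    <pre>
    <?php
    // Load the file of search terms (the URL is a placeholder).
    $text = file_get_contents("http://www.mysite.com/test.txt");
    // Keep only letters and whitespace, then lowercase so "PHP" and "php" count together.
    $text = preg_replace('/[^a-zA-Z\s]/', '', $text);
    $text = strtolower($text);
    // Split on any run of whitespace (spaces, tabs, \n, \r\n) and skip empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    // Count each word, sort by count (highest first), keep the top 40.
    $out = array_count_values($words);
    arsort($out);
    $out = array_slice($out, 0, 40);
    print_r($out);
    ?>
    </pre>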
    Let me know if you need any of this explained.

    I don't use MySQL to list the most common words: it works, but not the way I want, and getting it to do so would mean upgrading my account, which is just not worth the cost to me.
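
    For anyone who wants to see the steps in isolation, here is a minimal sketch of the same counting pipeline run on an in-memory string instead of a file. The function name top_words() and the sample text are made up for illustration; they are not part of the script above.

```php
<?php
// Hypothetical helper (not from the original script): returns the $limit most
// common words in $text as an array of word => count, highest count first.
function top_words(string $text, int $limit = 40): array
{
    // Drop everything except letters and whitespace, then lowercase.
    $text = strtolower(preg_replace('/[^a-zA-Z\s]/', '', $text));
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY skips empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $counts = array_count_values($words);
    arsort($counts); // sort by count, descending, keeping word => count pairs
    // Fourth argument true = preserve keys, so the words stay as the keys.
    return array_slice($counts, 0, $limit, true);
}

// Made-up sample data standing in for the contents of the text file.
$sample = "PHP tutorial\nphp arrays\nMySQL tutorial\nphp";
print_r(top_words($sample, 2)); // php => 3, tutorial => 2
```

    Wrapping the steps in a function like this makes it easy to try different limits or to feed in the result of file_get_contents() later.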
    Last edited by james438; 03-02-2010 at 02:29 AM. Reason: removed one useless line of code and edited code slightly.
    To choose the lesser of two evils is still to choose evil. My personal site

  3. The Following User Says Thank You to james438 For This Useful Post:

    DigiplayStudios (03-02-2010)

  4. #3
    Join Date
    Mar 2010
    Posts
    14
    Thanks
    4
    Thanked 0 Times in 0 Posts

    Thanks so much, it's really helpful, but would you be able to help me convert the following code into something similar? The code below is what I am currently using; it prints the most recent terms saved to the text file instead of the top 40... I'm quite lost! =S

    PHP Code:
    <?php
    // Prints the most recent search terms by reading the file backwards.
    $f1le = 'textfile.txt';
    if (@filesize($f1le) > 1) {
        $fp0 = fopen($f1le, "r");
        $howmuch = 55;  // number of recent terms to show
        $pos = -2;      // offset from the end of the file, moving backwards
        $hitung = 0;    // terms found so far
        $crotz = "";    // HTML output buffer
    ?>
    <p><font color="orange" style="text-transform:lowercase;">
    <?php
        //$crotz = "<ul>";
        do {
            fseek($fp0, $pos, SEEK_END);
            $pos--;
            //$crotz .= fgetc($fp0);
            // A newline marks the start of a line; read that line as one term.
            if (fgetc($fp0) == "\n") {
                //$crotz .= "<li>";
                $crotz .= '<a class="latest" href="';
                $t3mp = fgets($fp0);
                $t3mp = str_replace('[keywords]:', '', $t3mp);
                $t3mp = str_replace("\n", "", $t3mp);
                $crotz .= "http://www.MySite.com/index.php?search=" . urlencode($t3mp) . "&source=All";
                $crotz .= "\">" . htmlspecialchars($t3mp) . "</a>";
                if ($hitung < ($howmuch - 1))
                    $crotz .= ", ";
                //$crotz .= '</li>';
                $hitung++;
            }
        } while ($hitung < $howmuch && !feof($fp0));
        fclose($fp0);
        //$crotz .= "</ul>";
        echo $crotz;
    }
    ?>
    </font>
    </p>
    Last edited by DigiplayStudios; 03-04-2010 at 08:54 PM.

  5. #4
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    1,720
    Thanks
    82
    Thanked 90 Times in 88 Posts

    This first script takes the 40 most common individual words in your file and hyperlinks each one. I am guessing this is not what you want, because it splits multi-word searches into separate words, so let's skip to the next script.

    PHP Code:
    <?php
    $text = file_get_contents("http://www.mysite.com/the-includes/search.txt");
    // Strip the "[keywords]:" prefix that starts each logged search.
    $text = str_replace('[keywords]:', '', $text);
    // Keep only letters and whitespace, lowercase, then split on any whitespace run.
    $text = preg_replace('/[^a-zA-Z\s]/', '', $text);
    $text = strtolower($text);
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    // Count the words, sort by count (highest first), keep the 40 most common.
    $out = array_count_values($words);
    arsort($out);
    $out = array_keys(array_slice($out, 0, 40));
    $crotz = "";
    foreach ($out as $value) {
        $crotz .= '<a class="latest" href="';
        $crotz .= "http://www.mysite.com/index.php?search=" . urlencode($value) . "&source=All";
        $crotz .= "\">" . htmlspecialchars($value) . "</a>";
        $crotz .= ", ";
    }
    echo $crotz;
    ?>

    The following is similar, but treats each complete search, such as "black eyed peas", as a single term and hyperlinks the whole phrase.
    PHP Code:
    <?php
    $text = file_get_contents("http://www.mysite.com/the-includes/search.txt");
    // Turn every "[keywords]:" prefix into a marker to split on.
    $text = str_replace('[keywords]:', '<br>', $text);
    $text = strtolower($text);
    $text = explode("<br>", $text);
    // Count whole phrases, sort by count (highest first), keep the top 40.
    $out = array_count_values($text);
    arsort($out);
    // true = preserve keys, so purely numeric searches are not renumbered.
    $out = array_slice($out, 0, 40, true);
    $crotz = "";
    $out = array_keys($out);
    foreach ($out as $value) {
        $value = (string) $value;
        $crotz .= '<a class="latest" href="';
        $crotz .= "http://www.mysite.com/index.php?search=" . urlencode($value) . "&source=All";
        $crotz .= "\">" . htmlspecialchars($value) . "</a>";
        $crotz .= ", ";
    }
    echo $crotz;
    ?>
    This second script uses [keywords]: as the delimiter. In retrospect I can see one or two ways the second script could be improved a bit more, but it should work well enough as is.
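
    A possible refinement, as a sketch only: trim each phrase, skip empty entries, and pass true as the fourth argument of array_slice() so that purely numeric searches such as "2010" keep their key instead of being renumbered. The function name top_phrases() and the sample log are hypothetical and only there to make the example self-contained.

```php
<?php
// Hypothetical helper: counts whole search phrases in a log where every entry
// is prefixed with "[keywords]:" and returns the $limit most common phrases.
function top_phrases(string $log, int $limit = 40): array
{
    $phrases = explode('[keywords]:', strtolower($log));
    // Trim surrounding whitespace, including the newline that ends each line.
    $phrases = array_map('trim', $phrases);
    // Drop empty entries, e.g. the empty piece before the first delimiter.
    $phrases = array_filter($phrases, 'strlen');
    $counts = array_count_values($phrases);
    arsort($counts);
    // Fourth argument true = preserve keys; without it, integer-like keys
    // such as 2010 would be renumbered to 0, 1, 2, ...
    $top = array_slice($counts, 0, $limit, true);
    // PHP stores numeric string keys as integers, so cast them back to strings.
    return array_map('strval', array_keys($top));
}

// Made-up sample log in the same "[keywords]:term" format.
$log = "[keywords]:Black Eyed Peas\n[keywords]:2010\n[keywords]:black eyed peas\n"
     . "[keywords]:2010\n[keywords]:2010\n";
print_r(top_phrases($log, 2)); // "2010" first (3 hits), then "black eyed peas" (2 hits)
```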
    Last edited by james438; 03-03-2010 at 06:25 PM.

  6. The Following User Says Thank You to james438 For This Useful Post:

    DigiplayStudios (03-03-2010)

  7. #5
    Join Date
    Mar 2010
    Posts
    14
    Thanks
    4
    Thanked 0 Times in 0 Posts

    This is perfect, one last thing... the code leaves a space on each side of the ',' separator. How can I change it so there is a space only on the right side of each separator? You are a legend, thank you so much for all your help mate.

  8. #6
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    1,720
    Thanks
    82
    Thanked 90 Times in 88 Posts

    This update pertains to the second script, which is the one I assume you are talking about. It fixes the spacing bug, trims the trailing whitespace from the terms, cleans up the code here and there, and removes some encoded characters from the URL. You may actually want to keep those characters: for example, urlencode() turns "where am i?" into "where+am+i%3F". If you do want to keep the "?", remove '%3F' from the array in the str_replace() call near the end of the script.

    PHP Code:
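    <?php
    $text = file_get_contents("http://www.mysite.com/the-includes/search.txt");
    $text = strtolower($text);
    // Each logged search starts with "[keywords]:", so split on that.
    $text = explode('[keywords]:', $text);
    // Remove the trailing newline/whitespace that caused the extra space.
    $text = array_map('rtrim', $text);
    $out = array_count_values($text);
    arsort($out);
    // true = preserve keys, so purely numeric searches are not renumbered.
    $out = array_slice($out, 0, 40, true);
    $crotz = "";
    $out = array_keys($out);
    foreach ($out as $value) {
        $value = (string) $value;
        $crotz .= '<a class="latest" href="';
        $crotz .= "http://www.mysite.com/index.php?search=" . urlencode($value) . "&source=All";
        $crotz .= "\">" . htmlspecialchars($value) . "</a>";
        $crotz .= ", ";
    }
    // Strip encoded newlines and question marks from the output.
    $crotz = str_replace(array('%0A', '%3F', "\n"), '', $crotz);
    // Drop the trailing ", " after the last link.
    $crotz = substr($crotz, 0, -2);
    echo $crotz;
    ?>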
    It should look and work a bit better now.
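
    As an alternative to cutting off the trailing ", " with substr(), the links can be collected in an array and joined with implode(), which only ever puts the separator between items. This is a rough sketch with a hypothetical function name and the same placeholder domain; note that, unlike the script above, it keeps encoded characters such as %3F in the URL instead of stripping them.

```php
<?php
// Hypothetical helper: turns a list of search terms into comma-separated links.
function link_list(array $terms, string $base = "http://www.mysite.com/index.php"): string
{
    $links = [];
    foreach ($terms as $term) {
        $term = (string) $term;
        // urlencode() makes the term safe inside the query string.
        $url = $base . "?search=" . urlencode($term) . "&source=All";
        // htmlspecialchars() makes the URL and the link text safe inside HTML.
        $links[] = '<a class="latest" href="' . htmlspecialchars($url) . '">'
                 . htmlspecialchars($term) . '</a>';
    }
    // implode() puts ", " only between items, so nothing needs trimming.
    return implode(", ", $links);
}

echo link_list(["black eyed peas", "where am i?"]);
```

    The second link in this example gets the query string search=where+am+i%3F, which is the encoded form of "where am i?".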
    Last edited by james438; 03-03-2010 at 06:24 PM. Reason: grammar

  9. The Following User Says Thank You to james438 For This Useful Post:

    DigiplayStudios (03-03-2010)

  10. #7
    Join Date
    Mar 2010
    Posts
    14
    Thanks
    4
    Thanked 0 Times in 0 Posts

    This is all perfect... I will remove this post in a day or so, for security reasons.. but my thanks are so sincere and I really appreciate everything!

  11. #8
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    1,720
    Thanks
    82
    Thanked 90 Times in 88 Posts

    My pleasure.

    If you want you can just edit your post to remove the private data so that other people who are looking for a solution to a similar problem can still find it. I can do the same in my posts.
