Thread: Php strip spaces and punctuation and common words etc

  1. #1
    Join Date
    Mar 2005
    Location
    Western Australia
    Posts
    148
    Thanks
    24
    Thanked 4 Times in 4 Posts

    Php strip spaces and punctuation and common words etc

    Hi guys

    Can someone tell me how to write PHP code, or point me to the right place, that will take a specified piece of content and:

    1. remove all common words such as it, and, the, that, is, etc.
    2. replace all spaces with commas
    3. replace any double commas with single ones (in case of existing punctuation and double spaces), so that I am left with all the words in the content separated by commas
    4. sort the comma-separated words from most common to least common
    5. remove all duplicates (i.e. leave one of each)
    6. truncate to X words
    7. output the result

    Cheers
    1st rule of web development - use Firefox and Firebug
    2nd rule - see the first rule
    --
    I like Smilies

  2. #2
    Join Date
    Jan 2008
    Posts
    4,168
    Thanks
    28
    Thanked 628 Times in 624 Posts
    Blog Entries
    1

    I don't know what you mean by sorting the words from most common to least common, but here's the code (except for that part):
    PHP Code:
    <?php
        $common   = array('is', 'the', 'that', 'them', 'and', 'he');
        $find     = array(' ', ',,');
        $replace  = array(',', ',');
        $truncate = 5;

        // no editing below

        if (isset($_POST['submit'])) {
            // Note: str_replace() matches substrings, so 'the' is also
            // stripped from inside longer words.
            $content = str_replace($common, "", $_POST['content']);
            // Turn spaces into commas and collapse double commas.
            $content = str_replace($find, $replace, $content);
            // Split on runs of commas, dropping empty pieces.
            $content = preg_split('/,+/', $content, -1, PREG_SPLIT_NO_EMPTY);
            // Keep the first occurrence of each word and reindex from 0.
            $content = array_values(array_unique($content));
            // Keep at most $truncate words.
            $new_content = array_slice($content, 0, $truncate);
            // Escape the output because it comes from user input.
            echo htmlspecialchars(implode(',', $new_content));
        }
    ?>
    <form action="<?php echo htmlspecialchars($_SERVER["PHP_SELF"]); ?>" method="post">
    <textarea name="content" style="font-family: arial; height: 250px; width: 500px;"></textarea>
    <br />
    <input type="submit" name="submit" />
    </form>
    Jeremy | jfein.net

  3. The Following User Says Thank You to Nile For This Useful Post:

    gwmbox (02-11-2009)

  4. #3

    Wow, thanks for that. Excellent work and very much appreciated. I am putting it together for a free plugin I am writing for ZenPhoto, and this is the bit that was stumping me. Does it have to be triggered by a form submit, or could I wrap the code in a PHP function and run it whenever that function is called?

    I should also add that the content is supplied by another function, getBareContent, which strips all the HTML from the page's content and returns it as plain text.

    The "most common to least common" part means ranking the words that are not in the common list by how often they are repeated in the content. If the word 'house' was mentioned ten times and another word, say 'bedroom', only eight times, it would rank them from most to least frequent, so the order would be house,bedroom and so on. If the counts are equal, keep the order in which the words were collected from the content. This is not required; it is more of a nice add-on.

    Cheers again for your excellent help
    Last edited by gwmbox; 02-11-2009 at 09:24 AM. Reason: bit more info requested :P

  5. #4

    Glad to help.
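    On the function question: the steps don't need a form at all. Here is a rough sketch of the same idea wrapped in a function you could call with any plain-text string (for example, whatever getBareContent gives you). The name keywords_from_text and its parameters are made up for illustration. Note that it swaps str_replace for a word split plus array_diff, which compares whole words, and it lower-cases everything and treats only ASCII letters, digits and apostrophes as word characters, which is an assumption about your content.

```php
<?php
// Hypothetical helper: the name and parameters are assumptions, not part of ZenPhoto.
function keywords_from_text($text, array $common, $limit)
{
    // Lower-case, then split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    // Drop the common words (whole-word comparison), then keep the first
    // occurrence of each remaining word.
    $words = array_unique(array_diff($words, $common));
    // array_slice() reindexes, so this is a plain list of at most $limit words.
    return implode(',', array_slice($words, 0, $limit));
}

$common = array('is', 'the', 'that', 'them', 'and', 'he');
echo keywords_from_text('The house is big, and the house is old.', $common, 5); // house,big,old
```

    Passing the common-word list and the limit as arguments keeps the function free of globals, so the plugin can call it wherever it needs the keyword string.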
    I'll see what I can do with the ordering. If I can't get it working, I can sort alphabetically instead.
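    One possible way to do the ordering, as a sketch only: count how often each word occurs, remember where it first appeared, and sort by count with first appearance as the tie-breaker. The function name rank_by_frequency is hypothetical, and it assumes the words have already been split into a plain list with the common words removed.

```php
<?php
// Hypothetical helper: returns the unique words, most frequent first;
// words with equal counts stay in the order they first appeared.
function rank_by_frequency(array $words)
{
    $counts = array(); // word => number of occurrences
    $first  = array(); // word => index of first appearance
    foreach (array_values($words) as $i => $word) {
        if (!isset($counts[$word])) {
            $counts[$word] = 0;
            $first[$word]  = $i;
        }
        $counts[$word]++;
    }

    $unique = array_keys($counts);
    usort($unique, function ($a, $b) use ($counts, $first) {
        if ($counts[$a] !== $counts[$b]) {
            return $counts[$b] - $counts[$a]; // higher count first
        }
        return $first[$a] - $first[$b];       // earlier word first on a tie
    });
    return $unique;
}

$words = array('house', 'bedroom', 'house', 'garden', 'bedroom', 'house');
echo implode(',', rank_by_frequency($words)); // house,bedroom,garden
```

    The explicit tie-breaker is there so the result does not depend on whether the PHP version in use has a stable sort. Since the function already returns each word once, the separate duplicate-removal step is not needed when it is used.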

  6. #5

    For the common array, how would I get it to remove only whole words, so that the word 'The' is removed but the word 'Theme' is not truncated to 'me' and remains as Theme?

    Cheers
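    One way this could be approached, as a sketch rather than a drop-in fix: replace the str_replace call with a regular expression that uses \b word boundaries and the i (case-insensitive) flag. The helper name remove_common_words is hypothetical.

```php
<?php
// Hypothetical helper: removes whole common words only, ignoring case.
function remove_common_words($text, array $common)
{
    // Escape each word so it is treated literally inside the pattern.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $common);
    // \b is a word boundary, so "The" matches but the "The" in "Theme" does not.
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';
    return preg_replace($pattern, '', $text);
}

$common = array('is', 'the', 'that', 'them', 'and', 'he');
echo remove_common_words('The Theme', $common);
```

    The spaces around the removed words are left in place, so the later steps that turn spaces into commas and collapse repeated commas still apply.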
