
PHP strip spaces and punctuation and common words etc



gwmbox
02-10-2009, 11:06 PM
Hi guys

Can someone help me out by telling me how to write, or pointing me to the right place for, PHP code that looks at a specified piece of content and:

1. removes all common words such as it, and, the, that, is, etc.
2. replaces all spaces with commas
3. replaces any double commas with single ones (in case of existing punctuation and double spaces), so that I am left with all the words in the content separated by commas
4. sorts the comma-separated words from most common to least common
5. removes all duplicates (i.e. leaves one of each)
6. truncates the list to X words
7. outputs the result

Cheers

Nile
02-10-2009, 11:42 PM
I don't know what you mean by sorting the words from most common to least common, but here's the code (except for that part):


<?php
$common = array('is','the','that','them','and','he');
$find = array(' ', ',,');
$replace = array(',', ',');
$truncate = 5; // maximum number of words to output
// no editing below
if (isset($_POST['submit'])) {
    // Remove the common words (note: str_replace() matches substrings, not whole words)
    $content = str_replace($common, "", $_POST['content']);
    // Turn spaces into commas, then collapse doubled commas
    $content = str_replace($find, $replace, $content);
    // Split on commas and drop any empty entries left by repeated commas
    $content = array_filter(explode(',', $content), 'strlen');
    // Remove duplicates, keeping the first occurrence of each word
    $content = array_unique($content);
    // Keep at most $truncate words; array_slice() also reindexes the array
    $new_content = array_slice($content, 0, $truncate);
    // Escape the output because it comes from user input
    echo htmlspecialchars(implode(',', $new_content));
}
?>
<form action="<?php echo htmlspecialchars($_SERVER["PHP_SELF"]); ?>" method="post">
<textarea name="content" style="font-family: arial; height: 250px; width: 500px;"></textarea>
<br />
<input type="submit" name="submit" />
</form>

gwmbox
02-11-2009, 09:20 AM
Wow, thanks for that. Excellent work and very much appreciated. I am putting this together for a free ZenPhoto plugin, and this is the bit that was stumping me. Does it have to be triggered by a form submit, or could I wrap the code in a PHP function so that it runs whenever that function is called?

I should also add that the content is supplied by another function, getBareContent, which strips all the HTML from the page's content and returns it as plain text.
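For illustration, the same steps could be wrapped in a function along these lines. This is only a sketch: the name contentToKeywords and its parameters are hypothetical and not part of ZenPhoto, and it keeps the substring-based str_replace() matching of the form version.

```php
<?php
// Hypothetical helper: the name and parameters are illustrative only.
function contentToKeywords($text, array $common, $limit)
{
    // Same steps as the form version: strip common words (substring match),
    // turn spaces into commas, then collapse doubled commas.
    $text = str_replace($common, '', $text);
    $text = str_replace(array(' ', ',,'), array(',', ','), $text);

    // Split, drop empty entries, remove duplicates (first occurrence kept).
    $words = array_filter(explode(',', $text), 'strlen');
    $words = array_unique($words);

    // Keep at most $limit words and join them back with commas.
    return implode(',', array_slice($words, 0, $limit));
}

// Usage: any plain-text string works as the first argument.
echo contentToKeywords('red house and red door', array('and'), 5);
```

The text argument could be whatever getBareContent returns, so no form submit is involved.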

The "most common" part means ranking the words that are not in the common list by how often they appear in the content. If the word 'house' was mentioned ten times and another word, say 'bedroom', only eight times, they would be ranked from most to least frequent, so the order would be house,bedroom and so on. If the counts are equal, keep the order in which the words were collected from the content. This is not required; it is more of a nice add-on :)
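A rough sketch of that ranking, assuming the words have already been split into an array. The function name rankByFrequency is hypothetical, and the closure syntax needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper: returns the unique words, most frequent first.
function rankByFrequency(array $words)
{
    // array_count_values() lists each word once, in first-seen order.
    $counts = array_count_values($words);
    $unique = array_keys($counts);
    // Map each word to its first-seen position, used to break ties.
    $order = array_flip($unique);

    usort($unique, function ($a, $b) use ($counts, $order) {
        if ($counts[$a] !== $counts[$b]) {
            return $counts[$b] - $counts[$a]; // higher count first
        }
        return $order[$a] - $order[$b];       // earlier word first on a tie
    });

    return $unique;
}

// Usage with a small hand-made word list.
$words = array('house', 'bedroom', 'house', 'garden', 'bedroom', 'house');
echo implode(',', rankByFrequency($words));
```

The explicit tie-break is there because PHP's built-in sorts were not guaranteed to keep equal elements in their original order before PHP 8.0. The result contains each word once, so it also covers the duplicate-removal step.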

Cheers again for your excellent help

Nile
02-11-2009, 12:55 PM
Glad to help. :)
I'll see what I can do with the ordering. If I can't get it, I can sort the words alphabetically.

gwmbox
02-12-2009, 12:25 AM
For the common array, how would I get it to remove only whole words, so that the word 'The' is removed but the word 'Theme' is not truncated to 'me' and remains 'Theme'?

Cheers
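
One possible approach, offered as a sketch rather than a tested plugin component, is to swap str_replace() for preg_replace() with \b word boundaries and the case-insensitive i flag. The function name removeCommonWords is hypothetical.

```php
<?php
// Hypothetical helper: removes only whole-word, case-insensitive matches.
function removeCommonWords($text, array $common)
{
    // Escape each word so it is treated literally inside the pattern.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $common);

    // \b marks a word boundary, so "the" matches "The" but not "Theme".
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/i';

    return preg_replace($pattern, '', $text);
}

// Usage: "The", "is" and "the" are removed, "Theme" is left intact.
echo removeCommonWords('The Theme is the best', array('the', 'is'));
```

The spaces around the removed words are left in place, so the later steps that turn spaces into commas and drop empty entries still apply.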