PHP Crawler Script?



penguins87
01-29-2010, 03:34 AM
I am trying to make a crawler for my website with PHP.

I got this code from a tutorial. Can you tell me how to call this function in a loop so that it follows the links on my website?



<?php

// Fetch one page and return its title, meta keywords, meta description, and links.
function crawl($url) {

    // Return false if the page cannot be read.
    $html = @file_get_contents($url);
    if ($html === false) {
        return false;
    }

    // Each field falls back to an empty string when its tag is missing.
    $title = preg_match("/<title>(.+)<\/title>/siU", $html, $matches) ? trim($matches[1]) : "";

    $k = "<meta\s+name=['\"]??keywords['\"]??\s+content=['\"]??(.+)['\"]??\s*\/?>";
    $keywords = preg_match("/$k/siU", $html, $matches) ? $matches[1] : "";

    $d = "<meta\s+name=['\"]??description['\"]??\s+content=['\"]??(.+)['\"]??\s*\/?>";
    $desc = preg_match("/$d/siU", $html, $matches) ? $matches[1] : "";

    // preg_match_all collects every href; preg_match would stop at the first link.
    $rp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
    preg_match_all("/$rp/siU", $html, $matches);
    $links = $matches[2];

    $info = array("url" => $url, "title" => $title, "keywords" => $keywords, "description" => $desc, "links" => $links);
    return $info;

}

?>


Thanks.

djr33
01-29-2010, 03:38 AM
That function takes $url and processes that one page. It does not follow links by itself, so you would need some other code to call it.
The basic approach would be to loop through the text of your page (or any page) and, for each link (find the "<a" tags), get the href value and process it. You could make this recursive as well, if you'd like.
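The loop described above could be sketched roughly as follows. This is only an illustration, not code from the tutorial: crawl_site(), same_site_url(), and the $maxPages limit are made-up names, and it assumes the fetch function returns an array whose "links" entry lists every href found on the page. It uses a queue instead of recursion so that a large site cannot exhaust the call stack, and it only follows absolute http(s) links and root-relative links on the starting host.

```php
<?php

// Hypothetical helper: turn a link into an absolute URL on the start host,
// or return null to skip it. Only absolute http(s) URLs and root-relative
// paths ("/about.php") are handled; other relative paths are skipped.
function same_site_url($link, $startUrl) {
    $base = parse_url($startUrl);
    if (!isset($base['scheme'], $base['host'])) {
        return null;
    }
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');

    // Drop any "#fragment" so the same page is not queued twice.
    $link = preg_replace('/#.*$/', '', trim((string) $link));
    if ($link === '') {
        return null;
    }
    if ($link[0] === '/' && substr($link, 0, 2) !== '//') {
        return $origin . $link;
    }
    $parts = parse_url($link);
    if (isset($parts['scheme'], $parts['host'])
        && in_array(strtolower($parts['scheme']), array('http', 'https'))
        && strcasecmp($parts['host'], $base['host']) === 0) {
        return $link;
    }
    return null; // other sites, mailto:, javascript:, and so on
}

// Breadth-first crawl. $fetch is any callable that takes a URL and returns
// an array with a "links" entry, or false when the page cannot be read.
function crawl_site($startUrl, $fetch, $maxPages = 50) {
    $queue = array($startUrl);
    $seen  = array($startUrl => true); // URLs already queued
    $pages = array();                  // results, keyed by URL

    while ($queue && count($pages) < $maxPages) {
        $url  = array_shift($queue);
        $info = call_user_func($fetch, $url);
        if ($info === false) {
            continue;
        }
        $pages[$url] = $info;

        foreach ($info['links'] as $link) {
            $next = same_site_url($link, $startUrl);
            if ($next !== null && !isset($seen[$next])) {
                $seen[$next] = true;
                $queue[]     = $next;
            }
        }
    }
    return $pages;
}
```

With a crawl() function in scope, a call might look like crawl_site("http://www.example.com/", "crawl"), where the URL is a placeholder. The $seen array is what stops the crawler from looping forever when two pages link to each other.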
If this is just for fun, that's fine, but it won't really get you anywhere in the end because it only collects the meta info. Real crawlers (like Google) now search the actual content of pages, and meta tags like those are becoming less and less common.
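To give a rough idea of what "the actual content" could mean in code, here is a minimal sketch that reduces a page to plain text. page_text() is a made-up name, and real search engines do far more than this. It is only meant to show the difference from reading meta tags.

```php
<?php

// Hypothetical helper: reduce an HTML page to plain text that could be
// searched or indexed.
function page_text($html) {
    // Remove script and style blocks, whose contents are not visible text.
    $html = preg_replace('/<(script|style)\b[^>]*>.*?<\/\1>/is', ' ', $html);
    // Put a space before every tag so words in adjacent elements stay apart.
    $html = str_replace('<', ' <', $html);
    // Strip the tags, decode entities such as &amp;, and collapse whitespace.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

The returned string could be stored next to the title and links for each page, for example by calling page_text() on the HTML that the crawler has already downloaded.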