Thread: PHP Crawler Script?

  1. #1

    PHP Crawler Script?

    I am trying to make a crawler for my website with PHP.

    I got this code from a tutorial. Can you tell me how to call this function in a loop so that it follows the links on my website?

    Code:
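    <?php

    // Fetch one page and return its title, meta keywords, meta description,
    // and the href of every link on it. Returns false if the fetch fails.
    function crawl($url) {

        $html = @file_get_contents($url);
        if ($html === false) {
            return false;
        }

        // Each field defaults to "" when its pattern does not match,
        // which avoids an undefined-index notice on $matches[1].
        $title = preg_match("/<title>(.+)<\/title>/siU", $html, $matches) ? $matches[1] : "";

        $k = "<meta\s+name=['\"]??keywords['\"]??\s+content=['\"]??(.+)['\"]??\s*\/?>";
        $keywords = preg_match("/$k/siU", $html, $matches) ? $matches[1] : "";

        $d = "<meta\s+name=['\"]??description['\"]??\s+content=['\"]??(.+)['\"]??\s*\/?>";
        $desc = preg_match("/$d/siU", $html, $matches) ? $matches[1] : "";

        // preg_match_all collects every link; preg_match stops at the first one.
        $rp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
        preg_match_all("/$rp/siU", $html, $matches);
        $links = $matches[2];

        return array("url" => $url, "title" => $title, "keywords" => $keywords, "description" => $desc, "links" => $links);
    }

    ?>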
    Thanks.

  2. #2

    That function takes a single $url and processes only that one page. You would need something else to call it for each page.
    The basic approach is to loop through the HTML of your page (or any page), find each link (the "<a" tags), get its href value, and process that URL in turn. You could also make this recursive, if you'd like.
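
    As a rough sketch of that loop (not a finished crawler), something like the following could work. The names resolve_link() and crawl_site() are made up for this example, and it assumes a function such as crawl() from the first post that returns an array with a "links" key. It uses a queue instead of recursion so that pages linking to each other cannot cause an endless loop, and it only follows links on the starting host. Relative links are resolved in a simplified way ("../" segments are not normalized).

```php
<?php

// Hypothetical helper: turn an href found on page $base into an absolute URL.
// Handles absolute, protocol-relative, root-relative and simple relative links;
// returns null for anything else (mailto:, javascript:, bare #fragments).
function resolve_link($base, $href) {
    $href = trim($href, " \t\n\r'\"");
    $href = preg_replace('/#.*$/', '', $href);          // drop any #fragment
    if ($href === '') {
        return null;
    }
    if (preg_match('#^https?://#i', $href)) {
        return $href;                                   // already absolute
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return null;                                    // some other scheme
    }
    $parts = parse_url($base);
    if (!isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    if (substr($href, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $href;          // protocol-relative
    }
    $root = $parts['scheme'] . '://' . $parts['host']
          . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if ($href[0] === '/') {
        return $root . $href;                           // root-relative
    }
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);  // directory of the base page
    return $root . $dir . $href;
}

// Breadth-first crawl starting at $start, staying on the same host.
// $crawl is any callable that takes a URL and returns an array with a
// "links" key (an assumption about crawl()), or a non-array on failure.
function crawl_site($start, $maxPages = 50, $crawl = 'crawl') {
    $host    = parse_url($start, PHP_URL_HOST);
    $queue   = array($start);
    $seen    = array($start => true);   // URLs already queued
    $results = array();

    while ($queue && count($results) < $maxPages) {
        $url  = array_shift($queue);
        $info = call_user_func($crawl, $url);
        if (!is_array($info)) {
            continue;                    // fetch failed, skip this page
        }
        $results[$url] = $info;

        $links = isset($info['links']) ? (array) $info['links'] : array();
        foreach ($links as $href) {
            if (!is_string($href)) {
                continue;
            }
            $next = resolve_link($url, $href);
            if ($next === null || isset($seen[$next])) {
                continue;                // unusable link, or already queued
            }
            if (parse_url($next, PHP_URL_HOST) !== $host) {
                continue;                // do not leave the site
            }
            $seen[$next] = true;
            $queue[]     = $next;
        }
    }
    return $results;
}
```

    With crawl() defined, a call such as crawl_site("http://www.example.com/", 50) (the URL is a placeholder for your own site) would return an array keyed by URL, one entry per page visited. The $maxPages limit is there so a mistake cannot hammer the server indefinitely.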
    If this is just for fun, that's fine, but it won't get you far in the end because it only collects the meta info. Real crawlers (like Google's) now index the actual content of pages, and meta tags like these are becoming less and less common.
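
    If you do want the page content rather than the meta tags, one possible starting point is to strip the markup and keep the visible text. This is only a sketch: page_text() is a hypothetical name, and regular expressions are a crude way to handle HTML, so malformed pages may not come out cleanly.

```php
<?php

// Hypothetical helper: reduce an HTML document to its visible text.
function page_text($html) {
    // Remove script and style blocks, whose contents are not page text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Replace each tag with a space so adjacent words do not run together.
    $text = preg_replace('/<[^>]+>/', ' ', $html);
    // Decode entities such as &amp; and collapse runs of whitespace.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

    The returned string could then be stored next to the URL and searched, instead of relying on the keywords and description tags.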
