
Thread: How do Google spiders handle php included content files?

  1. #1

    I've been using PHP includes to separate headers and footers from my main website pages for a while now, but lately I've decided to go one step further and separate the page content into its own text files too.

    This follows some hoo-ha I had with a novice administrator deleting the include references from the web pages and then wondering why he killed the website.

    What I'd like to know is how Google handles indexing of content that's been separated from web pages, and whether it will affect ranking, etc. Can Googlebots follow the include links to my content files and spider the content, or do they hit a brick wall and stop dead in their tracks?

    Any advice is much appreciated.

  2. #2

    The includes are merged into the page on the server. Google sees it as one page and doesn't know you have includes.

    Googlebot requests the page from the server.
    The server processes all the PHP and outputs the resulting HTML/JavaScript.
    Googlebot goes through that output and never sees the PHP code; it only exists on the server.
    Last edited by bluewalrus; 04-08-2010 at 04:48 PM.
    Corrections to my coding/thoughts welcome.

  3. #3

    PHP generates web pages. PHP code and PHP methods are invisible to viewers: the viewer sees a pure HTML page consisting of whatever PHP generates from includes, database information, or anything else.
    To check this, look at the source code of your page in the browser (View > Source); that is what Google sees.
    This is completely unlike JavaScript, which is processed on the client's computer and may not be run by Google (or other crawlers) at all.
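
    A minimal sketch of that idea, assuming hypothetical file names (header.php, content.txt, footer.php) written to a temporary directory so the example is self-contained. It captures what the server would send and shows that the includes arrive as one block of plain HTML:

    ```php
    <?php
    // Hypothetical include files, created in a temp directory for the demo.
    $dir = sys_get_temp_dir() . '/include_demo_' . uniqid();
    mkdir($dir);
    file_put_contents("$dir/header.php", "<header><?php echo 'Site header'; ?></header>\n");
    file_put_contents("$dir/content.txt", "<p>Main page content.</p>\n");
    file_put_contents("$dir/footer.php", "<footer><?php echo 'Site footer'; ?></footer>\n");

    // Capture the output a browser or crawler would receive.
    ob_start();
    include "$dir/header.php";
    include "$dir/content.txt";   // a plain text file is passed through as-is
    include "$dir/footer.php";
    $html = ob_get_clean();

    echo $html;

    // Clean up the demo files.
    array_map('unlink', glob("$dir/*"));
    rmdir($dir);
    ```

    The echoed result contains the header, content and footer markup in order, with no PHP tags and no trace of which parts came from separate files.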


    Note that the only thing you need to consider here is how your URLs work. If you are using something like a form, Google may not submit the form and so may never find the new page. If you are using plain text links, even with GET variables (page.php?var=value), then Google will read all of that.
    In theory, it's best for SEO to have simple links like (dir/dir/dir/) rather than filenames (dir/dir/file.php), and especially not variables (dir/file.php?var=value), but this is NOT a big problem and can be hard to avoid (though it is possible with enough work).
    All of these methods will be indexed by Google, though.

    Basically, if it is in the URL it will be indexed by Google. The only exception is the part after the # symbol, which is a client-side anchor link that (usually) makes the page jump to a certain section. That is the same page anyway, so it gets indexed regardless.
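
    As a rough illustration of which URL parts matter, here is a sketch using PHP's parse_url() and parse_str() on a made-up URL (the path, variable and anchor names are only examples). The path and query string are what gets requested from the server; the fragment after # stays on the client:

    ```php
    <?php
    // Example URL only; none of these names come from a real site.
    $url = 'http://www.someurl.com/dir/file.php?var=value#section';

    $parts = parse_url($url);
    parse_str($parts['query'], $vars);

    // What is actually requested from the server: path plus query string.
    $requested = $parts['path'] . '?' . $parts['query'];

    echo $requested, "\n";          // /dir/file.php?var=value
    echo $vars['var'], "\n";        // value
    echo $parts['fragment'], "\n";  // section (never sent to the server)
    ```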
    Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  4. #4

    Thanks chaps.

    I understand how PHP serves the HTML header/footer portions to visitors, so they get "complete" pages, but I wasn't sure how Google saw the jigsawed-off parts of a website. It's good to know that the bots see the PHP-generated HTML source too.

    Thanks for clearing that up for me.

  5. #5

    To be clearer: they don't see the PHP at all. They only see the HTML code.

  6. #6

    To be honest, it seems certain that Google's crawlers only see a page's HTML. So let us imagine that I have the following website:

    http://www.someurl.com

    on this domain I have:
    index.php which includes header.php and footer.php

    How do I prevent the following from appearing in Google?

    http://www.someurl.com/header.php and http://www.someurl.com/footer.php

    If those appeared in Google as individual search results, it would look stupid to somebody who actually visits a page like http://www.someurl.com/footer.php: he would only see the footer as a separate page.

    Does anybody have any experience in avoiding this jigsaw?

  7. #7

    Code:
    <meta name="ROBOTS" content="NOINDEX,NOFOLLOW" >
    Make a <head> section in each of the separate pages with that in it, and Google and the other search engine bots won't index them or follow any links they contain.

    Just be sure that, if you are including those files on other pages, you don't include the meta tag as well.
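
    One way to keep the meta tag out of the including pages is to print it only when the file is requested on its own. This is just a sketch: requested_directly() is a hypothetical helper (not a PHP built-in), and it assumes the server fills in $_SERVER['SCRIPT_FILENAME'] with the script that was originally requested, which is the usual case:

    ```php
    <?php
    // Hypothetical helper (not from the thread): true when $file is the script
    // the server was asked to run, false when another page pulled it in.
    function requested_directly(string $file, string $entryScript): bool
    {
        $real = realpath($file);
        return $real !== false && $real === realpath($entryScript);
    }

    // In footer.php: emit the robots meta tag only on a direct request.
    if (requested_directly(__FILE__, $_SERVER['SCRIPT_FILENAME'])) {
        echo '<head><meta name="ROBOTS" content="NOINDEX,NOFOLLOW"></head>', "\n";
    }
    echo "<footer>Site footer</footer>\n";
    ```

    When index.php includes this file, the entry script is index.php, so the check fails and only the footer markup is printed.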
    {CWoT - Riddle } {Freelance Copywriter} {Learn to Write}
    Follow Me on Twitter: @InkingHubris
    PHP Code:
    $result = mysql_query("SELECT finger FROM hand WHERE id=3");
    echo $result;

  8. #8

    Google does not guess which pages exist. If you never link to footer.php, it will not appear on Google. You can also name the file page.inc so that it is not run as a page when requested on its own (though check how your server serves .inc files).

  9. #9

    blizzard / DJR,

    Thanks for sharing the useful information.

    One more question, though: which is safer, .inc (inc as the final extension) or .inc.php? In other words, what if a final .inc extension is not recognized as PHP: would the server echo my code as plain text?
    Last edited by dimitritetsch; 04-09-2010 at 08:50 PM.

  10. #10

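    Another solution I think would work:

    PHP Code:
    <?php
    // A query string in a local include path is not parsed (PHP would look for a
    // file literally named "page.php?include=yes"), so set a flag before including.
    define('INCLUDED', 'yes');
    include 'page.php';
    ?>
    on page.php
    PHP Code:
    <?php
    // INCLUDED is only defined when another page pulled this file in.
    if (defined('INCLUDED') && INCLUDED === 'yes') {
        // insert code, load content here
    } else {
        echo "Incorrect Page Address.";
    }
    ?>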