How do Google spiders handle php included content files?
Beverleyh
04-08-2010, 04:18 PM
I've been using PHP includes to separate headers and footers from my main website pages for a while now, but lately I've decided to go one step further and separate the page content into its own text files too.
This follows some hoo-ha I had with a novice administrator deleting the include references from the web pages and then wondering why he killed the website.
What I'd like to know is how Google handles indexing of content that's been separated from web pages, and whether it will affect ranking, etc. Can Googlebots follow the include links to my content files and spider the content, or do they hit a brick wall and stop dead in their tracks?
Any advice is much appreciated.
bluewalrus
04-08-2010, 04:40 PM
The includes are merged into the page on the server. Google sees it as one page and doesn't know you have includes.
Googlebot requests the page from the server.
The server processes all the PHP and outputs the resulting HTML/JavaScript code.
Googlebot goes through that code. It never sees the PHP code, which exists only on the server.
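A rough sketch of that sequence, for anyone who wants to see it run. The file names and contents here (header.php, content.txt, footer.php) are made up for illustration and are written to a temp directory so the script works on its own. renderPage() is a hypothetical helper that does what the server does: it runs the includes and collects the output.

```php
<?php
// Hypothetical file names and contents, written to a temp directory so the
// sketch runs on its own; a real site would already have these files.
$dir = sys_get_temp_dir() . '/include_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/header.php", "<html><body><h1><?php echo 'My Site'; ?></h1>\n");
file_put_contents("$dir/content.txt", "<p>Page content kept in its own text file.</p>\n");
file_put_contents("$dir/footer.php", "<p>Footer</p></body></html>\n");

// Render a page the way the server would: run every include, collect the output.
function renderPage(string $dir): string
{
    ob_start();                    // capture what would be sent to the client
    include "$dir/header.php";     // PHP inside is executed, only its output remains
    include "$dir/content.txt";    // plain text/HTML is passed through as-is
    include "$dir/footer.php";
    return ob_get_clean();
}

$html = renderPage($dir);
echo $html;                        // one complete HTML document, no PHP tags in it
```

What gets echoed at the end is all a browser or a crawler ever receives. Nothing in it shows that the page came from three files.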
djr33
04-08-2010, 07:08 PM
PHP generates webpages. PHP code and PHP methods are invisible to viewers: the viewer sees a pure HTML page consisting of whatever PHP generates, such as includes, database information, whatever.
To check this, look at the source code of your page in the browser (View > Source). That is what Google sees.
This is completely unlike JavaScript, which is processed on the client's computer and is usually ignored by Google and other crawlers.
Note that the only thing you need to consider here is how your URLs work. If you are using something like a form, Google may not submit the form and so may never find the new page. If you are using plain text links, even with GET variables (page.php?var=value), then Google will read all of that.
In theory, it's best for SEO to have simple links like (dir/dir/dir/) rather than filenames (dir/dir/file.php), and especially not variables (dir/file.php?var=value), but this is NOT a big problem and can be hard to avoid (though it is possible with enough work).
All of these methods will be indexed by Google, though.
Basically, if it is in the URL it will be indexed by Google. The only exception is the part after the # symbol, which is a client-side anchor that (usually) makes the page jump to a certain section. That's the same page anyway, so it's archived.
Beverleyh
04-08-2010, 07:48 PM
Thanks chaps.
I understand how PHP serves the HTML header/footer portions to visitors, so they get "complete" pages, but I wasn't sure how Google saw the jigsawed-off parts of a website. It's good to know that the bots see the outputted PHP-to-HTML source code too.
Thanks for clearing that up for me.
bluewalrus
04-09-2010, 01:52 AM
To be more clear, they don't see the PHP at all. They only see the HTML code.
dimitritetsch
04-09-2010, 06:31 PM
It does seem clear that the Google crawlers only see the page's HTML. So let us imagine that I have the following webpage:
http://www.someurl.com
On this domain I have:
index.php, which includes header.php and footer.php
How do I prevent the following from appearing in Google?
http://www.someurl.com/header.php and http://www.someurl.com/footer.php
If those appeared in Google as individual search results, it would look stupid to somebody who actually visits a page like http://www.someurl.com/footer.php: he would only see the footer as a separate page.
Does anybody have any experience in avoiding this jigsaw?
BLiZZaRD
04-09-2010, 06:36 PM
<meta name="ROBOTS" content="NOINDEX,NOFOLLOW" >
Make a <head> section in each of the separate pages with that in it, and Google and the other search engine bots won't index them or follow any links they contain.
Just be sure if you are including those on other pages, you don't include the meta tags as well :)
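One way to keep the meta tag out of the including pages is to print it only when the file itself was the page requested. This is only a sketch: robotsHeadFor() is a made-up helper name, and it assumes $_SERVER['SCRIPT_FILENAME'] holds the path of the script the visitor asked for, which is worth checking on your own server.

```php
<?php
// Returns the robots <head> markup only when $thisFile is the script the
// visitor requested; returns '' when it is being included by another page.
function robotsHeadFor(string $thisFile, string $requestedScript): string
{
    $self = realpath($thisFile);
    $requested = realpath($requestedScript);
    if ($self === false || $self !== $requested) {
        return '';   // included from another page (or path unknown): no meta tag
    }
    return "<head><meta name=\"ROBOTS\" content=\"NOINDEX,NOFOLLOW\"></head>\n";
}

// In footer.php this line would sit at the top of the file:
echo robotsHeadFor(__FILE__, $_SERVER['SCRIPT_FILENAME'] ?? '');
```

When index.php includes the footer, the requested script is index.php, so nothing extra is printed. When somebody opens footer.php directly, the two paths match and the tag is sent.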
djr33
04-09-2010, 06:50 PM
Google does not guess what pages exist. If you never link to footer.php, it will not appear on Google. You can also name the file page.inc so that it is not run as a page on its own, though check how your server serves .inc files.
dimitritetsch
04-09-2010, 08:22 PM
blizzard / DJR,
Thanks for sharing the useful information.
Although, one more question: which is safer, .inc (inc as the final extension) or .inc.php? In other words, what if inc as the final extension is not recognized? Would this echo my code as plain text?
bluewalrus
04-09-2010, 09:21 PM
Another solution I think would work would be
<?php
include ('page.php?include=yes');
?>
and in page.php:
<?php
// Only run the page when it was called with ?include=yes
if (isset($_GET['include']) && $_GET['include'] == "yes") {
    // insert code, load content here
} else {
    echo "Incorrect Page Address.";
}
?>
djr33
04-10-2010, 05:49 AM
That's a good question: .inc or .inc.php. I'm not actually sure which is best. I'm pretty sure I've seen both, so you might want to look that up on Google to see where they're referenced. As far as I know, a server that has not been told to treat .inc as PHP will send the file as plain text, so your code could be readable, while a .inc.php file is run as PHP like any other .php file and can still be requested directly. So the extension alone may not protect the file; test this on your server.
RE bluewalrus:
I do not believe that you can use a query string in an include statement: it refers to paths only, meaning real files. I've tried this before and it hasn't worked. (The exception would be for external links, and that would then probably work as part of the whole URL.) I'm not saying not to try it, but don't be surprised if it doesn't work (and if you DO get it to work, post back with details).
The easy way around this is to set another variable or, as is more often seen in software, to set a constant: define('VALID', 1); in the main page, then if (defined('VALID')) { .... } in the included page. A plain variable works in the same way.
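Here is a minimal sketch of that constant check, assuming the constant is called VALID as above. The footer file is a made-up example written to a temp path so the script runs by itself, and capture() is just a hypothetical helper that collects what the include prints.

```php
<?php
// Hypothetical included file, written to a temp path so the sketch runs alone.
$footer = tempnam(sys_get_temp_dir(), 'footer');
file_put_contents(
    $footer,
    "<?php if (!defined('VALID')) { echo 'Incorrect Page Address.'; return; } ?>\n"
    . "<p>Footer content</p>\n"
);

// Run the include and capture what it would send to the browser.
function capture(string $file): string
{
    ob_start();
    include $file;
    return ob_get_clean();
}

$direct = capture($footer);    // VALID not defined yet: behaves like a direct hit
define('VALID', 1);            // what the main page (index.php) would do first
$included = capture($footer);  // now the footer content is emitted

echo $direct, "\n", $included;
```

The return inside the included file stops only that file, so the main page keeps running. On a real site the define() would sit near the top of index.php, before any include.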