View Full Version : Mark Computers that come from Google
itivae
04-10-2012, 07:59 PM
Is it possible to detect visitors who come to my site from a Google search? I work for an estore, and Google caches their sold-out items. This causes a customer to land on either the parent category or a 404 error. So, like I said, I would like to trap this kind of incoming traffic and redirect it to a predetermined sold-out product page. Any ideas would be appreciated, or some reading on the subject.
Thanks
djr33
04-10-2012, 09:52 PM
You would need to redesign your system. You can't redirect from a 404 because no page exists, unless you want to redirect ALL 404s to some other page (like your home page).
You could use .htaccess as a very general way to redirect, but I don't understand what the rule would be-- you don't want all traffic from google to be redirected, and you don't want all 404 pages to be redirected.
What I suggest is using a server-side language like PHP to create a dynamic 404 page-- then you can use that page to display one of several possible messages, like "404" or "Sold out". Or you could redirect them from there.
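The dynamic-404 idea boils down to one decision: did the requested product ever exist? Here is a sketch of that logic in Python for illustration (the thread's store runs classic ASP/PHP, and the catalog and function names here are hypothetical):

```python
# Sketch of a dynamic "404 vs. sold out" decision, assuming the store can
# look up whether a requested SKU ever existed. Names are hypothetical;
# a real store would query its product database instead of a dict.
CATALOG = {
    "widget-100": True,   # still for sale
    "widget-200": False,  # sold out, but Google may still link to it
}

def handle_request(sku):
    """Return (status, page) for a product URL."""
    if sku in CATALOG:
        if CATALOG[sku]:
            return (200, "product page")
        # Product existed but is gone: show a friendly sold-out page.
        # (HTTP 410 Gone also tells crawlers to drop the URL.)
        return (410, "sold out page")
    # Never existed: a genuine 404
    return (404, "not found page")
```

With this split, only URLs that were once real products get the "sold out" treatment; everything else still behaves like a normal 404.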
Another approach would be to set the proper headers on your product pages, so Google knows not to cache them in the first place. This would require a server-side language (like PHP), or you could set them via .htaccess if your server has mod_headers (http://httpd.apache.org/docs/2.0/mod/mod_headers.html) enabled.
This might take a while to have any noticeable effect (i.e., it wouldn't change what Google has cached now; only what they do in the future), but it would definitely be the preferred solution. Combining this with a dynamic 404/"sold out" page, as Daniel suggests, would be most effective.
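For the mod_headers route, a rule along these lines could work (an untested sketch -- the file pattern is an assumption; adjust it to match your product pages). `X-Robots-Tag: noarchive` asks crawlers not to keep a cached copy:

```apache
# .htaccess sketch: assumes mod_headers is enabled and product pages are
# served as .asp or .php files -- adjust the pattern for your store.
<IfModule mod_headers.c>
  <FilesMatch "\.(asp|php)$">
    Header set X-Robots-Tag "noarchive"
  </FilesMatch>
</IfModule>
```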
itivae
04-13-2012, 01:51 AM
Thanks a lot for the answers guys. It is appreciated. I will give what you said a try and see what I come up with. I will probably be back soon :)
hamada76
04-24-2012, 08:03 PM
Here is some code that I have used in the past to check for search engines, and it worked for me. It is in classic ASP, and some items may need to be updated. Please use responsibly!
<%
'CHECKS TO SEE IF THE VISITOR IS A ROBOT!
'SEARCH ENGINE CLOAKING IS A VIOLATION OF ALL SEARCH ENGINES TERMS OF USE AND MAY HAVE SEVERE CONSEQUENCES INCLUDING TOTAL BANNING FROM ENGINES.
'THIS SHOULD ONLY BE USED TO HIDE UNDESIRABLE CONTENT FROM SEARCH ENGINE BOTS.
'Detect robots and if the bot is in the object dictionary, the session "IsRobot" is set to "Yes" - "No" means it is not.
'This enables some sort of selective content serving based on whether the visitor is a robot or not.
'Content on pages can be hidden or shown depending on the value of the session variable "IsRobot".
Sub AddViolation(objDict, strWord)
    'Adds a violation (a robot signature in this case)
    objDict.Add strWord, False
End Sub
Function CheckStringForViolations(strString, objDict)
    'Determines whether strString contains any of the dictionary keys
    Dim bolViolations
    bolViolations = False
    Dim strKey
    For Each strKey In objDict
        If InStr(1, strString, strKey, vbTextCompare) > 0 Then
            bolViolations = True
            objDict(strKey) = True
        End If
    Next
    CheckStringForViolations = bolViolations
End Function
'BEGIN OBJECT DICTIONARY VIOLATIONS
'This list will need to get updated every now and then.
'To test, use a Firefox user-agent switcher extension to change your HTTP_USER_AGENT when visiting the website.
'Another way to test is to look for cached pages (text) on search engines.
Dim objDictViolations
Set objDictViolations = Server.CreateObject("Scripting.Dictionary")
AddViolation objDictViolations, "Alexa" 'Alexa www.alexa.com
AddViolation objDictViolations, "ia_archiver" 'Alexa www.alexa.com
AddViolation objDictViolations, "MSNBot" 'MSN www.msn.com
AddViolation objDictViolations, "Yahoo! Slurp" 'Yahoo www.yahoo.com
AddViolation objDictViolations, "Yahoo Slurp" 'Yahoo www.yahoo.com
AddViolation objDictViolations, "GoogleBot" 'Google www.google.com
AddViolation objDictViolations, "GoogleBot Cloak" 'Google - Confirmed! www.google.com
AddViolation objDictViolations, "Lycos" ' Lycos www.lycos.com
AddViolation objDictViolations, "Ultraseek" 'Ultraseek www.infoseek.com
AddViolation objDictViolations, "Sidewinder" 'Ultraseek www.infoseek.com
AddViolation objDictViolations, "InfoSeek" 'Ultraseek www.infoseek.com
AddViolation objDictViolations, "Scooter" 'AltaVista www.altavista.com
AddViolation objDictViolations, "InfoSeek sidewinder" 'AltaVista www.altavista.com
AddViolation objDictViolations, "FAST-WebCrawler" 'alltheweb www.alltheweb.com
AddViolation objDictViolations, "ArchitextSpider" 'excite www.excite.com
AddViolation objDictViolations, "Lycos_Spider_(T-Rex)" 'lycos www.lycos.com
AddViolation objDictViolations, "Fatbot" 'http://www.thefind.com/main/CrawlerFAQs.fhtml
AddViolation objDictViolations, "Fatbot 2.0" 'http://www.thefind.com/main/CrawlerFAQs.fhtml
AddViolation objDictViolations, "twiceler" 'http://www.cuil.com/
AddViolation objDictViolations, "Yandex" 'http://www.Yandex.ru/
AddViolation objDictViolations, "Baidu Spider" 'http://www.baidu.com/
AddViolation objDictViolations, "iPhone" 'http://www.apple.com/ - Not a robot but also treated as one
Dim strCheck
strCheck = Request.ServerVariables("HTTP_USER_AGENT")
If Len(strCheck) > 0 Then
    If CheckStringForViolations(strCheck, objDictViolations) Then
        Session("IsRobot") = "Yes"
    Else
        Session("IsRobot") = "No"
    End If
End If
%>
<%
'THE FOLLOWING IS OPTIONAL IF THE ROUTINE ABOVE DOES NOT WORK WITH CERTAIN USER AGENTS
'THIS WAS MADE TO DETECT BOTS THAT DO NOT BROADCAST THEIR USER AGENTS
Dim strRemoteIP
Dim strParsedIP
Dim strDesiredIPRange
strRemoteIP = Request.ServerVariables("HTTP_X_FORWARDED_FOR")
If strRemoteIP = "" Then
    strRemoteIP = Request.ServerVariables("REMOTE_ADDR")
End If
'THIS WILL GRAB THE 10 LEFT-MOST CHARACTERS OF GOOGLE BOT 66.249.66.
strParsedIP = Left(strRemoteIP, 10)
strDesiredIPRange = "66.249.66."
If strParsedIP = strDesiredIPRange Then
    Session("IsRobot") = "Yes"
End If
'ALSO CHECK THE SECOND GOOGLE BOT RANGE, 66.249.65.
strDesiredIPRange = "66.249.65."
If strParsedIP = strDesiredIPRange Then
    Session("IsRobot") = "Yes"
End If
%>
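The ASP routine above amounts to two checks: a case-insensitive substring match on the user agent, and a prefix match on the IP address. Here is the same idea in Python for illustration (a shortened token list and the IP prefixes from the post; note that fixed Googlebot IP ranges go stale, and verifying crawlers by reverse DNS is the more robust approach today):

```python
# Illustrative port of the ASP logic above: flag a request as a bot if its
# user agent contains a known crawler token, or its IP starts with one of
# the Googlebot prefixes from the post. (Fixed IP prefixes are fragile;
# reverse-DNS verification is what Google recommends now.)
BOT_TOKENS = ["ia_archiver", "MSNBot", "Yahoo! Slurp", "GoogleBot", "Yandex"]
GOOGLEBOT_PREFIXES = ("66.249.66.", "66.249.65.")

def is_robot(user_agent, remote_ip):
    ua = user_agent.lower()
    if any(token.lower() in ua for token in BOT_TOKENS):
        return True
    return remote_ip.startswith(GOOGLEBOT_PREFIXES)
```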