
Thread: Mark Computers that come from Google

  1. #1
    Join Date
    Nov 2009
    Posts
    107
    Thanks
    7
    Thanked 2 Times in 2 Posts

    Default Mark Computers that come from Google

    Is it possible to trap for visitors coming to my site from a Google search? I work for an e-store, and Google has cached pages for items that have since sold out. When a customer clicks one of those results, they either land on the parent category or get a 404 error. I would like to trap this kind of incoming traffic and redirect it to a predetermined sold-out product page. Any ideas would be appreciated, or some reading on the subject.

    Thanks

  2. #2
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,162
    Thanks
    263
    Thanked 690 Times in 678 Posts

    Default

    You would need to redesign your system. You can't redirect from a 404 because no page exists, unless you want to redirect ALL 404s to some other page (like your home page).

    You could use .htaccess as a very general way to redirect, but I don't see what the rule would be -- you don't want all traffic from Google to be redirected, and you don't want all 404 pages to be redirected.

    What I'd suggest is using a server-side language like PHP to create a dynamic 404 page -- then you can use that page to display several possible messages, like "404" or "Sold out". Or you could redirect them.
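    For example, a minimal sketch (the filename notfound.php and the /product/ URL pattern are just placeholders -- adapt them to however your store's URLs actually look). In .htaccess you would point missing pages at the script with ErrorDocument 404 /notfound.php, and the script then decides what to show:

    Code:
    <?php
    // notfound.php -- a dynamic 404 handler (hypothetical filename).
    // Keep sending a real 404 status so search engines eventually drop the dead URL.
    header('HTTP/1.1 404 Not Found');

    // The originally requested URL (Apache keeps it in REQUEST_URI on an
    // internal ErrorDocument redirect).
    $requested = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '';

    if (preg_match('#^/product/#i', $requested)) {
        // Looks like an old product URL -- most likely a sold-out item.
        echo '<h1>Sorry, that item is sold out.</h1>';
        echo '<p><a href="/">Browse our current stock</a></p>';
    } else {
        // Anything else gets a plain "not found" message.
        echo '<h1>Page not found</h1>';
    }
    ?>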
    Daniel - Freelance Web Design | <?php?> | <html> | español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  3. #3
    Join Date
    Apr 2008
    Location
    So.Cal
    Posts
    3,643
    Thanks
    63
    Thanked 517 Times in 503 Posts
    Blog Entries
    5

    Default

    Another approach would be to set the proper headers on your product pages, so Google knows not to cache them in the first place. This would require a server-side language (like PHP), or you could set them via .htaccess if your server has mod_headers enabled.

    This might take a while to have any noticeable effect (i.e., it wouldn't change what Google has cached now; only what they do in the future), but it would definitely be the preferred solution. Combining this with a dynamic 404/"sold out" page, as Daniel suggests, would be most effective.
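    For example, a minimal sketch assuming an Apache/PHP setup (this uses the standard X-Robots-Tag directive, sent before any output on each product page):

    Code:
    <?php
    // At the top of each product page, before any HTML is sent:
    // tell crawlers not to keep a cached copy of this page.
    header('X-Robots-Tag: noarchive');
    ?>

    The .htaccess equivalent (with mod_headers enabled) would be along the lines of Header set X-Robots-Tag "noarchive", limited to your product URLs. Either way, as noted above, Google only picks this up the next time it recrawls the page.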

  4. #4
    Join Date
    Nov 2009
    Posts
    107
    Thanks
    7
    Thanked 2 Times in 2 Posts

    Default

    Thanks a lot for the answers, guys; it is appreciated. I will give what you said a try and see what I come up with. I will probably be back soon.

  5. #5
    Join Date
    Apr 2012
    Posts
    7
    Thanks
    1
    Thanked 2 Times in 2 Posts

    Default

    Here is some code that I have used in the past to check for search engines, and it has worked for me. It is in classic ASP, and some items may need to be updated. Please use responsibly!

    Code:
    <%
    ' CHECKS TO SEE IF THE VISITOR IS A ROBOT!
    ' Search engine cloaking is a violation of every search engine's terms of use and may have
    ' severe consequences, including a total ban from the engines.
    ' This should only be used to hide undesirable content from search engine bots.
    ' Detects robots: if the user agent is in the dictionary, Session("IsRobot") is set to "Yes";
    ' "No" means it is not a known robot.
    ' This enables selective content serving based on whether the visitor is a robot or not.
    ' Content on pages can be hidden or shown depending on the value of Session("IsRobot").

    Sub AddViolation(objDict, strWord)
        ' Adds a violation (a robot user-agent string in this case)
        objDict.Add strWord, False
    End Sub

    Function CheckStringForViolations(strString, objDict)
        ' Determines whether strString contains any of the dictionary entries
        Dim bolViolations
        bolViolations = False
        Dim strKey
        For Each strKey In objDict
            If InStr(1, strString, strKey, vbTextCompare) > 0 Then
                bolViolations = True
                objDict(strKey) = True
            End If
        Next
        CheckStringForViolations = bolViolations
    End Function

    ' BEGIN OBJECT DICTIONARY VIOLATIONS
    ' This list will need to be updated every now and then.
    ' To test, use a Firefox user-agent switcher extension to change your HTTP_USER_AGENT when visiting the site.
    ' Another way to test is to look for cached (text-only) pages on the search engines.
    Dim objDictViolations
    Set objDictViolations = Server.CreateObject("Scripting.Dictionary")
    AddViolation objDictViolations, "Alexa"                 'Alexa www.alexa.com
    AddViolation objDictViolations, "ia_archiver"           'Alexa www.alexa.com
    AddViolation objDictViolations, "MSNBot"                'MSN www.msn.com
    AddViolation objDictViolations, "Yahoo! Slurp"          'Yahoo www.yahoo.com
    AddViolation objDictViolations, "Yahoo Slurp"           'Yahoo www.yahoo.com
    AddViolation objDictViolations, "GoogleBot"             'Google www.google.com
    AddViolation objDictViolations, "GoogleBot Cloak"       'Google - Confirmed! www.google.com
    AddViolation objDictViolations, "Lycos"                 'Lycos www.lycos.com
    AddViolation objDictViolations, "Ultraseek"             'Ultraseek www.infoseek.com
    AddViolation objDictViolations, "Sidewinder"            'Ultraseek www.infoseek.com
    AddViolation objDictViolations, "InfoSeek"              'Ultraseek www.infoseek.com
    AddViolation objDictViolations, "Scooter"               'AltaVista www.altavista.com
    AddViolation objDictViolations, "InfoSeek sidewinder"   'AltaVista www.altavista.com
    AddViolation objDictViolations, "FAST-WebCrawler"       'AllTheWeb www.alltheweb.com
    AddViolation objDictViolations, "ArchitextSpider"       'Excite www.excite.com
    AddViolation objDictViolations, "Lycos_Spider_(T-Rex)"  'Lycos www.lycos.com
    AddViolation objDictViolations, "Fatbot"                'http://www.thefind.com/main/CrawlerFAQs.fhtml
    AddViolation objDictViolations, "Fatbot 2.0"            'http://www.thefind.com/main/CrawlerFAQs.fhtml
    AddViolation objDictViolations, "twiceler"              'http://www.cuil.com/
    AddViolation objDictViolations, "Yandex"                'http://www.yandex.ru/
    AddViolation objDictViolations, "Baidu Spider"          'http://www.baidu.com/
    AddViolation objDictViolations, "iPhone"                'http://www.apple.com/ - not a robot, but also treated as one here

    Dim strCheck
    strCheck = Request.ServerVariables("HTTP_USER_AGENT")
    If Len(strCheck) > 0 Then
        If CheckStringForViolations(strCheck, objDictViolations) Then
            Session("IsRobot") = "Yes"
        Else
            Session("IsRobot") = "No"
        End If
    End If
    %>

    <%
    ' THE FOLLOWING IS OPTIONAL, IN CASE THE ROUTINE ABOVE DOES NOT CATCH CERTAIN USER AGENTS.
    ' THIS WAS MADE TO DETECT BOTS THAT DO NOT BROADCAST THEIR USER AGENTS.
    Dim strRemoteIP
    Dim strParsedIP
    Dim strDesiredIPRange
    strRemoteIP = Request.ServerVariables("HTTP_X_FORWARDED_FOR")
    If strRemoteIP = "" Then
        strRemoteIP = Request.ServerVariables("REMOTE_ADDR")
    End If
    ' Grab the 10 left-most characters and compare them to the Googlebot range 66.249.66.
    strParsedIP = Left(strRemoteIP, 10)
    strDesiredIPRange = "66.249.66."
    If strParsedIP = strDesiredIPRange Then
        Session("IsRobot") = "Yes"
    End If
    %>

    <%
    ' Repeat the check for the Googlebot range 66.249.65.
    strDesiredIPRange = "66.249.65."
    If strParsedIP = strDesiredIPRange Then
        Session("IsRobot") = "Yes"
    End If
    %>
    Last edited by ddadmin; 04-24-2012 at 10:04 PM.
