There is an extension for Firefox and other Gecko-based browsers that requests robots.txt. I do not remember the name of the extension offhand, but I do remember trying (unsuccessfully) to contact its creator. This is very obnoxious because it makes it difficult to detect new spiders: I have to manually delete the line from the access log, delete the script's log, run the script again, and repeat for every single request! To work around this I want all requests from user agents containing the string 'Gecko' to receive an HTTP 403 error when requesting the robots.txt file (since a 403 will not be counted as a successful request for the file).
I do not have the ability to execute PHP in .txt files, nor do I have access to httpd.conf, so this has to be done in .htaccess. So...
1.) How do we detect 'Gecko'?
2.) How do we forbid 'Gecko' from accessing robots.txt?
I've been searching, and below are my current best guesses, though they generate a server error (Apache 1.3.39). None of them work! I also don't understand Apache's syntax very well, as their documentation could use more examples... lots more.
This one works too well. Can anyone reform it so the rule applies to the robots.txt file only?

    RewriteCond %{HTTP_USER_AGENT} !Gecko [NC]
    RewriteRule !^(robots\.txt) - [F]
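My untested guess at the corrected rule: drop both negations, so the condition matches user agents that *do* contain 'Gecko' and the pattern matches *only* robots.txt (this assumes mod_rewrite is enabled and the rules live in the .htaccess at the document root):

    # Turn the rewrite engine on (needed once per .htaccess)
    RewriteEngine On
    # Condition: user agent contains 'Gecko', case-insensitive
    RewriteCond %{HTTP_USER_AGENT} Gecko [NC]
    # Rule: for robots.txt only, change nothing ('-') but return 403 ([F])
    RewriteRule ^robots\.txt$ - [F]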
And here is the SetEnvIf attempt:

    SetEnvIf User-Agent "Gecko" Gecko
    <Files /error/error-403.php>
        order allow,deny
        allow from all
    </Files>
    deny from env=Gecko
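If I understand the directives right, the `<Files>` container should wrap robots.txt itself (not the error page), with the `deny` inside it, so the restriction applies only to that file. A sketch of what I think that would look like (untested; `block_gecko` is just a variable name I made up):

    # Tag any request whose user agent contains 'Gecko' (case-insensitive)
    SetEnvIfNoCase User-Agent "Gecko" block_gecko
    # Apply the deny only to robots.txt; with 'order allow,deny',
    # a matching deny overrides the blanket allow
    <Files robots.txt>
        order allow,deny
        allow from all
        deny from env=block_gecko
    </Files>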