
Thread: robots.txt and .htaccess

  1. #1

    robots.txt and .htaccess

    A friend told me I could protect my site from abuse by adding the following code to my .htaccess. Is this really helpful?
    Code:
    ErrorDocument 404 http://www.cecicasariego.com
    
    RewriteEngine On 
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Custo [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] 
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] 
    RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR] 
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR] 
    RewriteCond %{HTTP_USER_AGENT} ^Zeus 
    RewriteRule ^.* - [F,L]
    Also, I would like to stop every spider from indexing any image in the /images folder (to stop people from copying my images), so I added the following to my robots.txt. It doesn't seem to work, though, as I still found my images on Google. How can I correct this? What can I expect from it?
    Code:
    User-agent: * 
    Disallow: /images/

  2. #2

    I'm going to answer the second question first. It could be that Google indexed the images before you put the ban in robots.txt, or the rule may not be doing what you intend, or Google may not be honoring the file, which would be their prerogative. Check the Google site for information about how to exclude images from Google Images. They do state that the images they return may be copyrighted, which covers them from most liability.

    However, there is little point in trying to protect images. The people most interested in exploiting your images illegally will find and copy them no matter what you do. The best policy is not to publish images that you don't wish others to copy. Nearly as good (depending on the exact situation and what you are trying to prevent) is to publish only lower-resolution versions.
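
    One way to rule out a mistake in the rule itself is to parse the two robots.txt lines from the question locally. The sketch below uses Python's standard urllib.robotparser; the names ROBOTS_TXT and is_allowed are made up for this illustration. It only shows how a standards-following parser reads the rules. It says nothing about what Google has already indexed or how its own crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the question, parsed locally (no network access).
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules above let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Print the verdict for an image path and a page outside /images/.
    for agent in ("Googlebot-Image", "SomeOtherBot"):
        for path in ("/images/photo.jpg", "/index.html"):
            print(agent, path, is_allowed(agent, path))
```

    If the image path comes back as not allowed, the rule is at least being read as intended, and the remaining explanations (images indexed earlier, or a crawler that ignores the file) become more likely.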

    As to the first question: if the .htaccess file is set up properly to deny access to those clients (the rules match on the User-Agent string each client sends, not on sites or domains), and those clients are known to be nefarious, it is a good idea.
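
    To get a feel for what such a list does and does not catch, here is a rough simulation of a handful of the RewriteCond patterns. It is only a sketch: Python's re module stands in for Apache's regex engine, the RULES list and the is_blocked helper are invented for this example, and the sample User-Agent strings are made up. The [OR] chain is modeled as "block if any pattern matches", and [NC] as case-insensitive matching.

```python
import re

# A few patterns copied from the RewriteCond lines in the question.
# The second element is True where the original line carries the [NC] flag.
RULES = [
    (r"^BlackWidow", False),
    (r"HTTrack", True),
    (r"Indy\ Library", True),
    (r"^Teleport\ Pro", False),
    (r"^Wget", False),
]

COMPILED = [re.compile(pattern, re.IGNORECASE if nc else 0) for pattern, nc in RULES]


def is_blocked(user_agent: str) -> bool:
    """Return True if any pattern matches, mirroring the [OR] chain."""
    return any(rx.search(user_agent) for rx in COMPILED)


if __name__ == "__main__":
    samples = [
        "Wget/1.21.3",                                         # starts with Wget
        "Mozilla/5.0 (compatible; httrack 3.0x; Windows 98)",  # unanchored, [NC]
        "Mozilla/5.0 (Windows NT 10.0) Wget",                  # Wget not at start
        "Mozilla/5.0 (X11; Linux x86_64)",                     # ordinary browser string
    ]
    for ua in samples:
        print(is_blocked(ua), ua)
```

    Patterns anchored with ^ only match when the name is at the very start of the string, and the User-Agent header is supplied by the client, so a tool that sends a browser-like string passes straight through. The list stops clients that identify themselves honestly and nothing more.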

    Ultimately, though, one must distinguish between harmless nuisance and real harm. Often something you find annoying actually does no harm and may even help to boost traffic. As site master, you must investigate further to determine which is the case. I've never found it necessary to block anything on the sites I manage.
    - John

  3. The Following User Says Thank You to jscheuer1 For This Useful Post:

    chechu (08-10-2008)
