[TriLUG] MSN bot is pounding my website...

Robert Ryals rryals at tmio.com
Thu Dec 9 15:14:19 EST 2004


gregbrown at mindspring.com wrote:

>The following is the number of hits from MSN bot, from all MSN bot IP addresses, to my webserver (through ALL historical logs I still have around):
>
>   1227 65.54.188.69
>     58 65.54.188.70
>     42 65.54.188.64
>     18 65.54.188.68
>      4 65.54.188.67
>
>
>If I look at all traffic to my website, MSN bot is still on top:
>
>   1227 65.54.188.69
>    127 192.58.204.226
>     59 65.54.188.70
>     42 65.54.188.64
>     29 64.244.30.79
>     24 66.196.91.227
>     19 65.87.170.103
>     19 129.33.49.251
>     18 65.54.188.68
>     17 66.26.93.162
>
>
>I know it's from MSN because it leaves the following in my log:
>"msnbot/0.3 (+http://search.msn.com/msnbot.htm)"
>
>I assume over at MSN they are trying to scrape the Internet to build up their own web search engine.  I am curious if others are seeing this same activity.
>
>The commands I used for these queries were (as root in /var/log/httpd):
>
>For MSN bot:
>grep msnbot access_log | awk '{ print $1 }' | sort | uniq -c | sort -nr | head
>
>For all hits:
>awk '{ print $1 }' access_log | sort | uniq -c | sort -nr | head
>
>Greg
>  
>
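You can block the bot by adding a few lines to your Apache config file.
Requests whose User-Agent header contains "msnbot" are then refused with
a 403 Forbidden response: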

<Directory /var/www/htdocs>
    SetEnvIfNoCase User-Agent "msnbot" bad_bot
    Deny from env=bad_bot
</Directory>
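
To check that the rule is taking effect, one option is to count the
status codes Apache has logged for msnbot requests. The following is
only a rough sketch: it assumes the access log uses the "combined" log
format (so the status code is the 9th whitespace-separated field), and
the function name msnbot_status_counts is made up for illustration.

```shell
# Count msnbot requests per HTTP status code, most frequent first.
# Assumption: Apache "combined" log format, where field 9 is the status.
# msnbot_status_counts is a hypothetical helper name, not a standard tool.
msnbot_status_counts() {
    # $1 is the path to the access log
    grep -i 'msnbot' "$1" | awk '{ print $9 }' | sort | uniq -c | sort -nr
}
```

Run it as "msnbot_status_counts /var/log/httpd/access_log". Once the
config change is active, new msnbot hits should show up under 403
rather than 200. Older entries in the same log will still carry their
original status codes.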




