How to stop HTTrack mass catch site?

 
maskego



Joined: 16 Apr 2010
Posts: 238

Posted: Thu 13 Oct '11 11:56    Post subject: How to stop HTTrack mass catch site?

Is there a good module that can stop site-copying tools like HTTrack, Teleport, etc.? These tools put a heavy load on the site, which is very annoying.

I use mod_rewrite, but that doesn't stop them, because these tools can change their User-Agent string.

James Blond
Moderator


Joined: 19 Jan 2006
Posts: 6352
Location: Germany, Next to Hamburg

Posted: Thu 13 Oct '11 14:18    Post subject:

That is easy. Example from the docs:

Code:

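# "BadBot" is a placeholder; use a pattern matching the unwanted User-Agent, e.g. HTTrack
SetEnvIf User-Agent BadBot GoAway=1
# Apache 2.2 access control: allow everyone except requests flagged with GoAway
# (place inside a <Directory>/<Location> block or an .htaccess file)
Order allow,deny
Allow from all
Deny from env=GoAway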


http://httpd.apache.org/docs/2.2/howto/access.html#env

Or with mod_rewrite:
Code:

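RewriteEngine On
# Match any request whose User-Agent contains "HTTrack" (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]
# Answer matching requests with 403 Forbidden
RewriteRule ^ - [F]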


See also

http://perishablepress.com/press/2007/10/15/ultimate-htaccess-blacklist-2-compressed-version/

If you still have a question, please ask again.
maskego



Joined: 16 Apr 2010
Posts: 238

Posted: Fri 14 Oct '11 2:03    Post subject:

James,

But the page you linked shows this note:
Code:
Warning:

Access control by User-Agent is an unreliable technique, since the User-Agent header can be set to anything at all, at the whim of the end user.


If that is the case, how can I stop all site-copying tools from hitting the site? Is there a good way to block them?
James Blond
Moderator


Joined: 19 Jan 2006
Posts: 6352
Location: Germany, Next to Hamburg

Posted: Fri 14 Oct '11 11:12    Post subject:

Yes, it is not fully reliable, but it keeps some traffic out. A much better approach is to search the log files for these clients, collect their IPs, and ban those IPs in your firewall so that they don't reach Apache at all.
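The log-scanning step could look roughly like the Python sketch below. It is only an illustration, not a tested tool: it assumes the access log uses Apache's "combined" format, and the function name `find_suspect_ips`, the agent substrings and the request threshold of 1000 are all made-up values you would tune for your own site. It flags an IP if it sent a blacklisted User-Agent or if it made more requests than the threshold, which also catches copiers that fake their User-Agent.

```python
import re
from collections import Counter

# Rough pattern for Apache's "combined" log format (an assumption about your LogFormat):
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def find_suspect_ips(lines, bad_agents=("httrack", "teleport"), max_requests=1000):
    """Return a sorted list of client IPs worth banning.

    An IP is flagged if any of its requests carried a User-Agent containing
    one of bad_agents (lowercase substrings), or if it made more than
    max_requests requests in the given lines. Both defaults are illustrative.
    """
    hits = Counter()
    flagged = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        ip = match.group("ip")
        hits[ip] += 1
        agent = match.group("agent").lower()
        if any(bad in agent for bad in bad_agents):
            flagged.add(ip)
    # Behaviour-based check: unusually many requests from a single IP
    flagged.update(ip for ip, count in hits.items() if count > max_requests)
    return sorted(flagged)
```

To use it, pass an open log file (for example `find_suspect_ips(open("access.log"))`, where the file name is just an example) and feed the resulting IPs into whatever firewall you run. Check the list by hand first, because search engine crawlers and proxies can also exceed a simple request threshold.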
maskego



Joined: 16 Apr 2010
Posts: 238

Posted: Fri 14 Oct '11 11:19    Post subject:

Right, it's not reliable at all.

Is it possible to use an Apache module to tell site-copying behavior apart from normal visits? (That is, use a module to block the behavior rather than the User-Agent.)



James Blond
Moderator


Joined: 19 Jan 2006
Posts: 6352
Location: Germany, Next to Hamburg

Posted: Fri 14 Oct '11 11:34    Post subject:

A combination of Mod Limit IP Connection and Mod Bandwidth might do it. At the very least, Mod Limit IP Connection (available on apachehaus.com) can be very useful in your situation.
maskego



Joined: 16 Apr 2010
Posts: 238

Posted: Sat 15 Oct '11 6:35    Post subject:

Can mod_security stop HTTrack and similar web copier tools based on their behavior rather than their User-Agent?
And if so, how do I set it up?