On Friday 2008-06-06 at 13:37 +0100, David Bolt wrote:
On Fri, 6 Jun 2008, Carlos E. R. wrote:-
Well, if the other side is interested in getting info they shouldn't, they can ignore the robots.txt file and do the scan slowly, so as not to be so intrusive ;-)
One interesting little experiment I've yet to try is to add a deny entry to the robots.txt for a sub-directory that has no links from anywhere else, and then to see which robots actually try indexing it. It might even be fun to build another page detailing which IP addresses visited these hidden locations.
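Something along these lines would do it; the directory name here is just a placeholder, and the log path is wherever your web server keeps its access log:

  # robots.txt - well-behaved robots should never fetch this path
  User-agent: *
  Disallow: /trap/

  # list the addresses that fetched it anyway
  grep ' /trap/' /var/log/apache2/access_log | awk '{print $1}' | sort -u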
Quite interesting! You could deny them any access except to a page explaining that they broke policy, or deny them directly in the firewall.
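Roughly like this, say (an untested sketch in Apache 2.2 style, with a made-up offending address and paths):

  # httpd.conf: deny the offender everywhere except the policy page
  ErrorDocument 403 /policy.html
  <Directory /srv/www/htdocs>
      Order allow,deny
      Allow from all
      Deny from 192.0.2.15
  </Directory>
  <Location /policy.html>
      Order deny,allow
      Allow from all
  </Location>

  # or drop them at the firewall altogether
  iptables -A INPUT -s 192.0.2.15 -p tcp --dport 80 -j DROP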
Even wget has options for that. The server thinks it is normal traffic, unless they analyse the access pattern.
And with the use of --wait and --random-wait you can (virtually?) eliminate the patterns. By setting the wait time to a minute, wget will wait anywhere up to two minutes between successive fetches. The full details, and an explanation of why --random-wait exists, are in the wget man page.
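For example (the URL is just a placeholder):

  # recursive fetch, pausing a randomised interval between requests
  # (up to roughly twice the --wait value), with the rate throttled too
  wget --recursive --wait=60 --random-wait --limit-rate=20k http://www.example.com/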
Yep. It aroused my curiosity when I read that man page a while ago.

--
Cheers,
Carlos E. R.