On Fri, 6 Jun 2008, Carlos E. R. wrote:-
> Well, if the other side is interested in getting info they shouldn't, they can ignore the robots.txt file and do the scan slowly, so as not to be so intrusive ;-)
One interesting little experiment I've yet to try is to add a Disallow entry to robots.txt for a sub-directory that isn't linked from anywhere else, and then see which robots actually try to index it. It might even be fun to build another page detailing which IP addresses visited these hidden locations.
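For anyone wanting to try the same thing, the trap entry itself is only a couple of lines; the directory name below is just a placeholder, and the directory should exist but never be linked to:

  User-agent: *
  Disallow: /uncharted/

Any request for /uncharted/ then comes either from a robot that ignores robots.txt or from someone deliberately poking at the disallowed paths. On a typical Apache setup the offending addresses can be pulled straight out of the access log, e.g.:

  grep '/uncharted/' /var/log/apache2/access_log | awk '{print $1}' | sort -u

(the log path will vary between distributions).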
> Even wget has options for that. The server sees it as normal traffic, unless someone analyses the request pattern.
And with the use of --wait and --random-wait you can (virtually?) eliminate the patterns. By setting the wait time to a minute, wget will wait anywhere up to two minutes between successive fetches; an example invocation is sketched below the signature. The full details, and an explanation of why --random-wait exists, are in the wget man page.

Regards,
David Bolt

-- 
Team Acorn: http://www.distributed.net/  OGR-P2 @ ~100Mnodes  RC5-72 @ ~15Mkeys
SUSE 10.1 32bit  |                     | openSUSE 10.3 32bit | openSUSE 11.0RC1
SUSE 10.1 64bit  | openSUSE 10.2 64bit | openSUSE 10.3 64bit
RISC OS 3.6      | TOS 4.02            | openSUSE 10.3 PPC   | RISC OS 3.11
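As a rough sketch of the sort of invocation being discussed (the URL and recursion depth are just placeholders), something like:

  wget --recursive --level=2 --wait=60 --random-wait -e robots=off http://www.example.com/

would crawl the site while ignoring robots.txt, pausing a randomised interval of up to two minutes between successive fetches, which makes the requests that much harder to pick out of the normal traffic.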