The RescueTheWeb Crawling Robot

RescueTheWeb uses a robot to find infections, vulnerabilities, and leaks on the web. The robot generally abides by the robots.txt file. However, if it finds an infection that links to other sites (which can happen with link-spam infections), it follows those links in an effort to find more breached websites. Typically, the links on breached websites point to other breached websites.
We are sorry that we cannot always honor robots.txt exclusions, but we must follow the perpetrators' trail in order to uncover infections. The perpetrators did not consider your robots.txt file when they breached your system, so unfortunately we have to follow suit.
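
The policy described above can be sketched roughly as follows. This is only an illustration, not the robot's actual code: the function name may_fetch, the found_via_infection flag, and the use of "RescueTheWeb" as the robots.txt agent token are all assumptions made for the example. It uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(url, robots_txt, found_via_infection=False, agent="RescueTheWeb"):
    """Decide whether a URL may be fetched under the policy described above.

    Hypothetical helper: `found_via_infection` and `agent` are illustrative
    names, not part of any documented RescueTheWeb interface.
    """
    # Links discovered inside an infection are followed regardless of robots.txt.
    if found_via_infection:
        return True
    # Otherwise the robots.txt exclusions are respected.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

In this sketch a page under a disallowed path is skipped during normal checks, but is still fetched when the link to it came from an infection found on another site.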

How does the robot identify itself?

The RescueTheWeb robot uses this user agent string:
Mozilla/5.0 (compatible; RescueTheWeb; http://www.rescuetheweb.org/; )
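
If you want to spot the robot in your own request logs, a minimal sketch might look like this. The function name is_rescuetheweb is hypothetical, and matching on the "RescueTheWeb" token alone is an assumption; the full string is the one quoted above. Keep in mind that any client can send this header, so a match does not prove the request came from us.

```python
import re

# The user agent string quoted above.
RESCUETHEWEB_UA = "Mozilla/5.0 (compatible; RescueTheWeb; http://www.rescuetheweb.org/; )"

# Assumption: the case-sensitive product token is enough to recognize the robot.
_TOKEN = re.compile(r"\bRescueTheWeb\b")


def is_rescuetheweb(user_agent):
    """Return True if a User-Agent header value names the RescueTheWeb robot."""
    return bool(_TOKEN.search(user_agent or ""))
```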

How often does the robot access my pages?

The robot only accesses your pages to verify the presence of an issue. A throttle in the robot system ensures that we do not access a page more than once a day. Early versions of the robot had a bug that caused them to access some pages more often; that issue has been resolved.
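
To illustrate the idea of a per-page throttle, here is a minimal sketch. It is not the robot's real implementation: the class name OncePerDayThrottle is made up for the example, and it assumes "once a day" means a rolling 24-hour window per URL rather than a calendar day.

```python
import time


class OncePerDayThrottle:
    """Allow at most one access per URL within the configured interval."""

    def __init__(self, interval_seconds=24 * 60 * 60, clock=time.time):
        self.interval = interval_seconds  # assumed: rolling 24-hour window
        self.clock = clock                # injectable so the sketch can be tested
        self.last_access = {}             # URL -> timestamp of last allowed access

    def allow(self, url):
        """Return True and record the access if the URL is not throttled."""
        now = self.clock()
        last = self.last_access.get(url)
        if last is not None and now - last < self.interval:
            return False
        self.last_access[url] = now
        return True
```

A second request for the same page inside the window is refused, while requests for other pages are unaffected.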

Does the robot crawl my site?

No. The robot only accesses pages that some other source has referred to us as suspicious.

What is done with the page content retrieved?

Our analysis engine inspects the robot's results for any indication of a vulnerability, infection, or leak. If we find one, we store a copy of the issue (as found on your website) in a Security Notice and send you a copy. All information we retrieve is kept private between us and you.