Robots.txt Instructions
Our crawler, the Ditto Spyder, attempts to provide as thorough an index of
pictures on the web as possible. It will find any URL connected to the main
body of the web through even one link. The Ditto Spyder honors the Robots
Exclusion Standard, so excluding specific pages, or your entire site, from the
Ditto picture index is as simple as setting up a robots.txt file. It only
takes a minute, and gives you complete control over how much of your site is
indexed. More information on the Robots Exclusion Standard can be found at:
http://info.webcrawler.com/mak/projects/robots/robots.html
A robots.txt file can exclude all crawlers or only specific crawlers. Use this
method if you have access to the root directory of your web site and know
which directories you want to exclude. The robots.txt file MUST be located in
the root directory of the web site.
To exclude all robots from the entire site, the contents of the robots.txt
file would be:
# Anything on a line after a # sign is ignored
User-agent: * # This rule applies to all crawlers
Disallow: /
To exclude only the Ditto Spyder from the entire site, the contents of the
robots.txt file would be:
User-agent: DittoSpyder
Disallow: /
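To see how the two whole-site files above differ in effect, here is a minimal sketch using Python's standard urllib.robotparser module. It is only an illustration of how a standards-following parser reads these rules, not a description of how the Ditto Spyder itself is implemented. The helper can_fetch, the agent name OtherBot, and the www.example.com URL are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The "exclude every crawler" file from the text.
EXCLUDE_ALL = """\
User-agent: *  # applies to every crawler
Disallow: /
"""

# The "exclude only the Ditto Spyder" file from the text.
EXCLUDE_DITTO = """\
User-agent: DittoSpyder
Disallow: /
"""

# Hypothetical page on a hypothetical site, used only for the check.
URL = "http://www.example.com/images/photo.jpg"

for label, rules in (("all crawlers", EXCLUDE_ALL), ("Ditto only", EXCLUDE_DITTO)):
    for agent in ("DittoSpyder", "OtherBot"):
        print(f"{label:12} {agent:12} allowed={can_fetch(rules, agent, URL)}")
```

With the first file neither agent is allowed; with the second, only DittoSpyder is turned away and other crawlers remain free to index the site.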
Alternatively, to exclude the Ditto Spyder from specific directories only, add
one "Disallow" line for each directory you do not want indexed:
User-agent: DittoSpyder # Ditto Image Search
Disallow: /personal
Disallow: /images
Disallow: /bar
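The per-directory file above can be checked the same way. The sketch below again uses Python's standard urllib.robotparser as a stand-in for a standards-following crawler; the helper ditto_allowed, the www.example.com host, and the sample paths are assumptions invented for illustration. One point worth noticing is that Disallow values are matched as path prefixes, so "Disallow: /bar" also covers a path such as /barn/door.jpg. Under that reading, writing "/bar/" with a trailing slash would limit the rule to the directory itself.

```python
from urllib.robotparser import RobotFileParser

# The per-directory rules from the text.
RULES = """\
User-agent: DittoSpyder # Ditto Image Search
Disallow: /personal
Disallow: /images
Disallow: /bar
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def ditto_allowed(path: str) -> bool:
    """Return True if DittoSpyder may fetch `path` on the hypothetical site."""
    return parser.can_fetch("DittoSpyder", "http://www.example.com" + path)


# Hypothetical paths chosen to show which ones the rules cover.
for path in (
    "/index.html",           # not listed, so allowed
    "/images/logo.gif",      # inside /images, so excluded
    "/personal/diary.html",  # inside /personal, so excluded
    "/barn/door.jpg",        # starts with /bar, so excluded as well
    "/public/bar/x.jpg",     # /bar is not at the start, so allowed
):
    print(f"{path:24} allowed={ditto_allowed(path)}")
```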