One of the most boring topics in technical SEO is robots.txt. Rarely is there an interesting problem to solve in the file, and most errors come from not understanding the directives or from typos. The general purpose of a robots.txt file is simply to suggest to crawlers where they can and cannot go.

Basic parts of the robots.txt file

User-agent — specifies which robot the rules that follow apply to.
Disallow — asks robots not to crawl this area.
Allow — permits robots to crawl this area; useful for carving an exception out of a broader Disallow.
Crawl-delay — tells robots to wait a certain number of seconds between requests.
Sitemap — specifies the sitemap location.
Noindex — an unofficial directive that tells Google to remove pages from the index (no longer supported by Google).
# — comments out a line so it will not be read.
* — matches any sequence of characters.
$ — the URL must end here.

A short example that combines these directives appears at the end of this post.

Other things you should know about robots.txt

Robots.txt must be in the main folder, i.e., http://ift.tt/1jLiEV5.
Each subdomain needs its own robots.txt — http://ift.tt/1sMWKqc is not the same as http://ift.tt/1jLiEV5.
Crawlers can ignore robots.txt.

Source: Search Engine Land
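To tie the directives together, here is a minimal, hypothetical robots.txt. The paths, the Bingbot user-agent and the example.com sitemap URL are made up for illustration; adapt them to your own site.

    # Rules for all robots.
    User-agent: *
    # Ask robots to stay out of a (hypothetical) admin area...
    Disallow: /admin/
    # ...but allow this one page inside it.
    Allow: /admin/public-report.html
    # Block any URL that ends in .pdf ($ anchors the match to the end of the URL).
    Disallow: /*.pdf$

    # Rules for one specific robot.
    User-agent: Bingbot
    # Ask it to wait 10 seconds between requests.
    Crawl-delay: 10

    Sitemap: http://example.com/sitemap.xml

Keep in mind that not every crawler honors every directive; Crawl-delay, for example, is ignored by Googlebot.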