Google has proposed turning the rules that govern robots.txt files into an official Internet standard.
These rules, enshrined in the Robots Exclusion Protocol (REP), have been an unofficial standard for the past 25 years.
Although the REP has been adopted by search engines, it is not yet official, which means that it is open to interpretation by developers. Further, it has never been updated to cover today’s use cases.
According to Google, this creates a challenge for website owners: because the protocol is ambiguously written, it is difficult to write rules correctly.
New rules for robots.txt
To eliminate this challenge, Google has documented how REP is being used on the modern web and submitted it to the Internet Engineering Task Force (IETF) for consideration.
Google explains what’s included in the draft:
“The proposed draft REP reflects over 20 years of real-world experience relying on robots.txt rules, which are also used by Googlebot and other major search engines, as well as about half a billion sites that rely on REP. These modified controls give the publisher the ability to decide what they would like to be indexed on their site and potentially displayed to interested users.”
The draft does not change any of the rules established in 1994; it only updates them for the modern web.
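As a reminder of what those long-standing rules look like in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the example.com URLs, and the `is_allowed` helper are all made up for illustration. Python's parser follows its own matching logic, so it should not be read as an implementation of the proposed draft.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/report.html",
        "https://example.com/blog/post.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

The first URL falls under the `Disallow: /private/` rule and is refused; the second matches no rule and is therefore permitted.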
Some of the updated policies include:
- Any URI-based transfer protocol can use robots.txt. It is no longer limited to HTTP and can also be used for FTP or CoAP.
- Developers must parse at least the first 500 kibibytes of a robots.txt file.
- The new maximum caching time is 24 hours, or the value of a cache directive if one is available, giving website owners the flexibility to update their robots.txt whenever they want.
- When a previously accessible robots.txt file becomes unavailable due to server errors, known disallowed pages are not crawled for a reasonably long period of time.
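For developers writing a crawler, the last three policies could be sketched roughly as follows. This is one possible reading of the list above, not the draft's reference behaviour: the `CachedRobots` class and the helper names are hypothetical, the cache directive is assumed to be an HTTP `Cache-Control: max-age` value, and what to do when there is no earlier copy to fall back on is left open (the sketch simply returns `None`).

```python
import re
from dataclasses import dataclass
from typing import Optional

MAX_PARSE_BYTES = 500 * 1024    # 500 kibibytes
DEFAULT_MAX_AGE = 24 * 60 * 60  # 24 hours, in seconds


@dataclass
class CachedRobots:
    """Hypothetical cache entry for one site's robots.txt."""
    body: bytes
    fetched_at: float  # Unix timestamp of the successful fetch
    max_age: int       # seconds the entry may be reused


def truncate_body(body: bytes) -> bytes:
    """Keep only the first 500 KiB; a parser must handle at least this much."""
    return body[:MAX_PARSE_BYTES]


def cache_lifetime(cache_control: Optional[str]) -> int:
    """Use max-age from a Cache-Control header if present, else 24 hours."""
    if cache_control:
        match = re.search(r"max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
        if match:
            return int(match.group(1))
    return DEFAULT_MAX_AGE


def is_fresh(entry: CachedRobots, now: float) -> bool:
    """True while the cached copy may still be used without refetching."""
    return now - entry.fetched_at < entry.max_age


def update_cache(
    previous: Optional[CachedRobots],
    status: int,
    body: bytes,
    cache_control: Optional[str],
    now: float,
) -> Optional[CachedRobots]:
    """Decide which rules to keep after a fetch attempt."""
    if 200 <= status < 300:
        # Successful fetch: store the (possibly truncated) new file.
        return CachedRobots(truncate_body(body), now, cache_lifetime(cache_control))
    if status >= 500 and previous is not None:
        # Server error: keep honoring the last known rules.
        return previous
    # Assumption: anything else is outside the scope of this sketch.
    return None
```

Keeping the decision logic in pure functions, separate from the actual network request, makes each policy easy to check on its own.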
Google is fully open to feedback on the proposed draft and says it’s committed to getting it done the right way.
Text taken from: www.searchenginejournal.com
Author: Matt Southern
Made by Nebojša Radovanović – SEO Expert @Digitizer
