Hello! Today's topic is what a correct robots.txt for WordPress should look like. I wrote about what robots.txt is and how it is used two days ago; this post is specifically about WordPress. The file sets the basic rules for how search engines index a blog, and it can give individual search bots different access rules.
As an example, I will build a proper robots.txt for WordPress step by step, targeting the two main search engines, Yandex and Google. Note that Yandex prefers to be addressed separately, and the User-agent directive lets us do that. Bots read the file from top to bottom (just like the source code of any page), so User-agent should be the first line of each group of rules.
    User-agent: *
An asterisk as the value of the directive means that all the rules that follow apply to any robot. You can also write separate rules for specific bots; for Yandex, for example, the line looks like this:
    User-agent: Yandex
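How a crawler picks its group can be checked with Python's standard `urllib.robotparser`. The sketch below is only an illustration: the two `Disallow` paths and the bot names passed to `can_fetch` are assumptions chosen for the demo, not the final rules of this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical two-group file: one group for everyone, one for Yandex only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Yandex
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A bot that has its own group follows that group and ignores the "*" group.
yandex_tag = parser.can_fetch("Yandex", "/tag/news/")
yandex_admin = parser.can_fetch("Yandex", "/wp-admin/")
other_tag = parser.can_fetch("Googlebot", "/tag/news/")
other_admin = parser.can_fetch("Googlebot", "/wp-admin/")

print("Yandex    /tag/news/ :", yandex_tag)
print("Yandex    /wp-admin/ :", yandex_admin)
print("Googlebot /tag/news/ :", other_tag)
print("Googlebot /wp-admin/ :", other_admin)
```

In this sketch the Yandex bot is allowed into /wp-admin/, because its own group does not repeat that rule. That is the reason a separate group has to list every rule again rather than only the bot-specific ones.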
Remember that WordPress, like any content management system (CMS), has its own administrative resources, administration folders, and so on, which should not end up in the index. To keep such pages, which may contain personal data, logins, and passwords, out of search results, disallow them in the file with the following lines. Note that Disallow only asks crawlers to stay away; it does not password-protect anything.
    Disallow: /cgi-bin
    Disallow: /wp-admin/
    Disallow: /wp-includes/
Theme files, plugins, and the WordPress cache are hardly needed in the index either, so we apply the corresponding rules to them:
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
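A rule without wildcards is matched as a plain prefix of the URL path, so a trailing slash (or its absence) changes what gets blocked. A minimal sketch with `urllib.robotparser`; the sample paths are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin/
Disallow: /wp-content/plugins
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical paths; a True result means "may be crawled".
samples = [
    "/wp-admin/options.php",         # under /wp-admin/
    "/wp-admin",                     # no trailing slash: prefix /wp-admin/ does not match
    "/wp-content/plugins/seo/a.js",  # under the plugins prefix
    "/wp-content/plugins-backup/",   # also starts with /wp-content/plugins
    "/2020/05/hello-world/",         # ordinary post
]
results = {path: parser.can_fetch("*", path) for path in samples}

for path, allowed in results.items():
    print(f"{path:32} {'allowed' if allowed else 'blocked'}")
```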
The next rule of a correct robots file is to keep out of the index, and therefore out of the search results, pages that duplicate the main content and so reduce the uniqueness of the content within the same domain.
You should get rid of such pages as soon as possible, otherwise the site risks falling under a search engine filter. Where does duplication come from on a WordPress blog? First of all, from tags, comment pages, RSS feeds of comments, and the post archives of individual authors (even if the blog has a single author, the /author/author-name/ page still duplicates content), and so on.
    Disallow: /wp-trackback
    Disallow: /wp-feed
    Disallow: /wp-comments
    Disallow: /category/
    Disallow: /author/
    Disallow: /page/
    Disallow: /tag/
    Disallow: /feed/
    Disallow: */feed
    Disallow: */trackback
    Disallow: */comments
One more aspect deserves attention. If your blog uses human-readable links, pages with question marks in their URLs are usually redundant and very often duplicate the main content. They should therefore be disallowed as well:
    Disallow: /*?
    Disallow: /*?*
    Disallow: /*.php
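To see what the wildcard rules actually catch, here is a small hand-written matcher. It assumes the wildcard semantics documented by Google and Yandex (`*` matches any run of characters, a trailing `$` anchors the end of the URL, everything else is a prefix match). The function name `rule_matches` and the sample URLs are hypothetical.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path.

    Assumed semantics: '*' = any sequence of characters,
    trailing '$' = end of URL, otherwise a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(regex, url_path) is not None


rules = ["/*?", "/*?*", "/*.php"]
urls = ["/?p=123", "/hello-world/?replytocom=5", "/index.php", "/hello-world/"]

for url in urls:
    hits = [rule for rule in rules if rule_matches(rule, url)]
    status = f"blocked by {hits}" if hits else "allowed"
    print(f"{url:28} {status}")
```

Under these assumptions `/*?` and `/*?*` match exactly the same URLs, so keeping both is harmless but redundant.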
Note that individual files with the .php extension are disallowed too. The reason is that the same main page is reachable at several addresses, and one of them is /index.php. This rule also covers administration files such as install.php, login.php, and others.
The editing of robots.txt does not end here. You can add further directives that improve the quality of indexing. One of them is Host, which sets the main mirror (this directive is taken into account only by Yandex; naturally, put your own blog address here):
    Host: fastandsocial.com
To make the indexing of all pages faster and more complete, add the path to the sitemap (use your own address; I give mine as an example):
    Sitemap: https://www.alert2web.com/sitemap.xml
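A quick way to confirm that the Sitemap line is picked up is to parse the file with `urllib.robotparser` and call `site_maps()`, which is available in Python 3.8 and later. This is a sketch: the short rule set is an assumption for the demo, and the standard parser simply skips directives it does not know, such as Host.

```python
from urllib.robotparser import RobotFileParser

# Shortened, hypothetical file with the two extra directives from the article.
lines = [
    "User-agent: Yandex",
    "Disallow: /wp-admin/",
    "Host: fastandsocial.com",
    "",
    "Sitemap: https://www.alert2web.com/sitemap.xml",
]

parser = RobotFileParser()
parser.parse(lines)

# site_maps() returns a list of sitemap URLs, or None if the file has none.
sitemaps = parser.site_maps()
print(sitemaps)
```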
Putting all of the above together, I got the following file:
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-content/cache/
    Disallow: /wp-content/themes/
    Disallow: /wp-trackback
    Disallow: /wp-feed
    Disallow: /wp-comments
    Disallow: /category/
    Disallow: /author/
    Disallow: /page/
    Disallow: /tag/
    Disallow: /feed/
    Disallow: */feed
    Disallow: */trackback
    Disallow: */comments
    Disallow: /*?
    Disallow: /*?*
    Disallow: /*.php

    User-agent: Yandex
    Disallow: /cgi-bin
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-content/cache/
    Disallow: /wp-content/themes/
    Disallow: /wp-trackback
    Disallow: /wp-feed
    Disallow: /wp-comments
    Disallow: /category/
    Disallow: /author/
    Disallow: /page/
    Disallow: /tag/
    Disallow: /feed/
    Disallow: */feed
    Disallow: */trackback
    Disallow: */comments
    Disallow: /*?
    Disallow: /*?*
    Disallow: /*.php
    Host: fastandsocial.com
    Sitemap: https://www.alert2web.com/sitemap.xml
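Since both groups carry the same list of rules, the file can be generated instead of maintained by hand. A minimal sketch, assuming the rule list from the file above; the function name `build_robots_txt` and its parameters are hypothetical and not part of WordPress or any library:

```python
# The shared rule list, copied from the final file.
DISALLOW = [
    "/cgi-bin", "/wp-admin/", "/wp-includes/",
    "/wp-content/plugins/", "/wp-content/cache/", "/wp-content/themes/",
    "/wp-trackback", "/wp-feed", "/wp-comments",
    "/category/", "/author/", "/page/", "/tag/", "/feed/",
    "*/feed", "*/trackback", "*/comments",
    "/*?", "/*?*", "/*.php",
]


def build_robots_txt(host: str, sitemap: str) -> str:
    """Assemble the two-group file: '*' first, then Yandex with Host."""
    lines = []
    for agent in ("*", "Yandex"):
        lines.append(f"User-agent: {agent}")
        lines.extend(f"Disallow: {path}" for path in DISALLOW)
        if agent == "Yandex":
            # Host goes inside the Yandex group, as in the file above.
            lines.append(f"Host: {host}")
        lines.append("")  # a blank line closes the group
    lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    "fastandsocial.com", "https://www.alert2web.com/sitemap.xml"
)
print(robots_txt)
```

With this approach a new rule is added once to `DISALLOW`, and both groups stay in sync.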
Remember: indexing should be monitored constantly, and the robots.txt file, whether for WordPress or for any other site, should be adjusted in good time.