Optimized Custom Robots.txt for Blogger (A Complete Guide)

Every search engine bot first reads a website’s robots.txt file to understand its crawling rules. This makes the robots.txt file a key part of SEO for any Blogger blog. If your posts are not being indexed properly, setting up a custom robots.txt file can fix the problem and improve your blog’s visibility.

In this guide, I will explain what a robots.txt file is, why it is important for SEO, and how you can create a well-optimized custom robots.txt file for your Blogger website. I will also show you how to manage blocked pages reported in Google Search Console and help index your articles faster.

By following these steps, you can ensure that search engines crawl your content efficiently and boost the search performance of your Blogger blog.

What is a robots.txt file?

The robots.txt file tells search engine crawlers and bots which URLs on a website they are allowed to crawl.

This mainly helps prevent your website from being overloaded with too many crawl requests and also saves server bandwidth.

Using this, you can block unnecessary pages from being crawled while allowing important pages, which helps save server resources.

The robots.txt file belongs to the Robots Exclusion Protocol (REP), a set of web standards that govern how robots or web crawlers browse the web, access and index content, and present that content to users.

Typically, the robots.txt file is placed in the root folder of a website and can be accessed using a URL like this:

https://example.com/robots.txt

This way, you can quickly check your Blogger site’s robots.txt file by adding /robots.txt to the end of your homepage URL, as shown in the example above.
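For example, for a Blogger blog hosted on a Blogspot subdomain (the address below is only a placeholder), the file would be available at:

https://yourblog.blogspot.com/robots.txt

For a blog on a custom domain, replace the subdomain address with your own domain.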

Structure of a Robots.txt File

The standard format of a robots.txt file looks like this:

User-agent: [user-agent name]
Disallow: [URL string not to be crawled]

A single robots.txt file can contain multiple rule groups, each with its own user-agent and directives (such as Disallow, Allow, Crawl-delay, and so on).

The most commonly used terms in a robots.txt file are listed below.

User-agent: This specifies the web crawler to which the directive applies (most often a search engine).

Disallow: This directive tells the user-agent not to crawl a specific URL path. Each Disallow: line can list only one path.

Allow: This directive tells a crawler (Googlebot in particular) that it may access a page or subfolder even if its parent page or folder is disallowed.

Crawl-delay: This tells a web crawler how many seconds to wait before fetching the next page, which helps reduce server load. Note that Googlebot ignores this directive.

Sitemap: This directive points web crawlers to the XML sitemap(s) associated with the site. Google, Bing, Yahoo, and Ask support it.

Comments: Any line that begins with “#” is a comment. Crawlers ignore comments, but they help humans understand and document the rules. For example, # This comment explains the rule.
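To see how these terms fit together, here is a small hypothetical example (the paths and sitemap URL are placeholders, not rules you need to copy):

# Example robots.txt combining the common directives
User-agent: *
Crawl-delay: 10
Disallow: /private/
Allow: /private/public-page.html

Sitemap: https://www.example.com/sitemap.xml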

How to check robots.txt?

To check the contents of the robots.txt file, follow these steps:

Find the robots.txt file: The robots.txt file is usually placed in the root directory of the website you want to check. For example, if your website is www.example.com, you can find the file at www.example.com/robots.txt.

Access the file: Open a browser and type the full URL of the robots.txt file in the address bar. For example, www.example.com/robots.txt. This will display the contents of the file directly in your browser.

Review the file: Look carefully at the contents of the robots.txt file. It contains directives that tell web crawlers, such as search engine bots, which pages they may crawl and which they should skip. The file follows a specific syntax and set of rules, so make sure the directives are written correctly and match the instructions you want to give search engines.

Validate syntax: You can check the syntax of a file using online robots.txt validation tools. These tools will analyze the file and highlight any errors or issues. Some widely used validators include Google’s Robots.txt Tester, Bing Webmaster Tools, and other third-party platforms.

Test with a web crawler: Once the syntax is verified, you can test functionality using a web crawler or search engine bot simulator. These tools show how search engines interpret your robots.txt rules and which pages they can index. Popular options include Screaming Frog SEO Spider, SiteBulb, or NetPeak SEO Spider.

By following these steps, you can ensure that your robots.txt file is working properly, formatted correctly, and aligned with your instructions for search engine bots.
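For a quick programmatic check, Python’s built-in urllib.robotparser module can fetch a live robots.txt file and report whether a given URL is crawlable. Here is a minimal sketch (the domain and URLs are placeholders); keep in mind that this module follows the original robots.txt specification and may not interpret Google-style wildcards (*) the way Google does:

from urllib.robotparser import RobotFileParser

# Point the parser at the live robots.txt file (placeholder domain)
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # download and parse the file

# Ask whether a generic crawler ("*") may fetch specific URLs
print(rp.can_fetch("*", "https://www.example.com/2024/01/sample-post.html"))
print(rp.can_fetch("*", "https://www.example.com/search/label/SEO"))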

Default Robots.txt File for Blogger Blog

To improve SEO for a Blogger blog, it is important to understand the CMS structure and review the default robots.txt file. Here is the default robots.txt file used by Blogger:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.example.com/sitemap.xml

The first block names the bot the rules apply to. Here it is Mediapartners-Google, the Google AdSense crawler, and its Disallow line is left empty, so it is not restricted. This means that AdSense ads can appear on all pages of the site.

The next block applies to all other bots (*), which are not allowed to crawl /search pages. This prevents search and label pages (which share the same /search URL structure) from being indexed. The Allow: / rule ensures that all pages except the disallowed ones can be crawled.

The last line contains the Blogger post sitemap.

This default file works well for managing how search engine bots crawl your blog. However, it allows archive pages to be indexed, which can lead to duplicate content issues and add unnecessary pages of the Blogger site to the search index.
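To illustrate this, here is a short sketch using Python’s built-in urllib.robotparser, which handles the simple prefix rules in the default file: /search pages are blocked, but posts and archive pages remain crawlable. The domain and URLs are placeholders:

from urllib.robotparser import RobotFileParser

default_rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

rp = RobotFileParser()
rp.parse(default_rules.splitlines())

# Search and label pages are blocked for regular crawlers...
print(rp.can_fetch("Googlebot", "https://www.example.com/search/label/SEO"))          # False
# ...but posts and archive pages are still allowed
print(rp.can_fetch("Googlebot", "https://www.example.com/2024/01/sample-post.html"))  # True
print(rp.can_fetch("Googlebot", "https://www.example.com/2024/01/"))                  # True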

Optimizing Robots.txt for Blogger Blogs

After analyzing the default robots.txt, we can optimize it for better SEO performance.

The default setup allows indexing of archive pages, which can result in duplicate content. To fix this, /search* should be used to block all search and label pages from being crawled.

Adding the disallow rule /20* prevents crawling of archive sections. Since this can block all posts, we need an allow rule for /*.html to ensure that posts and pages can be crawled.

By default, the Blogger sitemap only includes posts, not static pages. Therefore, you should also add the pages sitemap, located at https://example.blogspot.com/sitemap-pages.xml for a Blogspot subdomain or at https://www.example.com/sitemap-pages.xml for a custom domain. Submitting these sitemaps to Google Search Console helps with indexing.

Here is a custom robots.txt optimized for a Blogger blog:

User-agent: Mediapartners-Google
Disallow:

User-agent: *  # select all crawling bots and search engines
Disallow: /search*  # block all user-generated query pages
Disallow: /20*  # prevent crawling of Blogger archive sections
Disallow: /feeds*  # stop feeds from being crawled
Allow: /*.html  # allow all posts and pages to be crawled

# Sitemap of the blog
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-pages.xml

  • Disallow: /search* prevents search and label pages from being crawled.
  • Disallow: /20* prevents archive sections from being crawled.
  • Disallow: /feeds* blocks feed URLs. Use this only if you have already added the XML sitemaps above; otherwise, leave feeds crawlable so search engines can discover your posts through them.
  • Allow: /*.html ensures that all posts and pages remain accessible to search engines.
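If you want to sanity-check how these wildcard rules behave before publishing them, one option is the third-party Python library protego (installed with pip install protego), which implements Google-style wildcard matching; the built-in urllib.robotparser does not. This is only a rough sketch under that assumption, with placeholder URLs; the expected results follow Google’s longest-match rule, where Allow: /*.html outweighs Disallow: /20* for post URLs:

from protego import Protego

custom_rules = """\
User-agent: *
Disallow: /search*
Disallow: /20*
Disallow: /feeds*
Allow: /*.html
"""

rp = Protego.parse(custom_rules)

# Label/search pages and bare archive listings should be blocked
print(rp.can_fetch("https://www.example.com/search/label/SEO", "Googlebot"))          # expected: False
print(rp.can_fetch("https://www.example.com/2024/01/", "Googlebot"))                  # expected: False

# Individual posts end in .html, so the Allow rule keeps them crawlable
print(rp.can_fetch("https://www.example.com/2024/01/sample-post.html", "Googlebot"))  # expected: True

If the checks behave as expected, you can paste the optimized rules into Blogger’s custom robots.txt setting (under Settings, in the Crawlers and indexing section) and then recheck any blocked pages reported in Google Search Console.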
