Robots.txt Best Practices for eCommerce SEO

eCommerce stores are complex and made up of a range of important files and web pages, each serving a different but vital purpose.

A simple but crucial example is the Robots.txt file. This seemingly innocuous file might look like a jumble of phrases, but in eCommerce it serves a real purpose and should be configured as early as possible whenever you build a new eCommerce website.

In this guide, I explain the purpose of the Robots.txt file, how it benefits your eCommerce SEO, and some simple tips to get you started.

What is the Robots.txt File?

Robots.txt is a plain text file (you can create it in Notepad if you want!) that is placed in the root directory of your eCommerce store. The file itself must be named robots.txt, all in lowercase. It is a simple set of instructions that tells search engine crawler bots which web pages they shouldn’t try to crawl. The aim is to steer crawlers toward the pages that matter, regulate the crawling rate, and support your SEO.
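Because crawlers look for the file at the root of the site, its address can be worked out from any page URL. Here is a minimal Python sketch of that idea. The function name robots_txt_url and the shop.example.com domain are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical product page on a hypothetical store.
location = robots_txt_url("https://shop.example.com/products/shoes?ref=abc")
print(location)  # https://shop.example.com/robots.txt
```

Note that each host has its own file, so a store on a subdomain needs a Robots.txt file at the root of that subdomain.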

Why is Optimizing the Robots.txt File Important for eCommerce SEO?

You may wonder what the fuss is about dictating which web pages a bot crawls. Surely it doesn’t matter? But it does! Optimizing your Robots.txt file is especially important for eCommerce SEO and offers the following benefits:

Manage search engine crawl budget.
Stop crawlers wasting time on non-SEO pages.
Make sure SEO-rich pages are targeted for crawling.

The main benefit is that you help manage a search engine’s crawl budget. Each search engine typically allocates a limited amount of crawling resources to a website based on things like its reputation, size, and authority. That allocation is the crawl budget.

By optimizing your Robots.txt file, you make sure that the crawl budget is effectively used and that there are no wasted resources.

Building on that, you can make sure that non-SEO pages like accounts, shopping carts, and login pages are not crawled and that only your SEO-rich pages are targeted.

Robots.txt SEO Tips and Best Practices

Right, we know what this file is used for, and we know how it can benefit your online store, but how do you actually go about achieving that? To get you started, I have listed some simple tips and best practices below.

Robots.txt Files Relate to User Agents

First, you must understand that Robots.txt files are meant to be read by different user agents. User agents are the crawling software for different search engines that access and index your web pages.

In your Robots.txt file, you can either create a blanket set of instructions for all user agents using the User-agent: * text string, or specify instructions for individual user agents such as Googlebot, Bingbot, and Baiduspider, in which case the text string would start with something like User-agent: Googlebot.
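To see how the blanket group and a named group interact, here is a small Python sketch that uses the standard library’s urllib.robotparser module. The rules, paths, and the shop.example.com domain are invented for illustration. It shows that a crawler with its own group follows that group only, not the blanket one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one blanket group for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /checkout/

User-agent: *
Disallow: /checkout/
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own group, so the blanket /cart/ rule does not apply to it.
googlebot_cart = parser.can_fetch("Googlebot", "https://shop.example.com/cart/")
# Bingbot has no group of its own, so it falls back to the blanket group.
bingbot_cart = parser.can_fetch("Bingbot", "https://shop.example.com/cart/")

print(googlebot_cart)  # True
print(bingbot_cart)    # False
```

The practical takeaway is that any rule you want a named crawler to obey has to be repeated inside that crawler’s own group.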

Directives Are Robots.txt File Instructions

Robots.txt files are composed of directives, which are essentially text strings that act as a set of instructions for the crawler bots to follow. Available directives include:

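User-agent: Specifies which crawler the instructions that follow apply to.
Disallow: Tells the crawler not to crawl URLs whose path starts with the given value, whether that is a single page or a whole directory.
Allow: Tells the crawler it can access specific pages, typically as an exception inside a disallowed directory.
Crawl-delay: Asks the crawler to wait a specific time in seconds between requests. It is not part of the official standard, and Googlebot ignores it.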
Sitemap: Gives the crawler your eCommerce sitemap URL.
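The directives above can be tried out together in a short Python sketch, again using the standard library’s urllib.robotparser. The paths, the delay value, and the sitemap URL are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file using every directive from the list above.
ROBOTS_TXT = """\
User-agent: *
Allow: /account/help/
Disallow: /account/
Crawl-delay: 5

Sitemap: https://shop.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Allow carves an exception out of the disallowed /account/ directory.
help_page = parser.can_fetch("Bingbot", "https://shop.example.com/account/help/returns")
orders_page = parser.can_fetch("Bingbot", "https://shop.example.com/account/orders")

delay = parser.crawl_delay("Bingbot")  # seconds requested between requests
sitemaps = parser.site_maps()          # list of Sitemap URLs (Python 3.8+)

print(help_page, orders_page, delay, sitemaps)
```

Python’s parser applies the first rule that matches, in file order, while Google applies the most specific (longest) matching rule. Listing the Allow line before the broader Disallow line, as here, gives the same result under both approaches.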

eCommerce Robots.txt Files Should Target Low-Priority Pages

The most important thing to remember when creating your eCommerce Robots.txt file is to target low-priority pages! These are pages within your online store that are necessary for its functionality but serve no SEO purpose. Examples include:

Checkout.
Shopping cart.
Customer account.
Account login page.
Account registration page.
Reset password page.

Just think – what benefit does it serve for Googlebot to crawl a customer’s account login page compared to an actual product page, product category page, or blog article? None! An account login page is purely functional and a necessary part of your online store so that customers can manage their accounts.

By disallowing access to these non-vital pages (in terms of SEO usefulness) you are improving the efficiency of the search engine crawlers and making sure they reach the pages and content that matter more quickly.
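As a rough sketch of how this could be put into practice, the Python snippet below builds a Robots.txt file from a list of low-priority paths and then checks the result. The paths, the build_robots_txt helper, and the shop.example.com domain are all hypothetical, so swap in the URLs your own store actually uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical low-priority paths; your platform's URLs will differ.
LOW_PRIORITY_PATHS = [
    "/checkout/",
    "/cart/",
    "/account/",
    "/login/",
    "/register/",
    "/reset-password/",
]


def build_robots_txt(paths, sitemap_url=None):
    """Build a robots.txt body that blocks the given paths for all crawlers."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in paths]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(LOW_PRIORITY_PATHS, "https://shop.example.com/sitemap.xml")

# Sanity check: functional pages are blocked, product pages stay crawlable.
checker = RobotFileParser()
checker.parse(robots_txt.splitlines())
login_allowed = checker.can_fetch("Googlebot", "https://shop.example.com/login/")
product_allowed = checker.can_fetch("Googlebot", "https://shop.example.com/products/blue-widget")

print(robots_txt)
print(login_allowed, product_allowed)  # False True
```

Running a check like this before publishing the file is a cheap way to catch a rule that accidentally blocks product or category pages.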

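Please note that, as Google states, the Robots.txt file is not meant to be used to hide web pages from search results. A disallowed URL can still end up indexed if other pages link to it. If hiding a page is your desired end result, methods like password protection or the noindex HTML meta tag should be used instead. Keep in mind that a crawler can only see a noindex tag on a page it is allowed to crawl.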

Also Consider eCommerce URL Parameters

Part of creating a successful eCommerce store is tracking data and performing analytical research. A potential issue with this is that URL parameters are often added to the end of web page addresses to enable tracking, so the same page can end up being crawled several times under different URLs.

The Robots.txt file can be used to prevent this from happening by disallowing particular URL parameters. Examples include:

Disallow: /*?ref=
Disallow: /*?page=

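The ref parameter, for example, is often used in web page URLs to track referral data, and these URLs don’t need to be crawled separately. Note that /*?ref= only matches when ref is the first parameter in the URL; a rule such as Disallow: /*&ref= covers the cases where it appears later. Be careful with pagination parameters like page, too, since blocking them can stop crawlers from reaching products that are only linked from deeper category pages.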
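To get a feel for how these wildcard rules match, here is a simplified Python sketch. It assumes the pattern syntax that Google and Bing document, where * matches any run of characters and a trailing $ anchors the end of the URL. The helper names and example URLs are made up, and the sketch ignores Allow rules and rule precedence. It uses its own small matcher because Python’s built-in urllib.robotparser does not interpret * inside paths.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query, disallow_patterns):
    """Return True if any Disallow pattern matches from the start of the URL path."""
    return any(pattern_to_regex(p).match(path_and_query) for p in disallow_patterns)


first_only = ["/*?ref="]
both_positions = ["/*?ref=", "/*&ref="]

print(is_disallowed("/products/shoes?ref=newsletter", first_only))                # True
print(is_disallowed("/products/shoes?color=red&ref=newsletter", first_only))      # False
print(is_disallowed("/products/shoes?color=red&ref=newsletter", both_positions))  # True
print(is_disallowed("/products/shoes", both_positions))                           # False
```

The second line of output shows why a single ?ref= rule can miss tracked URLs where another parameter comes first.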

Creating an Optimized Robots.txt File Allows for More Efficient Crawling of Your eCommerce Store

It’s best to create a Robots.txt file as soon as the basic structure and hierarchy of your eCommerce website are developed. Many eCommerce website builders like Shopify and WooCommerce have functionality to automate this process.

Don’t leave it at that though – as your online store grows it’s critical to refer back to the Robots.txt file and update it to reflect how your site and SEO strategies develop.

Article by:

Joshua George

Joshua George is the founder of ClickSlice, an SEO Agency based in London, UK.

He has eight years of experience as an SEO Consultant and was recently hired by the UK government for SEO training. Joshua also owns the best-selling SEO course on Udemy, and has taught SEO to over 100,000 students.

His work has been featured in Forbes, Entrepreneur, AgencyAnalytics, Wix, and many other reputable publications.