Is GoogleBot going crazy?

GoogleBot no longer seems to respect the instructions in the robots.txt file

For a few months now, many PrestaShop users have seen their SEO hit by incomprehensible penalties.

GoogleBot no longer seems to follow the instructions laid down in the robots.txt file.

One of my clients was also affected by GoogleBot's erratic behavior over the last few months.

I'm going to share here the corrective steps we put in place in partnership with the SEO agency La Mandrette.

Identification of the problem

For a few months now, a look at Google Search Console has been revealing tens of thousands of URLs ending in ?q=xxxxxxxxx.

This query string corresponds to the filters generated by PrestaShop's native ps_facetedsearch module, which lets you build what is commonly called faceted navigation. This navigation is very convenient for the customer, since it limits the display to the products matching their selection, such as a color or a size in the case of a clothing shop.

The way the module is built, each filter generates a link to a results page, and since every filter can be combined with the others, you can imagine how many links end up flooding Google Search Console.

What's strange is that the robots.txt file tells crawlers not to visit this type of link, and each of these links even carries a rel="nofollow" attribute.
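
To give an idea of what ends up in Search Console, here is the kind of filter link the faceted navigation produces (the facet names and values are purely illustrative, they depend on your catalog, and example.com stands in for your shop's domain):

<a rel="nofollow" href="https://www.example.com/3-clothes?q=Color-Blue/Size-S">Blue / S</a>

Every additional filter the visitor can combine multiplies the number of such URLs.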

So we have to ask why, overnight, Google started crawling and indexing these pages.

Several hypotheses could explain this. The first is simply that Google, to save itself some crawling effort, decided to reuse the browsing data of all users of its Chrome browser, as it already does for performance metrics such as FCP, CLS and the like. The second hypothesis is that Google, as part of its shift towards artificial intelligence, deployed or transformed its indexing robot to ingest the entire web without worrying about the consequences for websites or for its historical search ranking.

In any case, what should be done to correct or at least reduce the impact of this situation on your online shops?

The simplest and fastest solution

What could be better than relying on the PrestaShop ecosystem, whose members offer alternative, high-performing and functional solutions to quickly fix this kind of annoyance?

There are two modules listed by PrestaShop that offer high-performance faceted navigation, with options to suit every type of shop.

These modules will stop these unwanted pages from being generated, but cleaning up Google Search Console will take time. You can therefore follow the procedure below to clean up the indexing.

Cleaning the Google Search Console

Here's a more or less technical procedure to try to clean these useless URLs out of Google Search Console. I say try, because Google, as this problem shows, does whatever it wants, so it can take several months to see an improvement.

Three techniques to implement at the same time:

  1. Modify robots.txt so that crawlers are no longer forbidden from visiting these pages. That may seem odd, since the crawlers aren't respecting the prohibition anyway, but it simply gives these URLs every chance of being revisited.
  2. Add a Noindex section to the robots.txt file. Google supposedly doesn't take it into account, but again, we'd rather be cautious and put every chance on our side.
  3. And finally, the step that will be the most complex for non-technical users: changing your theme's code so that the robots meta tag is properly set to noindex on these pages. Rest assured, I'll also suggest a simple alternative via a module that is an authority in this field. Thank you, PrestaShop ecosystem.

Modification of the robots.txt file

For those of you who don't know the robots.txt file yet, I invite you to visit this site, which explains everything you need to know about it: robots-txt.com

1- We will lift the crawl ban on the filtered results pages

Open your shop's robots.txt file with your favorite editor and identify the following lines:

Disallow: /*?order=
Disallow: /*?q=
Disallow: /*&order=
Disallow: /*&q=

You need to disable them, either by simply deleting them (keep a copy, since you'll have to put them back once the clean-up is complete) or by commenting them out like this:

# Disallow: /*?order=
# Disallow: /*?q=
# Disallow: /*&order=
# Disallow: /*&q=
2- We will prohibit the indexing of these pages

To do this, we're going to put two techniques in place, one in the robots.txt file and another in the header of your shop.

In the robots.txt file, which should still be open in your editor, you need to add the lines below just above # Allow Directives.

This directive is officially considered inactive, but the feedback we've been able to gather seems to suggest otherwise. When in doubt, it's better to put it in place. There's an explanation of the subject on robots-txt.com.

This will give the result:

# Noindex Directives
Noindex: /*?order=
Noindex: /*?q=
Noindex: /*&order=
Noindex: /*&q=
# Allow Directives
You can now save your modified robots.txt file to your hosting.
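
If you want to make sure the file being served is indeed the updated one, a quick check from a terminal is enough (replace the domain with your own):

curl https://www.example.com/robots.txt

The four Disallow lines should now appear commented out, and the new Noindex block should be visible.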

Now let's move on to modifying your shop's header.

Still using your favorite code editor, open the following file in your theme: /themes/votre_theme/templates/_partials/head.tpl

Around line 39 you should find this:

{if $page.meta.robots !== 'index'}
  <meta name="robots" content="{$page.meta.robots}">
{/if}
We're going to modify this condition to force a noindex on the pages generated by the facet filters.
{if $page.meta.robots !== 'index'}
  <meta name="robots" content="{$page.meta.robots}">
{elseif isset($smarty.get.order) || isset($smarty.get.q) || http_response_code() == '403'}
  <meta name="robots" content="noindex, follow">
{/if}

This code adds a noindex meta tag to pages whose query string contains q= or order=, and to pages returned with a 403 error.
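
To check that the tag is actually being output, you can request one of the offending filtered URLs and look for the robots meta tag in the HTML (an illustrative command; use a real URL from your own shop):

curl -s "https://www.example.com/3-clothes?q=Color-Blue" | grep -i 'name="robots"'

For these pages you should see noindex, follow in the result.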

If you don't feel like a coding professional ready to make these changes, you can take advantage of PrestaShop's large community and simply install and configure the following module: Op'art NoIndex: Boost your SEO, avoid penalties

Once all these corrections are in place, it will take several weeks or months for Google Search Console to be fully cleaned up. It's up to you to monitor it directly.

When the clean-up is complete, put the previously commented-out or deleted rules back in the robots.txt file:

Disallow: /*?order=
Disallow: /*?q=
Disallow: /*&order=
Disallow: /*&q=

At the end of the day

Some will cry scandal, claiming that PrestaShop does nothing to fix this kind of problem, and so on and so forth (you can tell how much that kind of argument irritates me), but here the problem really comes from Google, which seems less and less able to crawl and rank complex sites properly; I won't even mention the AMP pages scam.

Let's look at the positive side: precisely because PrestaShop is open source, you can quickly find a solution proposed by its community for every obstacle you encounter while developing your shop.

Others will still grumble about having to purchase add-ons to replace a native module. Here I'd like to make a couple of points: these add-ons do much more than the native module, for a paltry price compared with the real cost of developing such features, and these modules are an investment that helps you, as a merchant, win even more sales.

So let's not hesitate: quickly install these modules on our PrestaShop stores to finally trigger new sales and work around a Google bug.

Summary of tasks to be performed

  1. Remove crawl blocking in robots.txt
  2. Set the pages to noindex
  3. Wait for Google to visit these pages and de-index them
  4. Restore crawl blocking in robots.txt
