Google Shopping Feeds, Magento & Robots.txt

Ashlee Muscroft

Google Shopping feeds are getting disapproved left, right and centre due to this error: “Product pages cannot be crawled because of robots.txt restriction”.

When I’ve gone to fix this, Google’s recommendation has been to add a snippet to the end of the robots.txt file, the purpose of which is essentially to cancel out every rule above it. Obviously, I’ve amended the robots.txt for a reason, and trying to fix the error with Google’s hack ruins our carefully crafted rules and stops them working.

I spent some time digging into the issue and discovered the cause: the Magento product URLs in our feed contained the session ID, as below:

http://www.mymagento.com/my-product.html?SID=adsfasbdgdgsvsg1ydf338asdfasdf

While the robots.txt file contained:

Disallow: /*?SID=
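
That directive only takes effect inside a User-agent group; a minimal robots.txt using it, assuming the rule is meant to apply to all crawlers, would look something like this (the full file will of course vary per store):

User-agent: *
Disallow: /*?SID=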

This meant that when I was submitting to Google, the product URLs contained the very parameter we were telling Google not to crawl. Silly.

I didn’t want to fiddle with the robots.txt file, as I didn’t want search engines crawling URLs with session parameters in them, so instead I changed the custom feed implementation so that the SID string was removed.

I took:

$product->getProductUrl();

and replaced it with:

$product->getProductUrl(false);

This returns the URL sans the session ID and allows the feed to be accepted by Google once more.
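
For context, here is a sketch of how the fix sits inside a custom feed generator. The collection setup and output handling below are illustrative assumptions, not our exact implementation; only the getProductUrl(false) call is the actual fix:

// Illustrative Magento 1 feed loop. The collection setup and output
// handling are assumptions for this sketch; only getProductUrl(false)
// is the fix described above.
$collection = Mage::getModel('catalog/product')->getCollection()
    ->addAttributeToSelect(array('name', 'price'))
    ->addAttributeToFilter('status', Mage_Catalog_Model_Product_Status::STATUS_ENABLED);

foreach ($collection as $product) {
    // Passing false stops Magento appending the session ID (SID)
    // to the generated URL, keeping the feed URLs crawlable.
    $url = $product->getProductUrl(false);
    // ... write $url and the other product attributes to the feed ...
}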

Most occurrences of this error come down to this SID clash. The Disallow rule exists to stop search engines indexing duplicate, session-specific copies of product pages, but it can trigger the issue above. Still, a quick check of the feed URLs and a quick fix make it easy enough to locate and resolve if this is happening to you.
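
If you want a quick way to run that check on an existing feed, a throwaway script along these lines will flag any URLs that still carry a session ID (the feed path and one-URL-per-line format here are assumptions for illustration):

// Throwaway check: scan a generated feed file for session IDs.
// The file path and line format are assumed for illustration.
foreach (file('/var/export/google_feed.txt') as $line) {
    if (strpos($line, 'SID=') !== false) {
        echo 'SID found: ' . trim($line) . PHP_EOL;
    }
}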

If you are having issues with your Magento store, we’re happy to help here at Elementary Digital.
