eCommerce Marketing Growth Hacks 

2X eCommerce Podcast

Kunle interviews founders of fast-growing 7-8 figure online retail businesses & e-commerce marketing experts


Magento Robots.txt SEO-Friendly Setup Guide for CE and Enterprise Edition

Posted on 29th October 2014 , by Kunle Campbell in Technical SEO

If you run or manage a Magento store, you need to be aware of Google’s latest advice on optimal indexation and its launch of rendering-based indexing, which enables Google to crawl and index pages like a typical modern browser, with CSS and JavaScript turned on. Directly quoting Google engineer Pierre Far:

Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings.

My previous advice to Magento clients for the setup of their robots.txt file was as follows:

# 2x Media Robots.txt MMMM YYYY
#
# robots.txt
#
# This file helps guide robots on how to crawl and index certain parts
# of this website
# It will also save server bandwidth and resources.
#
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html
# Website Sitemap
Sitemap: http://www.magentoshop.com/sitemap.xml
# Crawlers Setup
# Directories
User-agent: *
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
User-agent: *
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
# Files
User-agent: *
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
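
To see concretely why this old setup has become a problem, you can test it with Python’s standard-library robots.txt parser. This is an illustrative sketch: the domain and file paths are placeholders, and only the two directives at issue are reproduced.

```python
from urllib.robotparser import RobotFileParser

# The two directives from the old robots.txt that now cause trouble
old_rules = [
    "User-agent: *",
    "Disallow: /js/",
    "Disallow: /skin/",
]

parser = RobotFileParser()
parser.parse(old_rules)

base = "http://www.magentoshop.com"

# Googlebot is refused the very files it needs to render your pages
print(parser.can_fetch("Googlebot", base + "/js/prototype/prototype.js"))   # False
print(parser.can_fetch("Googlebot", base + "/skin/frontend/default/default/css/styles.css"))  # False

# ...while ordinary pages remain crawlable
print(parser.can_fetch("Googlebot", base + "/some-product.html"))  # True
```

Because the `User-agent: *` group applies to every crawler, Googlebot is denied the theme’s CSS and JavaScript and can only see your pages unstyled.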

Now that Google crawls and renders JavaScript and CSS, blocking your /js/ and /skin/ folders is detrimental to the optimal indexation of your store.

So you would need to remove these lines from your robots.txt:

Disallow: /js/
Disallow: /skin/

I have actually seen these robots.txt entries on some Magento stores:

Disallow: /*.js$
Disallow: /*.css$

Please remove them or else your Fetch and Render in Google Webmaster Tools will look something like this:

[Screenshot: Webmaster Tools – Fetch as Google render of 2xmedia.co]

Drop in Traffic as a Result of Blocking CSS and JavaScript in Robots.txt

I came across a post on Yoast.com about iPhoned.nl, a site that saw a significant drop in traffic as a result of blocking CSS and JS. The Yoast post shows both the scale of the traffic drop and how Google fetched and rendered the site in Google Webmaster Tools. After the CSS and JavaScript were unblocked in their robots.txt, their traffic recovered.

Even Google’s John Mueller advises that allowing crawling of JavaScript and CSS makes it a lot easier for Google to recognise your site’s content and give your site the credit it deserves for that content:

For example, if you’re pulling in content via AJAX/JSON feeds, that would be invisible to us if you disallowed crawling of your JavaScript. Similarly, if you’re using CSS to handle a responsive design that works fantastically on smartphones, we wouldn’t be able to recognize that if the CSS were disallowed from crawling. This is why we make the recommendation to allow crawling of URLs that significantly affect the layout or content of a page. If your JavaScript or CSS files significantly affect the content or layout of the page, we recommend allowing us to crawl them, so that we can use that additional information to show your site for queries that match content which isn’t directly in your HTML responses. While unrobotting that content would make things easier for our algorithms to pick up, it would be incorrect to say that not allowing crawling would automatically trigger our quality algorithms to view your site negatively.

[Screenshot: Google Product Forums thread – “Google is Blocking their js in Robots.txt But Tell Us Not To”]

My Recommended Robots.txt Setup for Magento

I recommend that your robots.txt look like this:

# 2x Media Robots.txt MMMM YYYY
#
# robots.txt
#
# This file helps guide robots on how to crawl and index certain parts
# of this website
# It will also save server bandwidth and resources.
#
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html
# Website Sitemap
Sitemap: http://www.magentoshop.com/sitemap.xml
# Crawlers Setup
# Directories
User-agent: *
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
User-agent: *
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
# Files
User-agent: *
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
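
As a quick sanity check of the recommended setup, the same standard-library parser confirms that assets are now crawlable while checkout and internal search stay out of the index. Again, the domain is a placeholder and only a representative subset of the rules is reproduced.

```python
from urllib.robotparser import RobotFileParser

# A minimal subset of the recommended rules above: /js/ and /skin/
# are deliberately absent, so assets are crawlable by default
rules = [
    "User-agent: *",
    "Disallow: /checkout/",
    "Disallow: /catalogsearch/",
]

parser = RobotFileParser()
parser.parse(rules)

base = "http://www.magentoshop.com"

# Theme assets and catalog pages are crawlable...
print(parser.can_fetch("Googlebot", base + "/js/prototype/prototype.js"))   # True
print(parser.can_fetch("Googlebot", base + "/skin/frontend/default/default/css/styles.css"))  # True

# ...while checkout and internal search remain blocked
print(parser.can_fetch("Googlebot", base + "/checkout/cart/"))              # False
print(parser.can_fetch("Googlebot", base + "/catalogsearch/result/?q=shoes"))  # False
```

One caveat: `urllib.robotparser` implements the original robots.txt rules and does not understand wildcard patterns such as /*.js$, so check those with Google’s own robots.txt tester in Webmaster Tools instead.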

My final tip is to optimize the serving of your JavaScript and CSS files: minify and merge them, and configure your web server to serve them gzip-compressed. Also make sure you have sufficient bandwidth capacity to serve JavaScript and CSS files to Googlebot. Please don’t block JavaScript and CSS.
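
As an illustrative sketch of that gzip tip, this is roughly what compression of CSS and JavaScript looks like in an Apache configuration, assuming your host runs Apache with mod_deflate available (Magento’s stock .htaccess ships with a similar, commented-out section; Nginx users would use its gzip directives instead):

```apache
# Enable on-the-fly gzip compression for text assets (requires mod_deflate)
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html
    AddOutputFilterByType DEFLATE text/css
    AddOutputFilterByType DEFLATE text/javascript
    AddOutputFilterByType DEFLATE application/javascript
</IfModule>
```

Check your own server’s configuration before copying this; the exact MIME types your server reports for JS files can vary.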

About the author:

Kunle Campbell

An ecommerce advisor to ambitious, agile online retailers and funded ecommerce startups seeking exponential sales growth through scalable customer acquisition, retention, conversion optimisation, product/market-fit optimisation and customer referrals.
