If you run or manage a Magento store, then you need to be aware of Google’s latest advice on optimal indexation and its launch of rendering-based indexing, which enables Google to crawl and index pages like a typical modern browser, with CSS and JavaScript turned on. Directly quoting Google’s engineer, Pierre Far:
Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings.
My previous advice to Magento clients for setting up their robots.txt file was as follows:
# 2x Media Robots.txt MMMM YYYY
#
# robots.txt
#
# This file helps guide robots on how to crawl and index certain parts
# of this website
# It will also save server bandwidth and resources.
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html
# Website Sitemap
Sitemap: http://www.magentoshop.com/sitemap.xml
# Crawlers Setup
# Directories
User-agent: *
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
User-agent: *
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
# Files
User-agent: *
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Now that Google crawls and renders JavaScript and CSS, blocking your /js/ and /skin/ folders will be detrimental to the optimal indexation of your store.
So you would need to remove these lines from your robots.txt:
Disallow: /js/
Disallow: /skin/
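If you want to sanity-check what those directives actually block, you can feed them to Python's built-in robots.txt parser. This is a minimal sketch; the asset paths and domain are illustrative Magento 1.x examples, not anything specific to your store:

```python
from urllib.robotparser import RobotFileParser

# Feed the old Magento rules to the parser and check which asset
# URLs Googlebot would be allowed to fetch under them.
rules = [
    "User-agent: *",
    "Disallow: /js/",
    "Disallow: /skin/",
]

parser = RobotFileParser()
parser.parse(rules)

for url in (
    "http://www.magentoshop.com/js/prototype/prototype.js",
    "http://www.magentoshop.com/skin/frontend/default/styles.css",
    "http://www.magentoshop.com/some-product.html",
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "BLOCKED"
    print(url, "->", verdict)
```

With the old rules in place, both asset URLs come back as blocked while the product page is allowed, which is exactly the situation the Fetch and Render screenshots below illustrate.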
I have actually seen these robots.txt entries on some Magento stores:
Disallow: /*.js$
Disallow: /*.css$
Please remove them or else your Fetch and Render in Google Webmaster Tools will look something like this:
I came across a post on Yoast.com about iPhoned.nl, a site that suffered a significant drop in traffic as a result of blocking CSS and JS. This is how bad the drop was:
And this is how Google fetched and rendered their site on Google Webmaster Tools:
After the unblocking of the CSS and JavaScript in their robots.txt, their traffic recovered:
Even Google’s John Mueller advises that allowing crawling of JavaScript and CSS makes it a lot easier for Google to recognize your site’s content and to give your site the credit that it deserves for that content.
For example, if you’re pulling in content via AJAX/JSON feeds, that would be invisible to us if you disallowed crawling of your JavaScript. Similarly, if you’re using CSS to handle a responsive design that works fantastically on smartphones, we wouldn’t be able to recognize that if the CSS were disallowed from crawling. This is why we make the recommendation to allow crawling of URLs that significantly affect the layout or content of a page. If your JavaScript or CSS files significantly affect the content or layout of the page, we recommend allowing us to crawl them, so that we can use that additional information to show your site for queries that match content which isn’t directly in your HTML responses. While unrobotting that content would make things easier for our algorithms to pick up, it would be incorrect to say that not allowing crawling would automatically trigger our quality algorithms to view your site negatively.
I recommend that your robots.txt look like this:
# 2x Media Robots.txt MMMM YYYY
#
# robots.txt
#
# This file helps guide robots on how to crawl and index certain parts
# of this website
# It will also save server bandwidth and resources.
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html
# Website Sitemap
Sitemap: http://www.magentoshop.com/sitemap.xml
# Crawlers Setup
# Directories
User-agent: *
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
User-agent: *
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
# Files
User-agent: *
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
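The same parser can confirm that the recommended file gets the balance right: assets become crawlable while private areas stay blocked. Another minimal sketch, with the directives abridged from the file above and illustrative asset paths:

```python
from urllib.robotparser import RobotFileParser

# A subset of the recommended rules: no more /js/ or /skin/ blocks,
# but private areas such as /checkout/ are still disallowed.
rules = [
    "User-agent: *",
    "Disallow: /app/",
    "Disallow: /checkout/",
    "Disallow: /customer/",
]

p = RobotFileParser()
p.parse(rules)

checks = {
    "http://www.magentoshop.com/js/mage/translate.js": True,          # assets crawlable
    "http://www.magentoshop.com/skin/frontend/base/styles.css": True, # assets crawlable
    "http://www.magentoshop.com/checkout/cart/": False,               # private area blocked
}
for url, expected in checks.items():
    assert p.can_fetch("Googlebot", url) == expected
print("recommended rules behave as intended")
```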
One final tip: optimize the serving of your JavaScript and CSS by minifying and merging the files, and by configuring your web server to serve them gzip-compressed. Also make sure you have sufficient bandwidth capacity to serve JavaScript and CSS files to Googlebot. Please don’t block JavaScript and CSS.
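To get a feel for why gzip is worth configuring, here is a rough, self-contained illustration using Python's standard library: it compresses a repetitive stylesheet-like string and compares byte counts. The sample CSS is made up; real savings depend on your actual files and server configuration.

```python
import gzip

# CSS and JS are highly repetitive text, so they compress very well.
sample_css = ".product-item { margin: 0; padding: 0; color: #333; }\n" * 200
raw = sample_css.encode("utf-8")
compressed = gzip.compress(raw)

print(len(raw), "bytes raw ->", len(compressed), "bytes gzipped")
```

On repetitive text like this, the gzipped payload is a small fraction of the raw size, which is bandwidth saved on every asset request, including Googlebot's.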