Today I’m going to talk about robots.txt files and their impact on a Magento website. Thanks to a current customer of ours, who stumbled across a robots.txt thread in the Magento forums, we have the basis for a comprehensive robots.txt file.
First, a little history.
Imagine your website as a stately manor, a bit like the one in “Downton Abbey”, the series that was on TV recently: a big, expansive and exquisitely designed piece of magnificence. You are lord and lady of this manor; you know where things are and where they need to be. However, you are nervous. You pace the corridors with anxiety, checking the positions of chairs, sofas and candelabras. Anything out of place you put back; anything not put back that looks out of place you put away. You slide your finger along the surfaces checking for dust, and you get on your hands and knees to inspect the carpet.
You know you should leave these jobs to your servants, but can you really trust them to do it right? Do they have as much at stake in this as you do? Of course not. They do not understand the significance of this visit. It all comes down to this: tonight will, perhaps, just perhaps, hold the key to your entire future. The money, the fame, the glory! It all rests on this.
For tonight you dine with Mr G.
You feel a little more confident in your presentation: the hairs are plucked and the spots are squeezed. However, you know that Mr G is meticulous and misses nothing. You swallow hard. A bellboy runs forward and informs you that a carriage is approaching, and you feel the first tendrils of real panic setting in. Your mind races.
What have I forgotten? What have I missed? Nothing. The path is clear: he will enter here, he will walk there, he will dine there and he will leave the same way. Oh God, but what if he doesn’t? What if he wants to have a look around?
You can’t hold him by the hand; you are a lord, and you need to be presented at the dinner table, not ushering Mr G down halls and stairs. Your eyes scan the entry room. The servants are lining up as they have been told; everyone knows what they are doing and you are confident they will do it well. Who can you spare? Who can act as guide and guardian of Mr G? Your gaze settles on Master R Botts. He is a plain young man and there’s nothing remarkable about him. However, you know that whatever you ask of him he will do, and he will do it well.
“Master Botts!” you call. “Mr G will be here presently. When he arrives, you will be his guide. There are places in the house that have not had the greatest of care taken over them, and I would prefer it if he were not to witness such things. You must keep him away from the following:” You proceed to rattle off a list of rooms and hallways that are off-limits to Mr G. “That’s about it, Master Botts. This is a splendid house and I know he will appreciate it. Just keep him away from those areas.”
Master Botts simply nods his head. It will be done.
With a weight much like that of your solid oak doors suddenly taken from your shoulders, you hurry to the dining hall, where you will await Mr G and his all-important verdict…
Back to the Present
I felt that it was high time a new description of “what a robots.txt file is for” was produced. Sorry for the lengthy passage; I simply got carried away.
Anyway, a robots.txt file is very important, and the moral of the above story is this (or thereabouts): “When Mr G (Google) appears, he will want to see everything about your site. In almost all cases it is not wise to let him, so you should always have somebody at hand (and it cannot be yourself) to steer Mr G away from trouble. Robots.txt files (Master Botts) are the perfect weapon for this situation. Master Botts will be the first to greet Mr G, and will not leave his side until Mr G leaves trouble-free.”
How can we apply this to Magento? Well, what we need to do is make sure that all areas of a Magento website that are not meant for the public are disallowed in your robots.txt file.
A good example of this can be seen in the following file. You will notice that it covers a great many areas, and also that it deliberately leaves out the path to the admin login (listing it would be a sure-fire giveaway to hackers and crackers alike).
# $Id: robots.txt,v magento-specific 2010/28/01 18:24:19 goba Exp $
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

# Website Sitemap
Sitemap: http://www.mydomain.com/sitemap.xml

# Crawlers Setup
User-agent: *
Crawl-delay: 10
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
# Disallow: /media/ // I would personally allow this folder for google product caching
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
# Paths (no clean URLs)
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=
All credit for the above lies with the Magento Forum at this thread.
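Rules such as "Allow: /*?p=" and "Disallow: /*?p=*&" overlap, so it helps to know how a crawler settles them. Google's documentation and RFC 9309 describe it roughly like this: * matches any run of characters, a trailing $ anchors the end of the URL, the longest matching rule wins, and Allow wins a tie. The Python below is a minimal, simplified sketch of that behaviour, not Google's or Magento's code. parse_rules, rule_to_regex, is_allowed and audit are hypothetical helper names, only exact User-agent matches are handled, and both the trimmed excerpt (which deliberately leaves out /wishlist/) and the admin path /secret-admin/ are made up for illustration. The audit helper applies the two tests described earlier: every non-public path should be blocked, and the admin path should not appear in the file at all.

```python
import re


def parse_rules(robots_txt, agent="*"):
    """Collect (directive, path) pairs from groups whose User-agent equals agent.

    Simplified sketch: exact, case-insensitive agent match only; '#' comments
    are stripped; empty Disallow values ("allow everything") are skipped.
    """
    rules, in_group, seen_rule = [], False, False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                in_group, seen_rule = False, False
            in_group = in_group or value.lower() == agent.lower()
        elif field in ("allow", "disallow"):
            seen_rule = True
            if in_group and value:
                rules.append((field, value))
    return rules


def rule_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the URL end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules, url_path):
    """Longest matching rule wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if rule_to_regex(pattern).match(url_path):  # match() anchors at the start
            longer = len(pattern) > best_len
            tie_allow = len(pattern) == best_len and directive == "allow"
            if longer or tie_allow:
                best_len, allowed = len(pattern), directive == "allow"
    return allowed


def audit(robots_txt, private_paths, admin_path):
    """Return (private paths still crawlable, whether admin_path appears in the file)."""
    rules = parse_rules(robots_txt)
    still_crawlable = [p for p in private_paths if is_allowed(rules, p)]
    return still_crawlable, admin_path.lower() in robots_txt.lower()


# Hypothetical trimmed excerpt of the file above (note: no /wishlist/ rule).
EXCERPT = """\
User-agent: *
Crawl-delay: 10
Allow: /*?p=
Allow: /index.php/blog/
Disallow: /index.php/
Disallow: /checkout/
Disallow: /customer/
Disallow: /*.php$
Disallow: /*?p=*&
"""

rules = parse_rules(EXCERPT)
for path in ["/index.php/blog/my-post", "/index.php/customer/account",
             "/shoes.html?p=2", "/shoes.html?p=2&dir=asc",
             "/cron.php", "/cron.php?x=1"]:
    print(path, "->", "allowed" if is_allowed(rules, path) else "blocked")

# "/secret-admin/" stands in for your real (hypothetical) admin path.
print(audit(EXCERPT, ["/checkout/cart/", "/customer/account/", "/wishlist/"], "/secret-admin/"))
```

Running it shows /index.php/blog/my-post and /shoes.html?p=2 as allowed, the ?p=...& variant as blocked, and that Disallow: /*.php$ blocks /cron.php but not /cron.php?x=1 (the $ only matches at the very end of the URL). The audit returns (['/wishlist/'], False): /wishlist/ is still crawlable in the trimmed excerpt, and the admin path is not given away.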
My role in this was to perhaps remind you of the importance of a robots.txt file and what it can actually do. Oh, and one more thing: make sure you reference your sitemap.xml file in your robots.txt with a Sitemap: line, as the example above does. I should have worked something into the story about this, such as “showing Mr G your fabulous family portrait wall”, but hey-ho, there you go.
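To check the sitemap point, here is a small Python sketch (sitemap_urls is a hypothetical helper name, not part of Magento) that pulls every Sitemap: line out of a robots.txt. Sitemap lines are not tied to any User-agent group, so they may appear anywhere in the file, and the sitemaps.org protocol expects the full URL of the sitemap rather than a relative path, which the example also checks.

```python
def sitemap_urls(robots_txt):
    """Return the URL of every Sitemap: line (field name is case-insensitive)."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt fragment using the placeholder domain from the file above.
ROBOTS = """\
# Website Sitemap
Sitemap: http://www.mydomain.com/sitemap.xml
User-agent: *
Disallow: /checkout/
"""

found = sitemap_urls(ROBOTS)
if not found:
    print("No Sitemap: line found")
for url in found:
    # The sitemaps.org protocol expects a full URL, not a relative path.
    full = url.startswith(("http://", "https://"))
    print(url, "OK" if full else "-> should be a full URL")
```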