How to fix the Drupal robots.txt file


This article was sent to us by: David S. at 01172010


Fixing the Drupal robots.txt file

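The default Drupal rules end in a trailing slash, so a rule such as Disallow: /admin/ matches /admin/settings but not /admin itself. Carry out the following steps to add slash-free copies of each rule and fix the Drupal robots.txt file: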

1. Make a backup of the robots.txt file.

2. Open the robots.txt file for editing. If necessary, download the file and open it in a local text editor.

3. Find the Paths (clean URLs) section and the Paths (no clean URLs) section. Both sections appear whether or not you've turned on clean URLs, so Drupal covers you either way. The two sections look like this:

# Paths (clean URLs)
 Disallow: /admin/
 Disallow: /comment/reply/
 Disallow: /contact/
 Disallow: /logout/
 Disallow: /node/add/
 Disallow: /search/
 Disallow: /user/register/
 Disallow: /user/password/
 Disallow: /user/login/
 # Paths (no clean URLs)
 Disallow: /?q=admin/
 Disallow: /?q=comment/reply/
 Disallow: /?q=contact/
 Disallow: /?q=logout/
 Disallow: /?q=node/add/
 Disallow: /?q=search/
 Disallow: /?q=user/password/
 Disallow: /?q=user/register/
 Disallow: /?q=user/login/

4. Duplicate the two sections (simply copy and paste them) so that you have four sections: two # Paths (clean URLs) sections and two # Paths (no clean URLs) sections.

5. Add 'fixed!' to the comment of the new sections so that you can tell them apart.

6. Delete the trailing / at the end of each Disallow line in the fixed! sections. You should end up with four sections that look like this:

# Paths (clean URLs)
 Disallow: /admin/
 Disallow: /comment/reply/
 Disallow: /contact/
 Disallow: /logout/
 Disallow: /node/add/
 Disallow: /search/
 Disallow: /user/register/
 Disallow: /user/password/
 Disallow: /user/login/
 # Paths (no clean URLs)
 Disallow: /?q=admin/
 Disallow: /?q=comment/reply/
 Disallow: /?q=contact/
 Disallow: /?q=logout/
 Disallow: /?q=node/add/
 Disallow: /?q=search/
 Disallow: /?q=user/password/
 Disallow: /?q=user/register/
 Disallow: /?q=user/login/
 # Paths (clean URLs) – fixed!
 Disallow: /admin
 Disallow: /comment/reply
 Disallow: /contact
 Disallow: /logout
 Disallow: /node/add
 Disallow: /search
 Disallow: /user/register
 Disallow: /user/password
 Disallow: /user/login
 # Paths (no clean URLs) – fixed!
 Disallow: /?q=admin
 Disallow: /?q=comment/reply
 Disallow: /?q=contact
 Disallow: /?q=logout
 Disallow: /?q=node/add
 Disallow: /?q=search
 Disallow: /?q=user/password
 Disallow: /?q=user/register
 Disallow: /?q=user/login

7. Save your robots.txt file, uploading it if necessary, replacing the existing file (you backed it up, didn't you?).

8. Go to http://www.yourDrupalsite.com/robots.txt and double-check that your changes are in effect. You may need to refresh your browser to see the changes.

Now your robots.txt file is working as you would expect it to.
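
Steps 4 to 6 and the reason behind them can be sketched in a few lines of Python. This is only an illustration: the helper names add_slashless_rules and is_blocked are made up for this example, and Python's standard urllib.robotparser stands in for a real crawler, which may interpret rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser


def add_slashless_rules(lines):
    """Return the rules plus a copy of each Disallow rule without its trailing slash."""
    cleaned = [line.strip() for line in lines]
    fixed = []
    for rule in cleaned:
        # Leave "Disallow: /" alone: stripping its slash would allow everything.
        if rule.startswith("Disallow:") and rule.endswith("/") and rule != "Disallow: /":
            fixed.append(rule[:-1])
    return cleaned + fixed


def is_blocked(rules, path):
    """Ask the parser whether a generic crawler is barred from fetching path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *"] + rules)
    return not parser.can_fetch("*", "http://www.yourDrupalsite.com" + path)


original = ["Disallow: /admin/", "Disallow: /user/login/"]
patched = add_slashless_rules(original)

print(is_blocked(original, "/admin"))           # False: the slash-only rule misses /admin
print(is_blocked(original, "/admin/settings"))  # True
print(is_blocked(patched, "/admin"))            # True: the slash-free copy catches it
```

Because rules are matched as prefixes, a slash-free rule such as Disallow: /admin also covers any other path that begins with /admin, so check that none of your public pages start with one of the blocked prefixes.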

Additional changes to the robots.txt file

Using directives and pattern-matching commands, the robots.txt file can exclude entire sections of the site (such as the admin pages), certain individual files (such as cron.php), and some directories (such as /scripts and /modules) from the crawlers. In many cases, though, you should tweak your robots.txt file for optimal SEO results. Here are several changes you can make to the file to meet your needs in certain situations:

You are developing a new site and you don't want it to show up in any search engine until you're ready to launch it. Add Disallow: / on the line just after User-agent: *
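
As a rough check, the standard way to block an entire site is a single Disallow: / rule under User-agent: *. The sketch below feeds that two-line file to Python's urllib.robotparser, used here only as a stand-in for a well-behaved crawler; the staging_rules name and the example URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt for a site that is not ready to be indexed yet.
staging_rules = [
    "User-agent: *",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(staging_rules)

# Every path starts with "/", so every URL is off limits.
home_allowed = parser.can_fetch("Googlebot", "http://www.yourDrupalsite.com/")
node_allowed = parser.can_fetch("Googlebot", "http://www.yourDrupalsite.com/node/1")
print(home_allowed, node_allowed)  # False False
```

Remember to remove the rule when you launch, or the site will stay out of the search engines.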

Say you're running a very slow server and you don't want the crawlers to slow your site down for other users. Adjust the Crawl-delay by changing it from 10 to 20.

If you're on a super-fast server (and you should be, right?) you can tell the bots to bring it on! Change the Crawl-delay to 5 or even 1 second. Monitor your server closely for a few days to make sure it can handle the extra load.
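
To see how a crawler might read the setting, here is a small sketch that parses a Crawl-delay line with Python's urllib.robotparser (the crawl_delay method needs Python 3.6 or later). The three-line file is a shortened stand-in for the real Drupal robots.txt.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Crawl-delay: 20",   # seconds a polite crawler should wait between requests
    "Disallow: /admin/",
])

delay = parser.crawl_delay("*")
print(delay)  # 20
```

Not every crawler honors Crawl-delay (Googlebot, for one, ignores it), so treat the value as a hint and keep watching your server logs.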

Say you're running a site which allows people to upload their own images but you don't necessarily want those images to show up in Google. Add these lines at the bottom of your robots.txt file:

User-agent: Googlebot-Image
Disallow: /*.jpg$
Disallow: /*.gif$
Disallow: /*.png$

If all of the files were in the /files/users/images/ directory, you could do this:

User-agent: Googlebot-Image
Disallow: /files/users/images/
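
The * and $ characters in the first example are pattern-matching extensions supported by the major search engines: * matches any run of characters and a trailing $ anchors the end of the URL. The sketch below shows one way to emulate that matching; rule_matches is a hypothetical helper written for this illustration, not part of any crawler.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the given URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then let every '*' match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the usual prefix behavior.
    return re.match(regex, path) is not None


print(rule_matches("/*.jpg$", "/files/photo.jpg"))             # True
print(rule_matches("/*.jpg$", "/files/photo.jpg?size=large"))  # False: does not end in .jpg
print(rule_matches("/files/users/images/", "/files/users/images/cat.png"))  # True
```

The second result shows why the directory rule is the safer choice when all the images live in one place: the $ patterns only catch URLs that end exactly in the file extension.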

Say you noticed in your server logs that there was a bad robot out there that was scraping all your content. You can try to prevent this by adding this to the bottom of your robots.txt file:

User-agent: Bad-Robot
Disallow: /

Bad robots, renegade spiders, killer crawlers: it sounds like the plot of a 1950s sci-fi movie, but they're real and they can hurt your site. Mostly, they just pull server resources and bandwidth away from your real visitors. However, they could be doing other things, like stealing your content or even spamming your users. The robots.txt file is your way of saying, 'No, robot! That's a bad robot! No scraps for you!'. It may help, but you may need to get serious and have your server administrator deny service to the bots based on their identifying user-agent string or IP address. Just be careful not to block all of the bots, or your site will stop showing up in Google.
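
A per-robot block only affects the named robot, which the following sketch illustrates. It assumes a crawler that actually obeys robots.txt (modeled here by Python's urllib.robotparser) and uses Bad-Robot as the placeholder name from the example above.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "",
    "User-agent: Bad-Robot",
    "Disallow: /",
])

page = "http://www.yourDrupalsite.com/node/1"
bad_robot_allowed = parser.can_fetch("Bad-Robot", page)
googlebot_allowed = parser.can_fetch("Googlebot", page)
print(bad_robot_allowed)   # False: the named robot is shut out of the whole site
print(googlebot_allowed)   # True: everyone else still follows the general rules
```

A scraper that ignores robot rules will not be stopped by this, which is why the server-level block described above is the real safeguard.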

If you have installed the XML Sitemap module, then you've got a great tool that you should make known to all of the search engines. However, it's tedious to go to each engine's site and submit your sitemap URL. Instead, you can add a couple of simple lines to the robots.txt file.
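
The usual way to do this is a Sitemap: line followed by the full URL of the sitemap. The sketch below assumes the sitemap lives at /sitemap.xml on the placeholder domain used earlier (check the path your own installation uses) and reads the line back with urllib.robotparser; its site_maps method needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /admin/",
    # Assumed sitemap location; replace with your site's real sitemap URL.
    "Sitemap: http://www.yourDrupalsite.com/sitemap.xml",
])

sitemaps = parser.site_maps()
print(sitemaps)  # ['http://www.yourDrupalsite.com/sitemap.xml']
```

The Sitemap: line is independent of the User-agent sections, so it can go anywhere in the file; the bottom is a common choice.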

robots.txt is a request, not a command

Do not expect a rule to be strictly obeyed just because you put it in the robots.txt file. Rogue spiders and bots often ignore your requests. The major search engines are highly unlikely to do so, but it can, and does, happen. With this in mind, if you really want to hide sensitive documents from the rest of the world, put them behind a password-protected section of your site.

Related article: How to include XML sitemap in your robots.txt file

