Fixing the Drupal robots.txt file

Carry out the following steps in order to fix the Drupal robots.txt file:

1. Make a backup of the robots.txt file.

2. Open the robots.txt file for editing. If necessary, download the file and open it in a local text editor.

3. Find the Paths (clean URLs) section and the Paths (no clean URLs) section. Both sections appear whether or not you've turned on clean URLs, so Drupal covers you either way. They look like this:

    # Paths (clean URLs)
    Disallow: /admin/
    Disallow: /comment/reply/
    Disallow: /contact/
    Disallow: /logout/
    Disallow: /node/add/
    Disallow: /search/
    Disallow: /user/register/
    Disallow: /user/password/
    Disallow: /user/login/
    # Paths (no clean URLs)
    Disallow: /?q=admin/
    Disallow: /?q=comment/reply/
    Disallow: /?q=contact/
    Disallow: /?q=logout/
    Disallow: /?q=node/add/
    Disallow: /?q=search/
    Disallow: /?q=user/password/
    Disallow: /?q=user/register/
    Disallow: /?q=user/login/

4. Duplicate the two sections (simply copy and paste them) so that you have four sections: two # Paths (clean URLs) sections and two # Paths (no clean URLs) sections.

5. Add 'fixed!' to the comment of the new sections so that you can tell them apart.

6. Delete the trailing / from each Disallow line in the fixed! sections. Robots.txt rules match URL prefixes, so Disallow: /admin/ does not cover /admin itself, whereas Disallow: /admin covers both. You should end up with four sections that look like this:

    # Paths (clean URLs)
    Disallow: /admin/
    Disallow: /comment/reply/
    Disallow: /contact/
    Disallow: /logout/
    Disallow: /node/add/
    Disallow: /search/
    Disallow: /user/register/
    Disallow: /user/password/
    Disallow: /user/login/
    # Paths (no clean URLs)
    Disallow: /?q=admin/
    Disallow: /?q=comment/reply/
    Disallow: /?q=contact/
    Disallow: /?q=logout/
    Disallow: /?q=node/add/
    Disallow: /?q=search/
    Disallow: /?q=user/password/
    Disallow: /?q=user/register/
    Disallow: /?q=user/login/
    # Paths (clean URLs) – fixed!
    Disallow: /admin
    Disallow: /comment/reply
    Disallow: /contact
    Disallow: /logout
    Disallow: /node/add
    Disallow: /search
    Disallow: /user/register
    Disallow: /user/password
    Disallow: /user/login
    # Paths (no clean URLs) – fixed!
    Disallow: /?q=admin
    Disallow: /?q=comment/reply
    Disallow: /?q=contact
    Disallow: /?q=logout
    Disallow: /?q=node/add
    Disallow: /?q=search
    Disallow: /?q=user/password
    Disallow: /?q=user/register
    Disallow: /?q=user/login

7. Save your robots.txt file, uploading it if necessary, replacing the existing file (you backed it up, didn't you?).

8. Go to http://www.yourDrupalsite.com/robots.txt and double-check that your changes are in effect. You may need to refresh your browser to see the changes.

Now your robots.txt file is working as you would expect it to.

Additional changes to the robots.txt file

Using directives and pattern-matching commands, the robots.txt file can exclude entire sections of the site from the crawlers, such as the admin pages, certain individual files like cron.php, and some directories like /scripts and /modules. In many cases, though, you should tweak your robots.txt file for optimal SEO results. Here are several changes you can make to the file to meet your needs in certain situations:

• You are developing a new site and you don't want it to show up in any search engine until you're ready to launch it. Add Disallow: / (a single slash, which matches every URL on the site) just after the User-agent: line.

• Say you're running a very slow server and you don't want the crawlers to slow your site down for other users. Adjust the Crawl-delay by changing it from 10 to 20.

• If you're on a super-fast server (and you should be, right?) you can tell the bots to bring it on! Change the Crawl-delay to 5 or even 1 second. Monitor your server closely for a few days to make sure it can handle the extra load.

• Say you're running a site which allows people to upload their own images but you don't necessarily want those images to show up in Google. Add these lines at the bottom of your robots.txt file:

    User-agent: Googlebot-Image
    Disallow: /*.jpg$
    Disallow: /*.gif$
    Disallow: /*.png$

  If all of the files were in the /files/users/images/ directory, you could do this:

    User-agent: Googlebot-Image
    Disallow: /files/users/images/

• Say you noticed in your server logs that there was a bad robot out there that was scraping all your content. You can try to prevent this by adding this to the bottom of your robots.txt file:

    User-agent: Bad-Robot
    Disallow: /

  Bad robots, renegade spiders, killer crawlers: it sounds like the plot of a 1950s sci-fi movie, but they're real and they can hurt your site. Mostly, they just pull server resources and bandwidth away from your server. However, they could be doing other things like stealing your content or even spamming your users. The robots.txt file is your way of saying, 'No, robot! That's a bad robot! No scraps for you!'. It may help, but you may need to get serious and have your server administrator deny service to the bots based on their identifying (user-agent) string or IP address. Just be careful not to block all of the bots, as your site will stop showing up in Google.

• If you have installed the XML Sitemap module, then you've got a great tool that you should send out to all of the search engines. However, it's tedious to go to each engine's site and submit your sitemap URL. Instead, you can add a couple of simple lines to the robots.txt file.

robots.txt is a request, not a command

Do not expect a rule to be strictly obeyed just because you put it in the robots.txt file. Rogue spiders and bots often ignore your requests. This is highly unlikely from the major search engines, but it can, and does, happen. With this in mind, if you really want to keep sensitive documents hidden from the rest of the world, put them behind a password-protected section of your site.
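Steps 4 to 6 of the fix described earlier (duplicate the two Paths sections, mark the copies, and strip the trailing slash) can also be scripted instead of edited by hand. The following is only a minimal sketch under some assumptions: the function name add_fixed_sections, the FIXED_MARK constant, and the sample input are all hypothetical; the script assumes each Paths section starts with a comment beginning "# Paths (" and ends at the next comment or User-agent line; and it appends the new sections at the end of the text rather than directly below the originals. It writes the marker with a plain ASCII hyphen.

```python
FIXED_MARK = "fixed!"  # marker text from step 5 (the hyphen before it is an assumption)


def add_fixed_sections(robots_txt: str) -> str:
    """Append slash-less copies of the '# Paths (...)' sections to robots_txt."""
    lines = robots_txt.splitlines()

    # If marked sections already exist, return the text unchanged.
    if any(l.strip().startswith("# Paths (") and FIXED_MARK in l for l in lines):
        return robots_txt

    fixed_lines = []
    in_paths = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            # A "# Paths (...)" comment opens a section; any other comment closes it.
            in_paths = line.startswith("# Paths (")
            if in_paths:
                fixed_lines.append(f"{line} - {FIXED_MARK}")
        elif line.lower().startswith("user-agent:"):
            # A new user-agent group also ends the current section.
            in_paths = False
        elif in_paths and line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            # Drop one trailing slash, but never reduce "/" to an empty rule.
            if len(path) > 1 and path.endswith("/"):
                path = path[:-1]
            fixed_lines.append(f"Disallow: {path}")

    if not fixed_lines:
        return robots_txt
    return robots_txt.rstrip("\n") + "\n" + "\n".join(fixed_lines) + "\n"


if __name__ == "__main__":
    # Hypothetical, shortened robots.txt content for illustration only.
    sample = (
        "User-agent: *\n"
        "Crawl-delay: 10\n"
        "# Paths (clean URLs)\n"
        "Disallow: /admin/\n"
        "Disallow: /user/login/\n"
        "# Paths (no clean URLs)\n"
        "Disallow: /?q=admin/\n"
    )
    print(add_fixed_sections(sample))
```

As with the manual procedure, back up the real file first and compare the script's output with the original before uploading it.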
Related article: How to include XML sitemap in your robots.txt file

© 2010 WebWorldarticles.com - All Rights Reserved.