The robots.txt file can contain your XML sitemap for Drupal


This article was sent to us by: Latoya K. at 01172010

Adding your XML Sitemap to the robots.txt file

Another way that the robots.txt file helps you optimize your Drupal site for search engines is by allowing you to specify where your sitemaps are located. While you probably want to submit your sitemap directly to Google, Yahoo!, and MSN, it's a good idea to put a reference to it in the robots.txt file for all of the other search engines. You can do this by carrying out the following steps:

1. Open the robots.txt file for editing.

2. Add your sitemap lines. The Sitemap directive is independent of the User-agent line, so it doesn't matter where you place it in your robots.txt file.

To keep things neat, add this line first:

# Sitemaps

Add these lines for your XML sitemap:

Sitemap: http://www.yourDrupalsite.com/sitemap.xml
Sitemap: http://www.yourDrupalsite.com/?q=sitemap.xml

If you're using the URL list sitemap instead, add these lines:

Sitemap: http://www.yourDrupalsite.com/urllist.txt
Sitemap: http://www.yourDrupalsite.com/?q=urllist.txt 

3. You're finished editing.

4. Save your robots.txt file, uploading it if necessary, replacing the existing file (you backed it up, didn't you?).

5. Go to http://www.yourDrupalsite.com/robots.txt and double-check that your changes are in effect. You may need to perform a refresh on your browser to see the changes.

If you have an XML sitemap, use it. If not, use the URL list sitemap. However, do not add both an XML sitemap and a URL list sitemap to the robots.txt file. Doing so could confuse the search engines, possibly even causing them to see duplicate content on your site. Also, do not add your visitor-facing sitemap to your robots.txt file.
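If you'd like to double-check the result with a script rather than by eye, here is a minimal sketch in Python using the standard library's `urllib.robotparser` (its `site_maps()` method needs Python 3.8 or later). The helper names `sitemap_urls` and `mixes_sitemap_types` are made up for this illustration, and the check simply assumes the default `sitemap.xml` and `urllist.txt` file names used above.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []


def mixes_sitemap_types(urls):
    """True if both an XML sitemap and a URL list sitemap are referenced.

    Assumes the default file names sitemap.xml and urllist.txt.
    """
    has_xml = any(u.endswith("sitemap.xml") for u in urls)
    has_urllist = any(u.endswith("urllist.txt") for u in urls)
    return has_xml and has_urllist


example = """\
# Sitemaps
Sitemap: http://www.yourDrupalsite.com/sitemap.xml
Sitemap: http://www.yourDrupalsite.com/?q=sitemap.xml

User-agent: *
Disallow: /admin/
"""

urls = sitemap_urls(example)
print(urls)
print(mixes_sitemap_types(urls))
```

Because the parser collects Sitemap lines regardless of where they sit relative to the User-agent groups, the sketch finds them whether you put them at the top or the bottom of the file.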

Using Google's Webmaster Tools to evaluate your robots.txt file

Warning! The robots.txt file is easy to mess up! It's not written for humans, so it's easy for site owners and webmasters to misunderstand exactly how to use it. Take care not to break your SEO campaign simply because a poorly written robots.txt file is excluding your site from Google. Fortunately, Google's Webmaster Tools provides a helpful utility that shows you exactly which pages are being excluded and included by your robots.txt file. Carry out the following steps to evaluate your robots.txt file using Google's Webmaster Tools:

1. Go to http://www.google.com/webmasters/, log in, and click on your site.

2. Click on the Tools menu item and you'll see a screen similar to the following screenshot:

[Screenshot: Google robots.txt in Webmaster Tools]

3. Click on Analyze robots.txt. You'll see some interesting statistics about your robots.txt file.

You'll see the following options on the next page:

  • URL: The location of your file
  • Last downloaded: The last snapshot that Google took of your robots.txt file. They tend to grab the latest file once per day.
  • Status: Anything other than 200 (Success) means that there's a problem with your robots.txt file.
  • Parsing results: Indicates any lines or rules that are ignored by Google. In this example, Google is ignoring the Crawl-delay directive.
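To get a feel for what a "parsing results" report does, here is a rough Python sketch that lists the lines a parser would skip. The `RECOGNIZED` set and the `ignored_lines` helper are assumptions made for this illustration; this is not Google's actual parser, only an imitation of the idea.

```python
# Directives this sketch treats as recognized. This set is an assumption
# for illustration, not a copy of any search engine's real rules.
RECOGNIZED = {"user-agent", "allow", "disallow", "sitemap"}


def ignored_lines(robots_txt):
    """Return (line_number, text) pairs for rules the sketch would ignore."""
    ignored = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, sep, _value = line.partition(":")
        # A line is ignored if it has no colon or an unknown directive name.
        if not sep or field.strip().lower() not in RECOGNIZED:
            ignored.append((number, line))
    return ignored


sample = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""

print(ignored_lines(sample))
```

With this sample, only the Crawl-delay line is reported, which mirrors the kind of result described in the Parsing results item above.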

4. Further down the page, you'll see the text of the robots.txt file that Google last downloaded from your site. If you've tweaked it more recently than the last download, you can copy and paste your changes into the box provided so that you can test them. This is for testing purposes only. Any changes you make will not be saved.

5. The next box, labeled Test URLs against this robots.txt file, is a list of URLs from your web site. By making changes to the robots.txt box and adding URLs, you can see how different rules will affect the way Google sees your site.

6. Further down, Choose User-agents allows you to specify which Googlebot you want to evaluate. Google uses several, such as Googlebot-Mobile and Googlebot-Image.

Let's try an example. We're going to tell Googlebot-Image to leave our site alone! As you can see below, I added these lines to the robots.txt text box:

User-agent: Googlebot-Image
Disallow: /*.jpg$
Disallow: /*.gif$
Disallow: /*.png$

Both Googlebot and Googlebot-Image were blocked by our robots.txt file. For more information about the robots.txt specification, please visit these sites:

Feel free to try different things on your own. You can't hurt your site here. If you like what you've done, be sure to copy and paste the changes into the robots.txt file on the root level of your Drupal site.
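If you're curious how wildcard rules like the ones in the example decide which URLs are blocked, the following Python sketch translates each Disallow pattern into a regular expression, treating `*` as "any run of characters" and a trailing `$` as "end of the URL". The function names `rule_to_regex` and `is_blocked` are hypothetical, and the sketch only considers Disallow rules for a single user-agent; it ignores Allow lines and rule precedence, so treat it as a rough approximation rather than a stand-in for Google's tester.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule with * and $ wildcards to a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """True if any Disallow rule matches the path, starting at its first character."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# The three rules added for Googlebot-Image in the example above.
rules = ["/*.jpg$", "/*.gif$", "/*.png$"]

print(is_blocked("/sites/default/files/logo.png", rules))  # image path
print(is_blocked("/node/1", rules))                        # ordinary page
print(is_blocked("/files/photo.jpg?size=large", rules))    # .jpg not at the end
```

Note the third test path: because of the trailing `$`, a URL that continues past `.jpg` with a query string is not matched by these rules in this sketch.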
