XML Sitemap for Google setup for Drupal


This article was sent to us by: Marcia G. at 01172010


XML sitemaps

In the mid-2000s, Google started supporting XML sitemaps. Soon after, Yahoo came out with its own standard and other search engines started to follow suit. Fortunately, in 2006, Google, Yahoo, Microsoft, and a handful of smaller players got together and decided to support the same sitemap specification. That made it much easier for site owners to make sure every page of their web site is crawled and added to the search engine index. They published their specification at http://sitemaps.org.

Shortly thereafter, the Drupal community stepped up and created a module called (surprise!) the XML sitemap module. This module automatically generates an XML sitemap containing every node and taxonomy term on your Drupal site. It was originally written by Matthew Loar as part of the Google Summer of Code. The Drupal 6 version of the module was developed by Kiam LaLuno. Finally, in mid-2009, Dave Reid began working on version 2.0 of the module to address performance, scalability, and reliability issues. Thanks, guys!

According to www.sitemaps.org:

Sitemaps are an easy way for Webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata.
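To make that description concrete, here is a minimal sketch in Python of the kind of file the specification describes: a urlset element containing one url entry per page, each with a loc and optional lastmod, changefreq, and priority. The function name, the example.com URLs, and the dates are assumptions made for illustration; this is not how the Drupal module builds its sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of URL entries.

    Each entry is a dict with a required "loc" key and optional
    "lastmod", "changefreq", and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit the child elements in the order used by the protocol.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only.
example = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2010-01-17",
     "changefreq": "daily", "priority": "1.0"},
    {"loc": "http://www.example.com/node/1", "priority": "0.5"},
])
print(example)
```

Only loc is required for each entry; the other three fields are the optional metadata mentioned in the quote above.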

Using a sitemap does not guarantee that every page will be included in the search engine indexes. Rather, it helps the search engine crawlers find more of your pages. In my experience, submitting an XML Sitemap to Google will greatly increase the number of pages returned when you do a site: search.

A site: search shows you how many pages of your site are included in the search engine index, as shown in the following screenshot:

XML sitemap

Setting up the XML Sitemap module

The XML Sitemap module creates a sitemap that conforms to the sitemaps.org specification.

Which XML Sitemap module should you use?

There are two versions of the XML Sitemap module for Drupal 6. The 1.x version is, as of this writing, considered the stable release and should be used for production sites. However, if you have a site with more than about 2,000 nodes, you should probably consider using the 2.x version. From www.drupal.org:

"The 6.x-2.x branch is a complete refactoring with considerations for performance, scalability, and reliability. Once the 6.x-2.x branch is tested and upgradeable, the 6.x-1.x branch will no longer be supported."

What this means is that in the next few months (quite possibly by the time you're reading this) everyone should be using the 2.x version of this module. That's the beauty of open source software: there are always improvements coming that make your Drupal site better optimized for search engines.

The rest of this article refers to XML Sitemap module version 2.x Beta.

Carry out the following steps to set up the XML Sitemap module:

1. Download the XML Sitemap module from http://drupal.org/project/xmlsitemap and install it just like a normal Drupal module. When you go to turn on the module, you'll be presented with a list that looks similar to the following screenshot:

Sitemap Module

Before you turn on any included modules, consider what pieces of content on your site you want to show up in the search engines and only turn on the modules you need.

  • The XML sitemap module is required. Turn it on.
  • XML sitemap custom allows you to add your own customized links to the sitemap. Turn it on.
  • XML sitemap engines will automatically submit your sitemap to the search engines each time it changes. This is not necessary and there are better ways to submit your sitemap (like the robots.txt file which we'll cover in the next article). However, it does a nice job of helping you verify your site with each search engine. Turn it on.
  • XML sitemap menu adds your menu items to the sitemap. This is probably a good idea. Turn it on.
  • XML sitemap node adds all your nodes. That's usually the bulk of your content so this is a must-have. Turn it on.
  • XML sitemap taxonomy adds all your taxonomy term pages to the sitemap. Generally a good idea but some might not want this listed. Term pages are good category pages so I recommend it. Turn it on.
  • Don't forget to click Save configuration.

2. Go to http://www.yourDrupalsite.com/admin/settings/xmlsitemap or go to your admin screen and click on Administer | Site Configuration | XML sitemap link. You'll be able to see the XML sitemap, as shown in the following screenshot:

3. Click on Settings and you'll see a few options, as shown in the following screenshot:


  • Minimum sitemap lifetime: Determines the minimum amount of time that the module will wait before regenerating the sitemap. Use this feature if you have an enormous sitemap that is taking too many server resources. Most sites should leave this set to No minimum.
  • Include a stylesheet in the: The module generates a simple stylesheet to include with the sitemap. It's not necessary for the search engines, but it is very helpful for troubleshooting or if any humans are going to view the sitemap. Leave it checked.
  • Generate sitemaps for the following languages: In the future, this option will allow you to specify sitemaps for different languages. This is very important for international sites that want to show up in localized search engines. For now, English is the only option and should remain checked.

4. Click the Advanced settings drop-down and you'll see several additional options.

Sitemap generation

  • Number of links in each sitemap page allows you to specify how many links to pages on your web site will be in each sitemap. Leave it on Automatic unless you are having trouble with the search engines accepting the sitemap. From www.sitemaps.org: "You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 10MB (10,485,760 bytes). If you want to list more than 50,000 URLs, you must create multiple Sitemap files. If you do provide multiple Sitemaps, you should then list each Sitemap file in a Sitemap index file."
  • Maximum number of sitemap links to process at once sets the number of additional links that the module will add to your sitemap each time cron runs. This highlights one of the biggest differences between the new XML sitemap module and the old one: the new module only processes new nodes and updates the existing sitemap, instead of reprocessing everything each time the sitemap is accessed. Leave this setting alone unless you notice that cron is timing out.
  • Sitemap cache directory allows you to set where the sitemap data will be stored. This is data that is not shown to the search engines or users; it's only used by the module.
  • Base URL is the base URL of your site and generally should be left as it is.
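The 50,000-URL limit quoted above is easy to picture with a short Python sketch that splits a long list of URLs into pages and lists the files a sitemap index would point to. The sitemap-N.xml file names and the example.com URLs are hypothetical; the module chooses its own paths, and on Automatic it handles this split for you.

```python
MAX_URLS_PER_SITEMAP = 50000  # per-file limit quoted from sitemaps.org


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_page_urls(base_url, page_count):
    """Return one URL per sitemap page, for listing in a sitemap index.

    The sitemap-N.xml naming is an assumption made for this sketch.
    """
    base = base_url.rstrip("/")
    return ["%s/sitemap-%d.xml" % (base, n) for n in range(1, page_count + 1)]


# A hypothetical site with 120,000 nodes needs three sitemap pages.
urls = ["http://www.example.com/node/%d" % n for n in range(1, 120001)]
chunks = chunk_urls(urls)
print([len(chunk) for chunk in chunks])
print(sitemap_page_urls("http://www.example.com", len(chunks)))
```

This sketch only checks the URL count. The 10MB size limit from the same quote would need a separate check on the generated files.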

5. Click on the Front page drop-down and set these options:

  • Front page priority: 1.0 is the highest setting you can give a page in the XML sitemap. On most web sites, the front page is the single most important part of the site, so this setting should probably be left at 1.0.
  • Front page change frequency: Tells the search engines how often they should revisit your front page. Adjust this setting to reflect how often the front page of your site changes.

What is priority and how does it work?

Priority is an often-misunderstood part of a sitemap. It is only used to compare pages within your own site; you cannot increase your ranking in the search engine results pages (SERPs) by increasing the priority of your pages. However, it does let the search engines know which pages of your site you feel are more important. They could use this information to choose between two different pages on your site when deciding which one to show to a search engine user.
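As a rough illustration, the sitemaps.org protocol defines priority as a value from 0.0 to 1.0 with a default of 0.5. The Python sketch below validates a value against that range and then orders a handful of hypothetical pages by priority, which is the only kind of comparison the value supports. The function name and page paths are assumptions, and rounding to one decimal place is a choice made for this sketch rather than a rule of the protocol.

```python
def format_priority(value=0.5):
    """Return a sitemap priority as a one-decimal string.

    The protocol allows 0.0 to 1.0 and defaults to 0.5. Values outside
    that range are rejected rather than silently clamped.
    """
    value = float(value)
    if not 0.0 <= value <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0, got %r" % value)
    return "%.1f" % value


# Hypothetical pages on a single site. Priority only ranks them
# against each other, not against pages on other sites.
pages = {"/": 1.0, "/about": 0.5, "/node/42": 0.8}
ranked = sorted(pages.items(), key=lambda item: item[1], reverse=True)
for path, priority in ranked:
    print(format_priority(priority), path)
```

Setting every page to 1.0 would make the ordering meaningless, since the search engine would have nothing left to compare.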

6. Open the Content types drop-down and you will see the following screenshot:

Content Type

  • Here, you will see each Content type listed separately. You probably want to leave these settings alone so that all your content shows up in the sitemap.
  • If you do want to adjust the Content types settings in the sitemap, you'll need to go to the content type screen. Click on the name of the content type to go to that screen.
  • On the content type screen, open the XML sitemap drop-down and you'll see two options.

  • Include in sitemap sets the default action for that content type – if you check this box then it will be included in the sitemap.
  • Default priority allows you to set the default for each node that you create of that content type. Default is usually 0.5, but you can adjust it if you want certain pages to have a higher or lower priority.
  • Click on Save content type.
  • Repeat for each content type that you wish to change.

7. Click Save configuration.

8. Now, you need to run cron. Cron is a recurring script that takes care of many maintenance issues in Drupal including populating the XML sitemap. To run cron, point your browser to http://www.yourDrupalsite.com/cron.php and wait until the page stops loading. You will not receive any indication that it's complete except that your browser will stop loading the page.

9. Point your browser to http://www.yourDrupalsite.com/sitemap.xml. If you see a bunch of gobbledygook that looks like the following screenshot, then you've done it right!

Sitemap Preview
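If you would rather check the result with a script than by eye, a minimal Python sketch like the following parses sitemap XML and lists the page URLs it contains. The sample document and its URLs are made up for illustration. On a real site you would pass in the contents of your own sitemap.xml, and a large site may serve a sitemap index there instead, in which case this function returns an empty list.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


# A made-up two-page sitemap standing in for a real sitemap.xml.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/node/1</loc>
    <priority>0.5</priority>
  </url>
</urlset>"""

found = list_sitemap_urls(sample)
print(len(found), "URLs found")
for url in found:
    print(url)
```

Comparing the number of URLs found with the number of published nodes on your site is a quick way to spot content types that were left out.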

The XML Sitemap will only update when cron runs. On a normal Drupal installation, you should have set cron to run periodically: nightly for most sites, or more often for high-traffic sites.
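Because the sitemap depends on cron, it can be handy to trigger cron from a script instead of a browser. The Python sketch below requests cron.php in the same way as the manual step described earlier. The function names and the timeout value are assumptions, and it presumes that your site allows cron.php to be requested without extra credentials, as in the browser step above.

```python
from urllib.request import urlopen


def cron_url(site_base_url):
    """Build the cron.php URL from a Drupal site's base URL."""
    return site_base_url.rstrip("/") + "/cron.php"


def run_cron(site_base_url, timeout=300):
    """Request cron.php and return the HTTP status code.

    The generous timeout is an assumption: cron can take a while on a
    large site, and the request only returns once cron has finished.
    """
    with urlopen(cron_url(site_base_url), timeout=timeout) as response:
        return response.status


# Building the URL needs no network access. Call run_cron() only
# against a site you administer.
print(cron_url("http://www.yourDrupalsite.com"))
```

A scheduled job on the server could call run_cron() nightly, matching the schedule suggested above.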
