What are search engine crawlers and what do they do


What is a search engine crawler?

A crawler, also known as a spider, robot, or worm (not to be confused with a worm virus), is an automated tool that visits a web page, reads the information on the page, and then follows links to other pages within that site. The job of the crawler is to find the information and hand it off to the search engine's indexers. Crawlers do not search the Web when you type a query; they gather pages ahead of time, and your search runs against the index built from them.

They work much the way your browser does: they send a request to a web server for a web page, download everything on that page, and give it to the indexer.
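The fetch-and-hand-off step described above can be sketched in a few lines of Python. This is only an illustration using the standard library, not how any particular search engine does it. The names LinkCollector, fetch, and extract_links, as well as the sample page and the example.com address, are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page much as a browser would, with one HTTP GET request."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html, base_url):
    """Return an absolute URL for every link found in the page."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


# Hypothetical page content, so the example runs without a network connection.
sample = '<p>See <a href="/about.html">about</a> and <a href="news/today.html">news</a>.</p>'
links = extract_links(sample, "http://example.com/index.html")
print(links)
```

In a real crawler the text returned by fetch would go to the indexer, and the links would go back to the crawler as pages to visit next.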

Crawlers find information in two ways. The first was voluntary submission: early on, you could send the search engine your URL, and the crawler would retrieve the pages you submitted so they could be added to the database. Unfortunately, people overwhelmed the "add URL" pages on the search engines with bogus submissions, and the search engine companies started to phase out that voluntary way of notifying them.

What do crawlers do?

Now, crawlers rely on the second way: they look at the links on the web pages they find and follow each of those links in turn. This cuts down on bogus URLs and helps the crawler be thorough. Crawling and indexing are not the end of the process, however; search engines refine the information further before it is available to the public. The companies perform spam detection and removal, duplicate detection and removal, and some database quality testing. As a result, information found on a website and indexed may not be available in your search for several weeks.
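The link-following approach can be illustrated with a short Python sketch. It assumes a breadth-first order and a tiny made-up site stored in a dictionary (SITE), so nothing is downloaded; real crawlers choose their own visiting order and work against live pages.

```python
from collections import deque

# Hypothetical site: each URL maps to the links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}


def crawl(start, get_links):
    """Visit every page reachable from start, each exactly once."""
    seen = {start}               # URLs already queued, so none is visited twice
    frontier = deque([start])    # URLs waiting to be visited
    order = []
    while frontier:
        url = frontier.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


visited = crawl("/", lambda url: SITE.get(url, []))
print(visited)
```

The set of URLs already seen is what keeps the crawler from looping forever when pages link back to each other.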

All this comes at a cost. Crawling is extremely expensive for the search engine companies, so most of them limit the number of pages that will be crawled on one website. That means a crawler may find an entire website but crawl only part of it, leaving a lot of valuable information unindexed.

Pages like these, which can be located but are intentionally left out of the search engine indices, are what Gary Price and Chris Sherman like to call the "opaque web." They are not invisible in the sense of being out of a crawler's reach; they are pages that could be indexed but are not.
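The per-site page limit described above can be sketched as follows. The function name apply_site_budget, the limit of two pages, and the example URLs are all assumptions for illustration; actual limits are set by each search engine and are not given here.

```python
from collections import Counter
from urllib.parse import urlparse


def apply_site_budget(urls, max_per_site):
    """Split discovered URLs into those crawled and those skipped."""
    crawled, skipped = [], []
    per_site = Counter()  # pages crawled so far, counted per host name
    for url in urls:
        host = urlparse(url).netloc
        if per_site[host] < max_per_site:
            per_site[host] += 1
            crawled.append(url)
        else:
            skipped.append(url)  # found by the crawler, but never indexed
    return crawled, skipped


# Hypothetical list of URLs the crawler has discovered.
discovered = [
    "http://example.com/1",
    "http://example.com/2",
    "http://example.com/3",
    "http://example.org/1",
]
crawled, skipped = apply_site_budget(discovered, max_per_site=2)
print(crawled)
print(skipped)
```

The skipped list is the interesting part: those pages were located, yet they never reach the index.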

Besides cost, the other major issue with crawlers is the time it takes to log all this information. While some crawlers can index millions of pages in a day, there is sometimes a significant delay between when information is put on the Web, when the crawler finds it, and when the crawler returns to recrawl, looking for new material. These time lags can leave your results inaccurate or out of date.
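To make the time lag concrete, here is a small Python sketch that works out which pages are overdue for a recrawl. The 30-day interval, the dates, and the function name pages_due_for_recrawl are assumptions chosen for the example, not figures from any search engine.

```python
from datetime import datetime, timedelta


def pages_due_for_recrawl(last_crawled, now, interval):
    """Return the URLs whose last crawl is older than the recrawl interval."""
    return sorted(url for url, when in last_crawled.items() if now - when > interval)


# Hypothetical record of when each page was last visited by the crawler.
last_crawled = {
    "/news": datetime(2010, 8, 20),
    "/about": datetime(2010, 6, 1),
}
now = datetime(2010, 8, 27)
due = pages_due_for_recrawl(last_crawled, now, timedelta(days=30))
print(due)
```

Anything changed on an overdue page since its last crawl is exactly the material that will be missing or stale in your search results.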

There is an ongoing debate about the "freshness" of the search engines. Most search engine companies claim they constantly crawl and have only the freshest of information, but analysis by Gary Price and Greg Notess found that the search engine companies tend to be weeks behind on a regular basis, and many search tools are months behind in their efforts to recrawl and index material. How far behind and how much information they cover is a matter of some debate.

This article was sent to us by: Damian Lissle on 08/27/2010
