How a search engine like Google assembles its index


How a search engine builds its database

Searching Google or another search engine really means searching the index of that site's in-house database of web pages, not the Web itself. These databases hold vast numbers of individual web pages. That's not necessarily the entire Web, but it's a good portion of it. How does a search site pick which web pages to index and store on its servers? It's an intricate process with several components. First and foremost, most of the pages in the site's database are found by special spider or crawler software.

This is software that automatically crawls the Web, searching for new and updated web pages. Most spiders not only search for new web pages (by following links to other pages from the pages they already know about), but also periodically recrawl pages already in the database, checking for changes and updates.
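
One way a spider can check a recrawled page for changes is to store a fingerprint of each page's content and compare it on the next visit. The sketch below is a minimal illustration under that assumption, using a SHA-256 hash from Python's standard `hashlib`; the `recrawl` function and the `fetch` callable are hypothetical stand-ins, not any search engine's actual code.

```python
import hashlib


def content_hash(html):
    """Fingerprint a page's content so changes can be detected cheaply."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def recrawl(stored, fetch):
    """Re-fetch every known URL and return those whose content has changed.

    `stored` maps URL -> content hash from the previous visit and is updated
    in place; `fetch` is a hypothetical callable (URL -> HTML) standing in
    for a real HTTP fetcher.
    """
    changed = []
    for url, old_hash in list(stored.items()):
        new_hash = content_hash(fetch(url))
        if new_hash != old_hash:
            stored[url] = new_hash  # remember the new version
            changed.append(url)
    return changed


# Simulate two crawls of a hypothetical two-page site.
first_visit = {"a.example": "<p>hello</p>", "b.example": "<p>world</p>"}
stored = {url: content_hash(html) for url, html in first_visit.items()}
second_visit = {"a.example": "<p>hello</p>", "b.example": "<p>world, updated</p>"}
print(recrawl(stored, second_visit.__getitem__))  # ['b.example']
```

Comparing a short hash rather than the full stored copy keeps the check cheap; only pages whose hash differs need to be re-indexed.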

A complete recrawl of the web pages in a search site's database typically takes place every few weeks, so no individual page is more than a few weeks out of date. The search engine's spider reads each page it encounters, much like a web browser does, and follows every link on every page until all the links have been followed. This is how new pages are added to the site's database: by following links the spider hasn't seen before.
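
The link-following loop described above is essentially a breadth-first traversal. Here is a minimal sketch over a hypothetical in-memory miniature web (`LINKS`); a real spider would fetch pages over HTTP, parse out their links, and obey politeness rules, none of which is shown.

```python
from collections import deque

# Hypothetical miniature web: each URL maps to the URLs it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}


def crawl(seeds, get_links):
    """Breadth-first crawl: follow every link until no unseen URLs remain."""
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(unique_seeds)
    frontier = deque(unique_seeds)
    order = []
    while frontier:
        url = frontier.popleft()
        order.append(url)  # "read" the page
        for link in get_links(url):
            if link not in seen:  # a link the spider hasn't seen before
                seen.add(link)
                frontier.append(link)
    return order


print(crawl(["a.example"], lambda u: LINKS.get(u, [])))
# ['a.example', 'b.example', 'c.example', 'd.example']
```

The `seen` set is what lets the crawl terminate: "c.example" links back to "a.example", but an already-seen URL is never queued twice.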

The pages discovered by the spider are copied verbatim into the search site's database and recopied each time they're updated. These stored web pages are used to compile the page summaries that appear on search results pages. To search its database, the search site creates a catalog of all the stored web pages. This search engine index contains a list of all the important words used on every stored web page in the database. Once the index has been compiled, it's simple to look up a particular word and get back a list of all the web pages on which that word appears.
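
An index like this is usually called an inverted index: instead of mapping each page to its words, it maps each word to the pages that contain it. A minimal sketch, assuming a simple lowercase tokenizer and a small hypothetical stop-word list as a stand-in for "unimportant" words:

```python
import re
from collections import defaultdict

# Hypothetical stop-word list standing in for "unimportant" words.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each indexed word to the set of page URLs it appears on."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:
                index[word].add(url)
    return dict(index)


pages = {
    "a.example": "Spiders crawl the Web",
    "b.example": "The index lists every word",
    "c.example": "Crawl budgets and the index",
}
index = build_index(pages)
print(sorted(index["crawl"]))  # ['a.example', 'c.example']
```

Looking up a word is then a single dictionary access, no matter how many pages were indexed.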

And that's how a search index and database work together to serve search queries. A person enters one or more words in a query, the search engine searches its index for those words, and the web pages that contain those words are returned as search results. This is fairly simple in concept but much more complex in execution, especially given that each search engine indexes all the words on several billion web pages.
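
A minimal sketch of answering a multi-word query against such an index, assuming that a page must contain every query word to match (AND semantics; real engines are more forgiving). The `INDEX` contents are hypothetical.

```python
import re

# Hypothetical inverted index: word -> set of pages containing it.
INDEX = {
    "search": {"a.example", "b.example", "c.example"},
    "engine": {"a.example", "c.example"},
    "index": {"c.example", "d.example"},
}


def search(query, index):
    """Return the pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    # Start from the smallest posting set so the intersections stay small.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


print(sorted(search("Search engine", INDEX)))  # ['a.example', 'c.example']
```

Starting from the rarest word is a common optimization: the result can never be larger than the smallest posting set.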

How search results are ranked

As a web marketer, you care less about how Google or Yahoo! searches the Web than about how high your site appears in that search engine's results pages. What makes a search engine rank one site high in its search results and a similar site much lower?

Each search engine has its own particular algorithm for ranking the web pages in its search index. In general, though, they follow similar methodologies, and similar factors matter to all the major search engines. To that end, it's instructive to look at how Google, the Web's largest and most popular search engine, ranks its results.

Google, like the other search engines, tries to serve its users by ranking the most important or relevant pages first and less-relevant pages lower in the results. How does Google determine which web pages best match a given query?

Although Google keeps its precise methodology under lock and key for competitive reasons, we do know that its rankings rest on three primary components:

Text analysis - Google looks not only for matching words on a web page, but also for how those words are used. That means examining font size, usage, proximity, and more than a hundred other factors to help determine relevance. Google also analyzes the content of neighboring pages on the same website to ensure that the selected page is the best match.

Links and link text - Google then examines the links on the web page (and the text of those links) to make sure they point to pages that are relevant to the searcher's query.

PageRank - Finally, Google depends on its own proprietary PageRank technology to provide an objective measurement of web page importance and popularity. PageRank measures a page's importance by the number of other pages that link to it, treating each link as a vote and giving more weight to votes from pages that are themselves important. In general, the more pages that link to a page, the higher that page's PageRank and the higher it is likely to appear in the search results.
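
PageRank was published by Larry Page and Sergey Brin in 1998, and its textbook form is commonly written as PR(p) = (1 - d)/N + d * sum of PR(q)/L(q) over every page q that links to p, where L(q) is q's number of outbound links, N is the total number of pages, and d is a damping factor (0.85 in the original paper). The sketch below computes that textbook formula by power iteration on a hypothetical four-page graph; the handling of pages without outbound links is an assumption, and Google's production ranking is proprietary and certainly differs.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the same baseline share from the (1 - d) term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c
```

Page "a" ends up ranked far above "b" and "d" even though it has only one inbound link, because that link comes from the highly ranked "c". This is the sense in which PageRank is more than a raw count of links.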

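Returning to the first component, text analysis: Google's hundred-plus factors are not public, but two of them, how often the query words occur and how close together they appear, are easy to illustrate. The `text_score` function below is purely hypothetical: it counts matches and adds a proximity bonus based on the smallest window of consecutive words that contains every query term.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def text_score(query, text):
    """Toy relevance score: number of query-term matches plus a proximity bonus.

    The bonus is len(terms) / w, where w is the length of the smallest window
    of consecutive words containing every query term (1.0 when adjacent).
    """
    terms = set(tokenize(query))
    words = tokenize(text)
    hits = [(i, w) for i, w in enumerate(words) if w in terms]
    if not terms or {w for _, w in hits} != terms:
        return float(len(hits))  # some term is missing: no proximity bonus
    # Sliding window over the hit positions to find the tightest span.
    best = len(words)
    counts = {}
    left = 0
    for pos, term in hits:
        counts[term] = counts.get(term, 0) + 1
        while len(counts) == len(terms):
            start, first_term = hits[left]
            best = min(best, pos - start + 1)
            counts[first_term] -= 1
            if counts[first_term] == 0:
                del counts[first_term]
            left += 1
    return len(hits) + len(terms) / best


query = "search engine"
doc1 = "A search engine builds an index"
doc2 = "Search the web with an engine"
print(text_score(query, doc1))  # 3.0
print(text_score(query, doc1) > text_score(query, doc2))  # True
```

Both documents contain both words once, but the one where they sit side by side scores higher, which is the intuition behind using proximity as a relevance signal.
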


This article was sent to us by: Ethan Floyd at 03142011
