About Search Engines

How Search Engines Work

 

Search engines for the general web (such as those listed below) do not search the World Wide Web directly. Each one searches a database holding the full text of web pages selected from the billions of pages residing on servers. When you search the web with a search engine, you are always searching a somewhat stale copy of each page. When you click a link in a search engine's results, you retrieve the current version of the page from its server.
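To make the "stale copy" idea concrete, here is a minimal sketch in Python. It assumes a toy web held in a dictionary; the URL, the page text, and the names live_web, index_snapshot, and search are all invented for illustration and do not describe how any real search engine stores its data.

```python
# Hypothetical stand-in for the live web: URL -> current page text.
live_web = {"http://example.com/news": "Monday headlines"}

# The engine's database holds whatever text was on the page at crawl time.
index_snapshot = dict(live_web)

# The page changes on its server after the crawl.
live_web["http://example.com/news"] = "Tuesday headlines"


def search(snapshot, word):
    """Return the URLs whose stored copy contains the word."""
    word = word.lower()
    return [url for url, text in snapshot.items() if word in text.lower()]


# The search matches the old, stored text...
hits = search(index_snapshot, "monday")

# ...but following the link retrieves the current version from the server.
current = live_web[hits[0]]
```

In this sketch a search for "monday" still finds the page, while a search for "tuesday" finds nothing until the page is crawled again.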

Some of the major search engines are Google, AltaVista, MSN, Excite, HotBot, Infoseek, Lycos, and WebCrawler. Note that Yahoo is a directory, not a search engine. The term "search engine" is nevertheless often used to describe both directories and search engines.

Search engine databases are selected and built by computer robot programs called spiders. A spider is the part of a search engine that surfs the web, storing the URLs and indexing the keywords and text of each page it finds. For example, Google's spider, also referred to as a "crawler", is named Googlebot. Although spiders are said to "crawl" the web in their hunt for pages to include, in truth they stay in one place. They find pages for potential inclusion by following the links in the pages already in their database (i.e. pages they already "know about"). They cannot think, type a URL, or use judgment to "decide" to go look something up on the Internet.
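The link-following behavior described above can be sketched roughly as follows. This is an illustration only, not how Googlebot or any real spider is implemented. The PAGES dictionary stands in for the web so the example runs without a network, and the URLs and the names LinkExtractor, fetch, and crawl are assumptions made for this sketch.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web": URL -> HTML of that page.
PAGES = {
    "http://a.example/": '<a href="http://b.example/">B</a> '
                         '<a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": "no links here",
    "http://orphan.example/": '<a href="http://a.example/">A</a>',
}


def fetch(url):
    """Stand-in for an HTTP request; returns None for unknown URLs."""
    return PAGES.get(url)


def crawl(seed_urls):
    """Follow links outward from pages the spider already knows about."""
    known = set(seed_urls)
    queue = deque(seed_urls)
    while queue:
        html = fetch(queue.popleft())
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in known:
                known.add(link)
                queue.append(link)
    return known


found = crawl(["http://a.example/"])
```

Starting from the single known page, the sketch reaches the two pages it links to. The page at orphan.example links out to others, but nothing links to it, so the spider never learns that it exists.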

If a web page is never linked to from any other page, search engine spiders cannot find it. The only way a brand new page - one that no other page has ever linked to - can get into a search engine is for a human to send its URL to the search engine companies with a request that the new page be included. All search engine companies offer ways to do this.
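A small sketch can show why submission matters. It assumes a made-up link graph (LINKS) and a made-up list of URLs the engine already knows (database); the discover function is hypothetical and only models link-following, not any engine's actual submission process.

```python
# Hypothetical link graph: URL -> URLs that the page links to.
LINKS = {
    "http://known.example/": ["http://known.example/about"],
    "http://known.example/about": [],
    "http://new.example/": ["http://new.example/products"],
    "http://new.example/products": [],
}


def discover(start_urls):
    """Return every page reachable by following links from start_urls."""
    found = set()
    stack = list(start_urls)
    while stack:
        url = stack.pop()
        if url in found or url not in LINKS:
            continue
        found.add(url)
        stack.extend(LINKS[url])
    return found


# URLs the engine already knows about.
database = ["http://known.example/"]
before = discover(database)

# A human submits the brand new site's URL to the engine.
database.append("http://new.example/")
after = discover(database)
```

Before the submission, neither page of the new site can be reached. Afterwards, the submitted page and the page it links to are both found.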

After spiders find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content of each page and stores it in the search engine database's files, so that the database can be searched by keyword and by whatever more advanced approaches are offered. The page will then be found if your search matches its content.
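One common way to make stored text searchable by keyword is an inverted index, which maps each word to the pages containing it. The source does not say which structure any particular engine uses, so the following is only a minimal sketch under that assumption; the page texts and the names tokenize, build_index, and search are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages handed over by the spider: URL -> page text.
pages = {
    "http://a.example/": "Spiders crawl the web",
    "http://b.example/": "The invisible web is large",
}
index = build_index(pages)
```

Here search(index, "web") matches both pages, while search(index, "invisible web") matches only the second, because a page must contain every query word to be returned.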

Some types of pages and links are excluded from most search engines by policy. Others are excluded because search engine spiders cannot access them. Excluded pages are referred to as the "Invisible Web", i.e. what you don't see in search engine results. The Invisible Web is estimated to be at least two to three times bigger than the visible web. (Source: University of California, Berkeley)