The term "search engine"
is often used generically to describe both crawler-based search
engines and human-powered directories. These two types of
search engines gather their listings in radically different
ways.
Crawler-Based Search Engines:
Crawler-based search engines, such as Google, create their
listings automatically. They "crawl" or "spider"
the web, then people search through what they have found.
If you change your web pages, crawler-based
search engines eventually find these changes, and that can
affect how you are listed. Page titles, body copy and other
page elements all play a role in that listing.
Human-Powered Directories:
A human-powered directory, such as the Open Directory, depends
on humans for its listings. You submit a short description
to the directory for your entire site, or editors write one
for sites they review. A search looks for matches only in
these descriptions, not in the sites' pages.
Changing your web pages has no effect on
your listing. Things that are useful for improving a listing
with a crawler-based search engine have nothing to do with improving a listing
in a directory. The only exception is that a good site, with
good content, might be more likely to get reviewed for free
than a poor site.
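To make the contrast concrete, here is a minimal Python sketch of a directory-style search. It assumes a toy directory stored as a dictionary of site URLs and short descriptions; the URLs, descriptions and the `directory_search` function are hypothetical. The search only ever consults the descriptions, so editing a site's page text leaves its listing unchanged.

```python
# Hypothetical directory: site URL -> short description written by the
# site owner or an editor. Only these descriptions are searchable.
directory = {
    "https://example.com": "Handmade ceramic mugs and bowls",
    "https://example.org": "Guide to growing tomatoes at home",
}

# The sites' actual page text. The directory search never looks at this.
page_text = {
    "https://example.com": "Shop our pottery and tomato planters",
    "https://example.org": "Tomatoes need sun, water and patience",
}


def directory_search(query: str) -> list[str]:
    """Return sites whose description contains every query word."""
    words = query.lower().split()
    return [
        url
        for url, description in directory.items()
        if all(w in description.lower().split() for w in words)
    ]


print(directory_search("tomatoes"))  # ['https://example.org']

# Stuffing keywords into a page does not change its directory listing.
page_text["https://example.com"] += " tomatoes tomatoes tomatoes"
print(directory_search("tomatoes"))  # ['https://example.org']
```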
Hybrid Search Engines Or Mixed Results:
In the web's early days, a search engine presented either
crawler-based results or human-powered listings.
Today it is extremely common for both types of results to
be presented. Usually, a hybrid search engine will favor one
type of listing over the other. For example, MSN Search is
more likely to present human-powered listings from LookSmart.
However, it also presents crawler-based results (provided
by Inktomi), especially for more obscure queries.
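One possible way a hybrid engine could favor one source is sketched below in Python. The rule (directory listings first, topped up with crawler results when the directory returns fewer than a hypothetical `min_directory_hits` matches) is an illustrative assumption, not a description of how MSN Search, LookSmart or Inktomi actually combined results.

```python
def hybrid_results(directory_hits, crawler_hits, min_directory_hits=3):
    """Favor human-powered listings; pad with crawler results when the
    directory has few matches, as tends to happen for obscure queries.

    min_directory_hits is a hypothetical threshold chosen for illustration.
    """
    results = list(directory_hits)
    if len(results) < min_directory_hits:
        # The directory is thin for this query, so fall back on the crawler,
        # skipping any URL already listed.
        for url in crawler_hits:
            if url not in results:
                results.append(url)
    return results


print(hybrid_results(["dir1", "dir2", "dir3"], ["web1"]))  # ['dir1', 'dir2', 'dir3']
print(hybrid_results(["dir1"], ["web1", "dir1", "web2"]))  # ['dir1', 'web1', 'web2']
```

A popular query fills the page with directory listings, while an obscure one leans mostly on crawler results.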
The Parts of a Crawler-Based Search Engine:
Crawler-based search engines have three major elements. First
is the spider, also called the crawler. The spider visits
a web page, reads it, and then follows links to other pages
within the site. This is what it means when someone refers
to a site being "spidered" or "crawled."
The spider returns to the site on a regular basis, such as
every month or two, to look for changes.
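The spider's visit, read and follow loop can be sketched as a breadth-first crawl that stays on one host. To keep it runnable offline, this sketch assumes a hypothetical in-memory link table, `FAKE_WEB`, in place of real HTTP fetching and HTML parsing; `crawl_site` and `fetch_links` are illustrative names.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "web": URL -> links found on that page.
# A real spider would fetch each URL over HTTP and parse the HTML.
FAKE_WEB = {
    "https://example.com/": ["/about", "/products", "https://other.net/"],
    "https://example.com/about": ["/"],
    "https://example.com/products": ["/products/mugs", "/about"],
    "https://example.com/products/mugs": [],
}


def crawl_site(start_url, fetch_links):
    """Breadth-first crawl that follows links only within start_url's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)  # record the visit; a real spider would store the page
        for href in fetch_links(url):
            link = urljoin(url, href)  # resolve relative links like "/about"
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


pages = crawl_site("https://example.com/", lambda u: FAKE_WEB.get(u, []))
print(pages)
```

The off-site link to `https://other.net/` is ignored, and each page is visited once even though pages link back to each other. A regular revisit could simply rerun the same crawl on a schedule.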
Everything the spider finds goes into the
second part of the search engine, the index. The index, sometimes
called the catalog, is like a giant book containing a copy
of every web page that the spider finds. If a web page changes,
then this book is updated with new information.
Sometimes it can take a while for new pages
or changes that the spider finds to be added to the index.
Thus, a web page may have been "spidered" but not
yet "indexed." Until it is indexed (added to the
index), it is not available to those searching with the search
engine.
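The index and the gap between "spidered" and "indexed" can be modeled with a toy inverted index. In this sketch, which assumes a hypothetical `Index` class rather than any real engine's design, fetched pages wait in a pending queue and are not searchable until `run_indexer` processes them; re-indexing a changed page replaces its old words, much as the "book" is updated.

```python
from collections import defaultdict


class Index:
    """Toy inverted index mapping each word to the URLs that contain it.

    Pages fetched by the spider wait in `pending` until run_indexer()
    runs; until then they are spidered but not yet searchable.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs
        self.pages = {}    # url -> text currently in the index
        self.pending = {}  # url -> text spidered but not yet indexed

    def add_spidered(self, url, text):
        self.pending[url] = text

    def run_indexer(self):
        for url, text in self.pending.items():
            # If the page changed, remove its old words first.
            for word in set(self.pages.get(url, "").lower().split()):
                self.postings[word].discard(url)
            for word in set(text.lower().split()):
                self.postings[word].add(url)
            self.pages[url] = text
        self.pending.clear()

    def lookup(self, word):
        return set(self.postings.get(word.lower(), ()))


idx = Index()
idx.add_spidered("https://example.com/", "Handmade ceramic mugs")
print(idx.lookup("mugs"))  # set(): spidered, not yet indexed
idx.run_indexer()
print(idx.lookup("mugs"))  # {'https://example.com/'}
idx.add_spidered("https://example.com/", "Handmade ceramic bowls")
idx.run_indexer()
print(idx.lookup("mugs"))  # set(): the old words were replaced
```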
Search engine software is the third part
of a search engine. This is the program that sifts through
the millions of pages recorded in the index to find matches
to a search and rank them in order of what it believes is
most relevant.
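As a rough illustration of ranking, the sketch below scores pages with TF-IDF (how often a query word appears on a page, weighted by how rare that word is across the index). TF-IDF is a textbook technique swapped in here for illustration; real search engine software combines many more signals, and the page data below is hypothetical.

```python
import math
from collections import Counter


def rank(query, pages):
    """Return URLs containing any query word, best TF-IDF score first.

    `pages` maps URL -> page text.
    """
    docs = {url: Counter(text.lower().split()) for url, text in pages.items()}
    n = len(docs)
    scores = {}
    for url, counts in docs.items():
        score = 0.0
        for word in query.lower().split():
            if counts[word]:
                # Number of pages containing the word (at least 1 here).
                df = sum(1 for c in docs.values() if word in c)
                score += counts[word] * math.log((n + 1) / df)
        if score > 0:
            scores[url] = score
    return sorted(scores, key=scores.get, reverse=True)


pages = {
    "page-a": "mugs mugs and bowls",
    "page-b": "ceramic mugs",
    "page-c": "tomato guide",
}
print(rank("mugs", pages))          # ['page-a', 'page-b']
print(rank("ceramic mugs", pages))  # ['page-b', 'page-a']
```

Because "ceramic" appears on only one page, it outweighs the extra occurrence of the more common word "mugs" in the second query.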