Search Engine Features Chart
This is a sample of the information you receive when subscribing to PlanetOcean.

Most search engine comparison charts are made for search engine users. The search engine features chart below is designed primarily for webmasters who care about how search engines index their sites. It provides a summary of important factors and features that can affect how a site is indexed.

Although the chart is designed for webmasters, search engine users will also find portions of it useful in determining how fresh and complete the different search engines are.

Full explanations can be found immediately below the chart. Please note that in a few places on the chart, a - symbol is used to denote unknown or unresearched answers.

These charts are current as of June 30, 1998.


If the field is in red, the information may be incorrect due to recent changes at the engine.

Engine General Information | Yahoo | InfoSeek | Excite | Lycos | WebCrawler | AltaVista | HotBot | Northern Light
<title> character length used for listing | N/A | 74 | 71 | 61 | 59 | 79 | 92 | 77
Keyword in title is important | N/A | Yes (in channels) | No | Yes | Yes | Yes | No | Yes
Character length used for description in listing | N/A | 158 | 390 | 141 | 393 | 153 | 246 | 186
<meta> description supported | N/A | Yes | No | No | Yes | Yes | Yes | No
<meta> keyword supported | N/A | Yes | No | Yes | Yes | Yes | Yes | Yes
<meta> keyword important | N/A | No | No | Yes | Yes | No | Yes | No
Indexes all visible text on page | N/A | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Searches are Case Sensitive | N/A | Yes | No | No | No | Yes | No | Yes
Word Stemming (hawaii=hawaiian) | No | Yes | No | Yes | No | No | No | No
Word Capitalization (Is toy=Toy?) | Yes | No | Yes | Yes | No | No | Yes | No
Keyword Order (Is "paris hotels"="hotels paris"?) | Yes | No | Yes | No | Yes | Yes(?) | Yes | No
Plurals (Is toy=toys?) | No | Yes | No | No | No | No | No | No
Looks for Keywords in URL's | No | Yes | Yes | No | No | No | No | No
Do newly submitted pages score better? | No | - | - | Yes | - | No (older is better) | Yes | -
Link Popularity | N/A | Yes (major influence) | Yes | Yes | Yes | No | ? | No
Reviewed sites score higher? | Yes | Yes | Yes | - | Yes | - | - | -
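
The title lengths in the chart can be checked against your own pages before you submit them. The sketch below is only an illustration. The limits are copied from the "<title> character length used for listing" row of the chart, while the function names and the assumption that an engine simply cuts a title off at its limit are hypothetical and not documented engine behavior.

```python
# Characters of the <title> shown in a listing, per the chart (June 30, 1998).
# Yahoo is N/A in the chart, so it is left out.
TITLE_LIMITS = {
    "InfoSeek": 74,
    "Excite": 71,
    "Lycos": 61,
    "WebCrawler": 59,
    "AltaVista": 79,
    "HotBot": 92,
    "Northern Light": 77,
}


def listing_title(title, engine):
    """Return the part of the title that fits the engine's listing.

    Assumption: the engine simply cuts the title at the limit.
    """
    return title[:TITLE_LIMITS[engine]]


def engines_that_truncate(title):
    """List the engines whose limit is shorter than the title."""
    return [name for name, limit in TITLE_LIMITS.items() if len(title) > limit]


title = "Paris Hotels - Discount Rooms, Reservations and Maps for Travelers"
print(len(title), engines_that_truncate(title))
```

A title that no engine truncates returns an empty list, which is a reasonable target when important keywords sit near the end of the title.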

 

What text is indexed? (and can be found during a default search) | Yahoo | InfoSeek | Excite | Lycos | WebCrawler | AltaVista | HotBot | Northern Light
Words in <title> | N/A | 10 words, 69 characters | 6 words, 41 characters | - | Yes | 12 words, 96 characters | No | 12 words, 96 characters
Meta Keywords (<meta name=keywords...) | N/A | 105 words, 762 characters | No | No | - | 138 words, 1024 characters | No | No
Meta Description (<meta name=description...) | N/A | 34 words, 246 characters | No | No | - | 22 words, 154 characters | - | No
Dublin Core Meta Data | N/A | No | No | No | - | - | - | No
Image .alt text | N/A | No | No | Yes | No | Yes | Yes- | No
<!-- comment --> tag text | N/A | No | No | No | No | No | - | No
Hidden Input form text (Trick #14) | N/A | No | No | No | - | No | - | No
Form <option> text | N/A | Yes | N/A | N/A | N/A | N/A | N/A | N/A
Link Descriptions (<a href="www.yourserver.com">keyword</a>) | N/A | Yes | Yes | Yes- | Yes | Yes | Yes | Yes
Text within url link (<a href="http://www.yourserver.com/keyword.html">) | N/A | Yes but only relates to "link popularity" <HREF> Index | No | No | - | - | - | No
Indexes .asp files | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes

 

Spidering (how deep will they go and how long does it take?) | Yahoo | InfoSeek | Excite | Lycos | WebCrawler | AltaVista | HotBot | Northern Light
How many different pages can you submit? | 1 | 600 | 25? | No Limit | No Limit | 400 | No Limit | No Limit
Submitted Pages | 6-8 weeks | 1-7 days | 2-3 weeks | 2 weeks | 30-90 days | 48 hrs | 48 hrs | 2 weeks
1 level deep | Not a spider | Monthly | 1 month | 1 month | No | 2-4 weeks | 2-4 weeks | 2 weeks
Multiple levels deep | - | Yes | Maybe | Maybe | No | Maybe | Yes | Yes
Supports Robots.txt | - | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Supports <meta> robot tag | - | Yes | No | No | Yes | No | No | Yes
Maximum pages indexed from one domain | 1 in most cases | Approx. 600 | - | - | - | 400 | - | -
Frames Support (noframes tag) | - | Yes | No | Yes | No | Yes | No | -
Follows Image Map Links | - | Yes | - | - | Yes | - | - | -

 

Penalties | Yahoo | InfoSeek | Excite | Lycos | WebCrawler | AltaVista | HotBot | Northern Light
# of pages you can submit per day without a penalty | Submit once only | 25 | 25 | No Limits | 25 | 1 per day recommended | 50 | No Limits
Submitting too fast | N/A | Yes - wait 30 minutes minimum | No | No | No | No | No | No
Hidden Text | Most likely | Yes | None | None | None | None | None | None
Repetition of words in title | Yes | Yes | No | No | No | Yes | Maybe | No
Repetition of words in <meta> description | Must read well | Maybe | No | No | No | Maybe | No | No
Repetition of words in body text | Must read well | Yes | Yes | No | Yes | Yes | Maybe | No

 

Size

The larger a search engine is, in terms of pages indexed, the more likely pages from your web site will be included. Actual numbers can be misleading, as explained below. So, search engines are categorized as big, medium or small.

Expect to find most of your pages in a big search engine, some to many of your pages in a medium search engine and few or none of your pages in a small search engine. Why might a page not be included? See the section about depth, below.

The figures shown are the last reported to me or reported elsewhere. Take them with a grain of salt. Some search engines may accidentally keep two copies of a web page but not take duplication into account when quoting numbers. There are also other factors that make comparisons difficult.

Pages Crawled Per Day

This shows how many pages a search engine can index per day. The more it can crawl, the more likely it can maintain a fresh index. However, this is not the only way to measure freshness. Search engines may learn how frequently pages change, or use other methods, to make the most of a smaller crawling capacity.

Freshness

The web is constantly changing, so it's easy for search engine listings to become out-of-date. However, some listings may only be days old, while others may be months old -- or longer.

There are various reasons why this occurs. Some search engines "instantly" index any page submitted to them, as explained below. It takes longer for them to return and gather non-submitted pages. Search engines also may crawl the "popular" parts of the web more frequently than other portions.

Freshness shows the age of listings, from best-case to worst-case scenarios, for each search engine.

Date

Some search engines show the date when a web page was added. This provides a clue as to how fresh or stale the search engine's listings may be. Kudos to these search engines. The others leave you guessing about freshness.

File date means that the date of the file is shown, rather than the date it was added to the index. For example, imagine you created a file on Jan. 1, 1998, and it was spidered on Feb. 1, 1998. A search engine showing file date would list the Jan. 1 date, not the Feb. 1 date.

Submitted Pages

Ideally, a search engine will find your pages as it follows links while crawling the web. Realistically, your pages will appear much faster if you submit them directly to the engine. This shows how soon to expect a page you submitted to appear in the search engine's listings.

Non-Submitted Pages

Once a page has been submitted, a search engine will usually find other pages from the site by following links from the submitted page. However, some engines take longer to gather these "non-submitted" pages. In particular, this is because some search engines "instantly" index a page that is submitted, then add the site to the schedule for future crawling.

The chart shows how soon to expect other pages from your site to appear once you've submitted a single page -- and assuming there are no problems preventing the engine from finding these pages, such as frames or image maps, as explained below.

Depth

This is closely related to non-submitted pages. It indicates how many pages beyond the submitted page a search engine will gather. Search engines operate in one of two ways:

No Limit: These search engines will diligently try to gather everything they find at a web site. They may not get every page, but that remains the general goal.

Sample: These search engines gather a sample of web pages from a web site. Some gather a bigger sample than others. Use the size listed as a guide to how large a sample you can expect each search engine to have gathered. Usually, the more popular a site is, the better it will be represented in the search engine.

Keep in mind that part of the web remains unindexed due to physical hurdles. Frames, image maps and dynamically generated pages can all cause information to be missed.
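
The difference between a "No Limit" engine and a "Sample" engine can be pictured as a crawler that follows links from the submitted page and either keeps going or stops after a set number of pages. The sketch below is a simplified model only. The site map, the `crawl` function and the breadth-first order are all assumptions for illustration, since the engines do not publish how they pick their samples.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/": ["/products", "/about"],
    "/products": ["/products/a", "/products/b"],
    "/about": [],
    "/products/a": [],
    "/products/b": ["/products/b/specs"],
    "/products/b/specs": [],
}


def crawl(links, start, max_pages=None):
    """Follow links breadth-first from the submitted page.

    max_pages=None models a "No Limit" engine; a number models a
    "Sample" engine that stops after gathering that many pages.
    """
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if max_pages is not None and len(seen) >= max_pages:
                return seen
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


print(crawl(SITE, "/"))               # every reachable page
print(crawl(SITE, "/", max_pages=3))  # a small sample
```

In this model, pages several links away from the submitted page, such as /products/b/specs, are the first to be left out of a small sample.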

Frames Support

Can the search engine follow frame links? If it can't, the search engine is probably missing much of your site.

Image Maps

Can the search engine follow client-side image maps? As with frames, if the search engine cannot follow image maps, it is probably missing much of your site.

Password Protected Sites

Some search engines can enter a password protected site, if you arrange for them to have a user name and password. Why do this? You may want people to discover you have content that matches their query. They'll still need to fill out the appropriate registration information at your site to access it, but at least they'll know it exists.

Link Popularity

All search engines can determine the popularity of a page by analyzing how many links there are to it from other pages. Some engines use this as a means to determine which pages they will include in the index. See the popularity page for tips on measuring your site's popularity according to different engines.
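
At its simplest, link popularity is a count of how many other pages link to a page. The sketch below shows that counting under stated assumptions: the link data is made up, and the rule that each linking page counts only once is my own simplification, not something any engine has confirmed.

```python
from collections import Counter

# Hypothetical link data: page -> pages it links to.
LINKS = {
    "a.com/index.html": ["b.com/index.html", "c.com/index.html"],
    "b.com/index.html": ["c.com/index.html"],
    "d.com/links.html": ["c.com/index.html", "c.com/index.html"],
}


def link_popularity(links):
    """Count how many distinct other pages link to each page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):  # a page counts once, however many links
            if target != source:     # ignore links a page makes to itself
                counts[target] += 1
    return counts


print(link_popularity(LINKS).most_common())
```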

Learns Frequency

A number of search engines can learn how often your pages change. A site that changes often will be visited more often. Those that change infrequently get infrequent visits.

Keep Out

This indicates how you can tell the search engines to keep out of your site. All of the major engines respect the robots.txt exclusion standard, which tells them not to index a site or parts of a site. Some also support the meta robots tag, where a crawler can be told "noindex" on a particular page. For more information about robots.txt, see the Robots Exclusion Standard page at:

http://info.webcrawler.com/mak/projects/robots/exclusion.html
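
As a rough illustration of how a well-behaved crawler reads these rules, the sketch below feeds a sample robots.txt to the robots.txt parser in Python's standard library and asks whether two URLs may be fetched. The /private/ and /drafts/ paths and the "ExampleSpider" agent name are made up for the example. On a single page, the equivalent meta robots tag is <meta name="robots" content="noindex">.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt. The /private/ and /drafts/ paths are made up.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleSpider"):
    """Return True if the rules let the named crawler fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("http://www.yourserver.com/index.html"))
print(allowed("http://www.yourserver.com/private/notes.html"))
```

Keep in mind that robots.txt only asks crawlers to stay out. It does not protect a page from visitors who know its address.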

Redirection

Some sites redirect visitors from one web address to another.

The chart shows which URL is associated with your listing, if you perform redirection. This is important, because if the search engine indexes the redirected page, you could have a problem with visitors locating it should it be moved or changed at a later date.

Stop Words

Some search engines either leave out words when they index a page or may not search for these words during a query. These "stop words" are excluded as a way to save storage space or to speed searches.

For the webmaster, it's important to consider stop words when crafting your pages. For example, AltaVista will ignore the word web in a search for web developer, so there's little sense in trying to improve your ranking under those keywords.

Stop words are an excellent reason why search engine users should surround keywords with quotes or use other power tips to ensure that words are not ignored in their search.
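
The effect of stop words on a query can be sketched in a few lines. This is a simplified model, not any engine's actual behavior: the stop list is hypothetical (only "web" comes from the AltaVista example above), and the rule that quoted phrases are kept whole is the assumption behind the advice to use quotes.

```python
import re

# Hypothetical stop list. Only "web" is taken from the AltaVista example;
# the other words are placeholders for illustration.
STOP_WORDS = {"web", "the", "a", "of", "and"}


def effective_terms(query):
    """Return the terms an engine would still search for.

    Quoted phrases are kept whole; unquoted stop words are dropped.
    """
    terms = []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            terms.append(phrase)
        elif word.lower() not in STOP_WORDS:
            terms.append(word)
    return terms


print(effective_terms('web developer'))
print(effective_terms('"web developer"'))
```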

Relevancy Boosters

All the search engines use the location of keywords and frequency in a web page as the basis of ranking pages in response to a query. The exact mechanism is slightly different for each engine.

In addition to location/frequency, some engines may give a page a relevancy boost based on link popularity or other factors. These help a little, but they don't guarantee a boost to the top. It's quite possible that the most linked to page on the web will still perform poorly if there's another page that's more relevant to the particular query.
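
To make the location/frequency idea concrete, here is a toy scoring function. Every weight in it is invented for illustration, since the engines do not publish their formulas. It only shows why a heavily linked page can still lose to a page that is more relevant to the query.

```python
def relevancy(keyword, title, body, inbound_links=0):
    """Score a page for one keyword using location and frequency.

    All weights are invented for illustration. Words are split on
    whitespace only, so punctuation is not handled.
    """
    keyword = keyword.lower()
    title_words = title.lower().split()
    body_words = body.lower().split()

    score = 0.0
    if keyword in title_words:
        score += 3.0  # location: the word appears in the title
    if body_words:
        # frequency: share of body words that match the keyword
        score += 10.0 * body_words.count(keyword) / len(body_words)
    score += 0.01 * inbound_links  # small link-popularity boost
    return score


relevant = relevancy("hotels", "Paris Hotels", "hotels in paris with cheap hotels", 5)
popular = relevancy("hotels", "Welcome", "a popular page about many things", 100)
print(relevant > popular)
```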

Spam Penalty

All major search engines penalize sites that attempt to "spam" the engines in order to improve their position. One common technique is "stacking" or "stuffing" words on a page. This is where a word is repeated many times in a row. There are a number of other techniques. I don't approve of them, so you won't find them listed here. In general, they don't work well, and they often make a page look stupid and unprofessional.

If the search engines spot a spamming technique, they may downgrade a page's ranking or exclude it from listings altogether. One easy way search engines discover such pages is through "spam narking," when people complain about pages using spam.
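
You can check your own pages for accidental "stacking" before an engine does. The sketch below finds the longest run of one word repeated in a row. The threshold of four repeats is an arbitrary assumption, not a figure published by any engine.

```python
import re


def longest_repeat_run(text):
    """Return (word, count) for the longest run of one word repeated in a row."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    best_word, best_count = None, 0
    current_word, current_count = None, 0
    for word in words:
        if word == current_word:
            current_count += 1
        else:
            current_word, current_count = word, 1
        if current_count > best_count:
            best_word, best_count = current_word, current_count
    return best_word, best_count


def looks_stacked(text, threshold=4):
    """Flag text where any word repeats `threshold` or more times in a row."""
    return longest_repeat_run(text)[1] >= threshold


print(longest_repeat_run("cheap hotels hotels hotels hotels hotels in Paris"))
```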

Meta Tag Support

Many believe all search engines acknowledge keywords and descriptions placed in meta tags. In reality, only some do.
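
For engines that do read meta tags, it helps to see exactly what they would find on your page. The sketch below uses the HTML parser in Python's standard library to pull out the keywords and description tags. The sample page and the `MetaTagReader` class name are made up for the example.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect the keywords and description meta tags from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ("keywords", "description"):
            self.meta[name] = attrs.get("content") or ""


# Made-up sample page.
PAGE = """<html><head>
<title>Paris Hotels</title>
<meta name="keywords" content="paris, hotels, france">
<meta name="description" content="A guide to hotels in Paris.">
</head><body>Welcome.</body></html>"""

reader = MetaTagReader()
reader.feed(PAGE)
print(reader.meta)
```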

Titles

This shows how the search engines generate a title for your listing.

Descriptions

This shows how the search engines generate a description for your listing.

Results At A Time

Shows how many results you can display at one time. Defaults are shown in bold. Sometimes you may need to use a special power search page to change the default, but in most cases, you do not.

Display Options

Shows the different ways you can display results, with the default listed first. Most search engines let you view either page titles only or titles and a description.

URL Status Check

This shows whether you can determine if a web page has been indexed by the search engine. "Displays listing" means that you can easily search for a particular page and see exactly how it appears in the index. This is marked as "semi" for HotBot, as it is not so easy to specify a particular URL. "Reports if indexed" means that there is a URL status check form that will tell you if the page is in the index. However, you can't see the actual listing easily.

Site Removal

Sometimes web pages are removed or sites shifted to a new domain. Some search engines may continue to find the "old" pages unless certain measures are taken. These are noted on the chart for each search engine and include:

  • Removal: Remove the old pages from the server. The engine will revisit and try to reindex the pages using the addresses in its database. When it discovers they no longer exist, they will be removed from the database.
  • Robots.txt: Create a robots.txt file listing the site or pages. The engine will note the new restriction and remove the pages from the index.

For more information about the robots.txt file, see:

Robots Exclusion Protocol
http://info.webcrawler.com/mak/projects/robots/robots.html

Crawler Name

Each search engine uses a "crawler" or "spider" agent to gather web pages. Most have nicknames. These names are often part of the crawler's host name. You can tell if you've been visited by a crawler by checking your access logs and looking for the various names. In addition, spiders often report an agent name. Instead of saying Mozilla, as the Netscape browser does, a spider reports its own name. For example, Excite will say "Architext" spider.
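
Checking an access log for crawler visits can be as simple as searching each line for the known names. The sketch below assumes a combined-log style format with the host name first and the agent name last. The log lines are invented, and of the names listed only "Architext" comes from the text above; "Scooter" and "Slurp" are illustrative entries you should verify against your own logs.

```python
# Crawler names to look for. "Architext" (Excite) comes from the text above;
# "Scooter" and "Slurp" are illustrative assumptions to verify for yourself.
CRAWLER_NAMES = ["Architext", "Scooter", "Slurp"]

# Made-up access log lines in a combined-log style format.
LOG_LINES = [
    'crawl1.example.net - - [30/Jun/1998:10:00:00 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 64 "-" "ArchitextSpider"',
    'dialup7.example.org - - [30/Jun/1998:10:01:12 -0700] '
    '"GET /index.html HTTP/1.0" 200 2048 "-" "Mozilla/4.05 [en] (Win95; I)"',
]


def crawler_visits(lines, names=CRAWLER_NAMES):
    """Return (name, line) pairs for log lines that mention a crawler name."""
    visits = []
    for line in lines:
        lowered = line.lower()
        for name in names:
            if name.lower() in lowered:
                visits.append((name, line))
                break
    return visits


for name, line in crawler_visits(LOG_LINES):
    print(name, "->", line.split()[0])
```

Because the whole line is searched, a match can come from either the host name or the agent name.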

Indexes ALT Text / Comment Text

Shows if the search engine indexes ALT text associated with images or text in comment tags.

Stemming

Shows whether the search engine will also search for variations of a word based on its stem. For example, entering "swim" might also find "swims" and "swimming."
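
The idea can be sketched with a deliberately naive suffix stripper. This is not the method any particular engine uses, and it handles only the "-s" and "-ing" endings from the swim example. Real stemmers use far more rules and will treat many words differently.

```python
def stem(word):
    """Reduce a word to a rough stem by stripping "-ing" or a plural "-s"."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # "swimming" -> "swimm" -> "swim": undo a doubled final consonant
        if word[-1] == word[-2] and word[-1] not in "aeiou":
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word


for variant in ("swim", "swims", "swimming"):
    print(variant, "->", stem(variant))
```

An engine with stemming would treat all three variants as the same search term. An engine without it would match only the exact word entered.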

Case Sensitive

Shows whether a search engine is case-sensitive.




© 1998, MyMall Network Corporation