Search Engine Features Chart
If the field is in red, the information may be incorrect due to recent changes at the engine.
| General Information | Yahoo | InfoSeek | Excite | Lycos | WebCrawler | AltaVista | HotBot | Northern Light
| <title> character length used for listing | N/A | 74 | 71 | 61 | 59 | 79 | 92 | 77
| Keyword in title is important | N/A | Yes (in channels) | No | Yes | Yes | Yes | No | Yes
| Character length used for description in listing | N/A | 158 | 390 | 141 | 393 | 153 | 246 | 186
| <meta> description supported | N/A | Yes | No | No | Yes | Yes | Yes | No
| <meta> keyword supported | N/A | Yes | No | Yes | Yes | Yes | Yes | Yes
| <meta> keyword important | N/A | No | No | Yes | Yes | No | Yes | No
| Indexes all visible text on page | N/A | Yes | Yes | Yes | Yes | Yes | Yes | Yes
| Searches are case sensitive | N/A | Yes | No | No | No | Yes | No | Yes
| Word stemming (hawaii = hawaiian) | No | Yes | No | Yes | No | No | No | No
| Word capitalization (Is toy = Toy?) | Yes | No | Yes | Yes | No | No | Yes | No
| Keyword order (Is "paris hotels" = "hotels paris"?) | Yes | No | Yes | No | Yes | Yes(?) | Yes | No
| Plurals (Is toy = toys?) | No | Yes | No | No | No | No | No | No
| Looks for keywords in URLs | No | Yes | Yes | No | No | No | No | No
| Do newly submitted pages score better? | No | - | - | Yes | - | No (older is better) | Yes | -
| Link popularity | N/A | Yes (major influence) | Yes | Yes | Yes | No | ? | No
| Reviewed sites score higher? | Yes | Yes | Yes | - | Yes | - | - | -
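The character counts in the title and description rows suggest a quick way to check how your own title or description might be cut off in each engine's listing. The sketch below is a minimal Python illustration that assumes each number is the maximum number of characters the engine displays and that the text is simply cut at that point (the chart does not say whether engines break on a word or add an ellipsis). The sample title is hypothetical.

```python
# Characters of the <title> and of the description shown in a listing,
# copied from the "General Information" rows above. Yahoo is omitted
# because the chart marks it N/A.
TITLE_LIMITS = {
    "InfoSeek": 74, "Excite": 71, "Lycos": 61, "WebCrawler": 59,
    "AltaVista": 79, "HotBot": 92, "Northern Light": 77,
}
DESCRIPTION_LIMITS = {
    "InfoSeek": 158, "Excite": 390, "Lycos": 141, "WebCrawler": 393,
    "AltaVista": 153, "HotBot": 246, "Northern Light": 186,
}


def preview(text: str, limit: int) -> str:
    """Cut `text` to at most `limit` characters.

    Assumption: the engine simply stops at the character limit; the chart
    does not say whether it breaks on a word or adds an ellipsis.
    """
    return text[:limit]


def preview_all(text: str, limits: dict[str, int]) -> dict[str, str]:
    """Return the preview of `text` for every engine in `limits`."""
    return {engine: preview(text, limit) for engine, limit in limits.items()}


if __name__ == "__main__":
    # Hypothetical page title, long enough to be cut by several engines.
    title = "Hawaiian Vacation Rentals: Oceanfront Condos and Beach Houses on Maui and Kauai"
    for engine, shown in preview_all(title, TITLE_LIMITS).items():
        flag = "" if shown == title else "  (truncated)"
        print(f"{engine:<15}{shown}{flag}")
```

Since 59 characters (WebCrawler) is the smallest title limit in the chart, keeping your most important keywords within the first 59 characters keeps them visible in every listing.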
| What text is indexed? (and can be found during a default search) | Yahoo | InfoSeek | Excite | Lycos | WebCrawler | AltaVista | HotBot | Northern Light
| Words in <title> | N/A | 10 words, 69 characters | 6 words, 41 characters | - | Yes | 12 words, 96 characters | No | 12 words, 96 characters
| Meta keywords <meta name="keywords" ...> | N/A | 105 words, 762 characters | No | No | - | 138 words, 1024 characters | No | No
| Meta description <meta name="description" ...> | N/A | 34 words, 246 characters | No | No | - | 22 words, 154 characters | - | No
| Dublin Core metadata | N/A | No | No | No | - | - | - | No
| Image ALT text | N/A | No | No | Yes | No | Yes | Yes- | No
| <!-- comment --> tag text | N/A | No | No | No | No | No | - | No
| Hidden input form text (Trick #14) | N/A | No | No | No | - | No | - | No
| Form <option> text | N/A | Yes | N/A | N/A | N/A | N/A | N/A | N/A
| Link descriptions <a href="http://www.yourserver.com/">keyword</a> | N/A | Yes | Yes | Yes- | Yes | Yes | Yes | Yes
| Text within URL link <a href="http://www.yourserver.com/keyword.html"> | N/A | Yes, but only relates to "link popularity" <HREF> index | No | No | - | - | - | No
| Indexes .asp files | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
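Only InfoSeek and AltaVista report how much of the meta keywords tag they index. Here is a minimal Python sketch for checking a tag's content against those two figures before you publish it. It assumes keywords are separated by commas or whitespace, which may not match how either engine actually counts words; the sample content is hypothetical.

```python
# Portion of <meta name="keywords" content="..."> that the chart says is
# indexed, as (words, characters). Only InfoSeek and AltaVista give figures.
KEYWORD_LIMITS = {
    "InfoSeek": (105, 762),
    "AltaVista": (138, 1024),
}


def count_words(content: str) -> int:
    """Count keywords, treating commas and whitespace as separators.

    This tokenization is an assumption; the engines do not document theirs.
    """
    return len(content.replace(",", " ").split())


def keyword_overflow(content: str) -> dict[str, bool]:
    """Map each engine to True if `content` exceeds its word or character limit."""
    words, chars = count_words(content), len(content)
    return {
        engine: words > max_words or chars > max_chars
        for engine, (max_words, max_chars) in KEYWORD_LIMITS.items()
    }


if __name__ == "__main__":
    # Hypothetical keywords for a vacation rental site.
    content = "hawaii, maui, kauai, vacation rentals, oceanfront condos"
    print(keyword_overflow(content))
```

Anything past the limit is presumably ignored, so put the keywords you care about most at the start of the tag.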
| Spidering (how deep will they go and how long does it take?) | Yahoo | InfoSeek | Excite | Lycos | WebCrawler | AltaVista | HotBot | Northern Light |
| How many different pages can you submit? | 1 | 600 | 25? | No Limit | No Limit | 400 | No Limit | No Limit |
| Submitted Pages | 6 - 8 weeks | 1-7 days | 2-3 weeks | 2 weeks | 30-90 days | 48hrs | 48hrs | 2 weeks |
| 1 level deep | Not a spider | Monthly | 1 month | 1 month | No | 2-4 weeks | 2-4 weeks | 2 weeks |
| Multiple levels deep | - | Yes | Maybe | Maybe | No | Maybe | Yes | Yes |
| Supports Robots.txt | - | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Supports <meta> robot tag | - | Yes | No | No | Yes | No | No | Yes |
| Maximum pages indexed from one domain | 1 in most cases | Approx. 600 | - | - | - | 400 | - | - |
| Frames Support (noframes tag) | - | Yes | No | Yes | No | Yes | No | - |
| Follows Image Map Links | - | Yes | - | - | Yes | - | - | - |
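Every spidering engine in the chart supports robots.txt. Python's standard library includes a parser for the file, `urllib.robotparser`, which offers a quick way to test a robots.txt before you put it on your server. The file contents below are a hypothetical example, and the crawler names are used only as user-agent strings to test against.

```python
from urllib import robotparser

# Hypothetical robots.txt: shut one crawler out entirely and keep every
# other crawler out of the /private/ directory.
ROBOTS_TXT = """\
User-agent: Scooter
Disallow: /

User-agent: *
Disallow: /private/
"""

robots = robotparser.RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())  # parse the lines directly; no network fetch

CHECKS = [
    ("Scooter", "http://www.yourserver.com/index.html"),
    ("Architext", "http://www.yourserver.com/index.html"),
    ("Architext", "http://www.yourserver.com/private/prices.html"),
]

for agent, url in CHECKS:
    verdict = "allowed" if robots.can_fetch(agent, url) else "blocked"
    print(f"{agent:<10} {verdict:<8} {url}")
```

Here Scooter is blocked everywhere, while Architext may fetch index.html but nothing under /private/. The meta robots tag works page by page instead, e.g. `<meta name="robots" content="noindex">`, and the chart lists only InfoSeek, WebCrawler and Northern Light as supporting it.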
| Penalties | Yahoo | InfoSeek | Excite | Lycos | WebCrawler | AltaVista | HotBot | Northern Light |
| # of pages you can submit per day without a penalty | Submit once only | 25 | 25 | No Limits | 25 | 1 per day recommended | 50 | No Limits |
| Submitting too fast | N/A | Yes - wait 30 minutes minimum | No | No | No | No | No | No
| Hidden Text | Most likely | Yes | None | None | None | None | None | None |
| Repetition of words in title | Yes | Yes | No | No | No | Yes | Maybe | No |
| Repetition of words in <meta> description | Must read well | Maybe | No | No | No | Maybe | No | No |
| Repetition of words in body text | Must read well | Yes | Yes | No | Yes | Yes | Maybe | No |
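The Penalties rows suggest pacing your submissions. Below is a minimal Python sketch that spreads a list of pages over time using the per-day figures above, treating AltaVista's "1 per day recommended" as a cap of one, and applying InfoSeek's 30-minute minimum gap. Yahoo is left out because it takes a single submission. The chart does not define a "day," so the sketch assumes consecutive 24-hour blocks counted from the start time; the page URLs are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Pages per day you can submit without a penalty, from the "Penalties" rows
# above. None means "No Limits"; AltaVista's "1 per day recommended" is
# treated as a cap of 1. Yahoo ("Submit once only") is left out.
DAILY_CAP: dict[str, Optional[int]] = {
    "InfoSeek": 25, "Excite": 25, "Lycos": None, "WebCrawler": 25,
    "AltaVista": 1, "HotBot": 50, "Northern Light": None,
}
# Minimum spacing between submissions; only InfoSeek lists one.
MIN_GAP = {"InfoSeek": timedelta(minutes=30)}


def schedule(engine: str, pages: list[str], start: datetime) -> list[tuple[datetime, str]]:
    """Give each page a submission time that respects the engine's daily cap
    and minimum gap. Assumption: a "day" is a 24-hour block counted from
    `start`, and each day's submissions begin at the start of that block."""
    cap = DAILY_CAP[engine]
    gap = MIN_GAP.get(engine, timedelta(0))
    plan = []
    for i, page in enumerate(pages):
        day, slot = (0, i) if cap is None else divmod(i, cap)
        plan.append((start + timedelta(days=day) + slot * gap, page))
    return plan


if __name__ == "__main__":
    pages = [f"http://www.yourserver.com/page{n}.html" for n in range(30)]
    for when, page in schedule("InfoSeek", pages, datetime(1998, 2, 1, 9, 0))[:3]:
        print(when.isoformat(sep=" "), page)
```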
Size
The larger a search engine is, in terms of pages indexed, the more likely pages from your web site will be included. Actual numbers can be misleading, as explained below, so search engines are categorized as big, medium or small. Expect to find most of your pages in a big search engine, some to many of your pages in a medium search engine, and few or none of your pages in a small search engine. Why might a page not be included? See the section about depth, below.

The figures shown are the last reported to me or reported elsewhere. Take them with a grain of salt. Some search engines may accidentally keep two copies of a web page but not take the duplication into account when quoting numbers. There are also other factors that make comparisons difficult.

Pages Crawled Per Day
This shows how many pages a search engine can index per day. The more it can crawl, the more likely it can maintain a fresh index. However, this is not the only way to measure freshness. Search engines may learn how frequently pages change, or use other methods to improve freshness, to make the most of a smaller crawling capacity.

Freshness
The web is constantly changing, so it's easy for search engine listings to become out of date. Some listings may be only days old, while others may be months old, or older. There are various reasons for this. Some search engines "instantly" index any page submitted to them, as explained below, but take longer to return and gather non-submitted pages. Search engines also may crawl the "popular" parts of the web more frequently than other portions. Freshness shows the age of listings for each search engine, from the best-case to the worst-case scenario.

Date
Some search engines show the date when a web page was added. This provides a clue as to how fresh or stale the search engine's listings may be. Kudos to these search engines. The others leave you guessing about freshness.
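The link between crawl rate and freshness described under Pages Crawled Per Day and Freshness can be made concrete with a little arithmetic. If an engine revisits every page in turn and spends its whole capacity doing so (a simplifying assumption; real engines favor popular or frequently changing pages), a full cycle takes the index size divided by the pages crawled per day, and the time since a listing was last refreshed ranges from zero up to one full cycle. The figures in the usage line are hypothetical.

```python
def listing_age_days(pages_indexed: int, pages_per_day: int) -> tuple[float, float, float]:
    """Best, average and worst-case days since a page was last crawled,
    assuming the engine re-crawls its whole index round-robin at a constant rate."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    cycle = pages_indexed / pages_per_day  # days to revisit every page once
    return 0.0, cycle / 2, cycle


if __name__ == "__main__":
    # Hypothetical figures: a 100-million-page index crawled at 10 million pages a day.
    best, average, worst = listing_age_days(100_000_000, 10_000_000)
    print(f"best {best:.0f} days, average {average:.0f} days, worst {worst:.0f} days")
```

With those hypothetical figures a listing is, on average, 5 days behind the live page and at worst 10 days behind, which is why a larger index needs a proportionally higher crawl rate to stay equally fresh.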
"File date" means that the date of the file is shown, rather than the date it was added to the index. For example, imagine you created a file on Jan. 1, 1998, and it was spidered on Feb. 1, 1998. A search engine showing file date would list the Jan. 1 date, not the Feb. 1 date.

Submitted Pages
Ideally, a search engine will find your pages as it follows links while crawling the web. Realistically, your pages will appear much faster if you submit them directly to the engine. This shows how soon to expect a page you submitted to appear in the search engine's listings.

Non-Submitted Pages
Once a page has been submitted, a search engine will usually find other pages from the site by following links from the submitted page. However, some engines take longer to gather these "non-submitted" pages. This is mainly because some search engines "instantly" index a page that is submitted, then add the site to the schedule for future crawling. The chart shows how soon to expect other pages from your site to appear once you've submitted a single page, assuming there are no problems preventing the engine from finding these pages, such as frames or image maps, as explained below.

Depth
This is closely related to non-submitted pages. It indicates how many pages beyond the submitted page a search engine will gather. Search engines operate in one of two ways:

No Limit: These search engines will diligently try to gather everything they find at a web site. They may not get every page, but that remains the general goal.
Sample: These search engines gather a sample of web pages from a web site. Some gather a bigger sample than others. Use the size listed as a guide to how large a sample you can expect each search engine to have gathered. Usually, the more popular a site is, the better it will be represented in the search engine.

Keep in mind that part of the web remains unindexed due to physical hurdles.
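The difference between a crawl that goes one level deep, one with no limit, and one that gathers only a sample can be sketched as a breadth-first walk over a site's links. The Python below is an illustration, not any engine's actual crawler: the site map is hypothetical, `max_depth` stands in for how far past the submitted page an engine follows links, and `max_pages` stands in for a per-domain cap such as the approximately 600 pages (InfoSeek) or 400 pages (AltaVista) listed in the chart.

```python
from collections import deque
from typing import Optional

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/index.html": ["/products.html", "/about.html"],
    "/products.html": ["/products/toys.html", "/products/games.html"],
    "/about.html": ["/index.html"],
    "/products/toys.html": ["/products/toys/yo-yo.html"],
    "/products/games.html": [],
    "/products/toys/yo-yo.html": [],
}


def crawl(start: str, links: dict[str, list[str]],
          max_depth: Optional[int] = None,
          max_pages: Optional[int] = None) -> list[str]:
    """Breadth-first crawl from the submitted page `start`.

    max_depth=1 gathers the submitted page plus the pages it links to
    ("1 level deep"); None keeps following links. max_pages caps the total
    gathered, like a per-domain sample limit.
    """
    seen = {start}
    gathered = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        gathered.append(page)
        if max_pages is not None and len(gathered) >= max_pages:
            break
        if max_depth is not None and depth >= max_depth:
            continue  # do not follow links any further from this page
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return gathered


if __name__ == "__main__":
    print("1 level deep:", crawl("/index.html", SITE, max_depth=1))
    print("no limit:    ", crawl("/index.html", SITE))
    print("sample of 2: ", crawl("/index.html", SITE, max_pages=2))
```

Pages reachable only through links the crawler cannot follow, such as frames or image maps on some engines, would simply be missing from `links`, so they never enter the queue.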
Frames, image maps and dynamically generated pages can all cause information to be missed.

Frames Support
Can the search engine follow frame links? If it can't, the search engine is probably missing much of your site.

Image Maps
Can the search engine follow client-side image maps? As with frames, if the search engine cannot follow image maps, it is probably missing much of your site.

Password Protected Sites
Some search engines can enter a password-protected site if you arrange for them to have a user name and password. Why do this? You may want people to discover you have content that matches their query. They'll still need to fill out the appropriate registration information at your site to access it, but at least they'll know it exists.

Link Popularity
All search engines can determine the popularity of a page by analyzing how many links there are to it from other pages. Some engines use this to determine which pages they will include in the index. See the popularity page for tips on measuring your site's popularity according to different engines.

Learns Frequency
A number of search engines can learn how often your pages change. A site that changes often will be visited more often; sites that change infrequently get infrequent visits.

Keep Out
This indicates whether you can tell the search engines to keep out of your site. All of the major engines respect the robots.txt exclusion standard, which tells them not to index a site or parts of a site. Some also support the meta robots tag, which tells a crawler "noindex" for a particular page. For more information about robots.txt, see the Robots Exclusion Standard page at: http://info.webcrawler.com/mak/projects/robots/exclusion.html

Redirection
Some sites redirect visitors from one web address to another. The chart shows which URL is associated with your listing if you perform redirection.
This is important because if the search engine indexes the redirected page, visitors could have trouble locating it should it be moved or changed at a later date.

Stop Words
Some search engines either leave out words when they index a page or do not search for these words during a query. These "stop words" are excluded to save storage space or to speed up searches. For the webmaster, it's important to consider stop words when crafting your pages. For example, AltaVista will ignore the word "web" in a search for "web developer," so there's little sense in trying to improve your ranking under those keywords. Stop words are an excellent reason why search engine users should surround keywords with quotes or use other power tips to ensure that words are not ignored in their search.

All the search engines use the location and frequency of keywords on a web page as the basis for ranking pages in response to a query. The exact mechanism differs slightly for each engine. In addition to location and frequency, some engines may give a page a relevancy boost based on link popularity or other factors. These help a little, but they don't guarantee a spot at the top. It's quite possible that the most linked-to page on the web will still perform poorly if another page is more relevant to the particular query.

Spam Penalty
All major search engines penalize sites that attempt to "spam" the engines in order to improve their position. One common technique is "stacking" or "stuffing" words on a page, where a word is repeated many times in a row. There are a number of other techniques. I don't approve of them, so you won't find them listed here. In general, they don't work well, and they often make a page look stupid and unprofessional. If the search engines spot a spamming technique, they may downgrade a page's ranking or exclude it from listings altogether.
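Word stacking of the kind described under Spam Penalty is easy to spot mechanically, which is one reason it tends to backfire. This Python sketch measures the longest run of a single repeated word in a piece of text. The threshold of three is an arbitrary assumption for illustration, since none of the engines publish the rule they use, and the sample sentences are hypothetical.

```python
import re


def longest_repeat_run(text: str) -> tuple[str, int]:
    """Return the word repeated most times in a row, and how many times.

    Case is ignored and punctuation between the words is skipped, so
    "Toys, toys, TOYS" counts as a run of three.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    best_word, best_run, run = "", 0, 0
    for i, word in enumerate(words):
        run = run + 1 if i > 0 and word == words[i - 1] else 1
        if run > best_run:
            best_word, best_run = word, run
    return best_word, best_run


def looks_stuffed(text: str, max_run: int = 3) -> bool:
    """Flag text in which any word appears more than `max_run` times in a row.

    The default threshold is an arbitrary assumption, not an engine's rule.
    """
    return longest_repeat_run(text)[1] > max_run


if __name__ == "__main__":
    for sample in ["Wooden toys and educational toys for kids",
                   "toys toys toys toys toys toys cheap toys"]:
        print(looks_stuffed(sample), longest_repeat_run(sample), sample)
```

A run-length check only catches the crudest stacking; text that repeats a keyword in every sentence would need a frequency measure instead.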
One easy way search engines discover such pages is through "spam narking," when people complain about pages using spam.

Many believe all search engines acknowledge keywords and descriptions placed in meta tags. In reality, only some do.

Titles
This shows how the search engines generate a title for your listing.

Descriptions
This shows how the search engines generate a description for your listing.

Results At A Time
How many results you can display at one time. Defaults are shown in bold. Sometimes you may need to use a special power search page to change the default, but in most cases you do not.

Display Options
Shows the different ways you can display results, with the default listed first. Most search engines let you view only page titles, or titles and descriptions.

Another entry shows whether you can determine if a web page has been indexed by the search engine. "Displays listing" means that you can easily search for a particular page and see exactly how it appears in the index. This is marked as "semi" for HotBot, as it is not so easy to specify a particular URL there. "Reports if indexed" means that there is a URL status check form that will tell you if the page is in the index; however, you can't easily see the actual listing.

Site Removal
Sometimes web pages are removed or sites shifted to a new domain. Some search engines may continue to find the "old" pages unless certain measures are taken. These measures are noted on the chart for each search engine.
For more information about the robots.txt file, see: Robots Exclusion Protocol.

Each search engine uses a "crawler" or "spider" agent to gather web pages. Most have nicknames, which are often part of the crawler's host name. You can tell if you've been visited by a crawler by checking your access logs and looking for the various names. In addition, spiders often report an agent name: instead of saying "Mozilla," as the Netscape browser does, a spider reports its own name. For example, Excite's spider identifies itself as "Architext."

Indexes ALT Text / Comment Text
Shows whether the search engine indexes ALT text associated with images or text in comment tags.

Stemming
Shows whether the search engine will also search for variations of a word based on its stem. For example, entering "swim" might also find "swims" and "swimming."

Case Sensitive
Shows whether a search engine is case-sensitive.
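Checking your access logs for crawler names can be automated. The Python sketch below assumes your server writes the NCSA combined log format, where the user agent is the last quoted field (the plain common log format does not record it). "Architext" comes from the text above; "Scooter" (AltaVista) and "Slurp" (Inktomi, which powers HotBot) are added from general knowledge, so treat the list as incomplete and compare it with what actually shows up in your logs. The sample log lines are hypothetical.

```python
import re

# Spider names to look for. "Architext" (Excite) comes from the text above;
# "Scooter" (AltaVista) and "Slurp" (Inktomi, used by HotBot) are added from
# general knowledge. The list is illustrative, not complete.
SPIDER_NAMES = {"architext": "Excite", "scooter": "AltaVista", "slurp": "HotBot"}

# In the NCSA combined log format the user agent is the last quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def spider_visits(log_lines) -> dict[str, int]:
    """Count log lines per engine whose user agent contains a known spider name."""
    counts: dict[str, int] = {}
    for line in log_lines:
        match = LAST_QUOTED.search(line)
        if not match:
            continue  # not combined format, or no user-agent field
        agent = match.group(1).lower()
        for name, engine in SPIDER_NAMES.items():
            if name in agent:
                counts[engine] = counts.get(engine, 0) + 1
                break
    return counts


if __name__ == "__main__":
    sample_log = [
        '10.0.0.1 - - [01/Feb/1998:10:00:00 -0500] "GET / HTTP/1.0" 200 1043 "-" "ArchitextSpider"',
        '10.0.0.2 - - [01/Feb/1998:10:05:00 -0500] "GET /toys.html HTTP/1.0" 200 2210 "-" "Mozilla/4.04 [en] (Win95; I)"',
    ]
    print(spider_visits(sample_log))
```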
© 1998, MyMall Network Corporation