Unraveling the Web
Deborah Solomon, Chronicle Staff Writer
Monday, August 30, 1999
Finding information on the Web has always been frustrating, and as the amount of
data on the Web explodes, the search is only going to get harder. But a new breed of
search engines is aiming to ease the aggravation. Some are searching ever-larger portions
of the Web. Others are employing staffs of editors to hunt down the best sites for
particular queries. A few are even running ``popularity'' contests, letting Web surfers
and Web-page designers guide one another to the best sites.
Always eager for the next best thing, Internet users are heading to the new
sites in droves, giving veteran search engines a run for their money. LookSmart, Ask
Jeeves and GoTo -- all fairly new sites -- ranked among the top 10 search sites for July
1999, according to Media Metrix.
To use the new technologies, it helps to understand the old ones.
Traditional search engines such as Excite and AltaVista use software programs
that search for Web sites containing whatever keywords the user has entered into the
search bar.
These programs -- known as ``spiders'' or ``crawlers'' -- visit a Web page, read
it and record the words on each page. The spider then makes a list of which words appear
on which pages and returns those pages whenever a user types in that keyword. Generally,
the more times a keyword appears on a page, the higher it ranks on a list of results.
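The indexing step described above can be sketched in a few lines of Python. This is only an illustration, not the code any real search engine runs: the page addresses and text are made up, and the helper names `build_index` and `search` are hypothetical. It records which words appear on which pages, then returns pages for a keyword with the most frequent mentions first.

```python
import re
from collections import Counter, defaultdict

def build_index(pages):
    """Map each word to a Counter of {page address: times the word appears}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word][url] += 1
    return index

def search(index, keyword):
    """Return pages containing the keyword, most occurrences first."""
    counts = index.get(keyword.lower(), Counter())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked]

# Hypothetical pages standing in for what a spider would fetch.
pages = {
    "example.com/planner": "Wedding planner: wedding gowns, wedding rings, wedding cakes",
    "example.com/iowa": "Iowa wedding photographers",
    "example.com/dogs": "Dog grooming tips",
}
index = build_index(pages)
results = search(index, "wedding")
```

Here `results` lists the planner page ahead of the Iowa page, simply because the keyword appears on it more often.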
But search engines don't always produce the best results.
For one thing, many Web site designers ``wallpaper'' their pages, loading them
up with keywords so they'll jump to the top of a search-results list.
Also, unless you're a skilled searcher, you'll get thousands of irrelevant
results. For example, if you type the word ``weddings'' into a search engine, you're
likely to get photos of people's weddings, wedding photographers in Iowa and men looking
for wives.
Another type of search site -- called a directory -- separates search results
into user-friendly categories.
For example, type ``weddings'' into Yahoo, a directory, and you'll get choices such as
wedding rings, gowns and even movies (including ``The Wedding Singer'').
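A directory lookup can be pictured roughly as follows. The categories, addresses and the `browse` helper are invented for illustration, and the matching rule (dropping a trailing "s") is a crude assumption, not how Yahoo works. The point is that the query is matched against a hand-built list of categories rather than against the text of every page.

```python
# Hypothetical, hand-picked directory: category name -> sites chosen by editors.
directory = {
    "Weddings > Rings": ["example.com/rings"],
    "Weddings > Gowns": ["example.com/gowns"],
    "Movies > The Wedding Singer": ["example.com/wedding-singer"],
    "Travel > Iowa": ["example.com/iowa-travel"],
}

def browse(directory, query):
    """Return the categories whose names mention the query, with their sites."""
    stem = query.lower().rstrip("s")  # crude: lets "weddings" match "Wedding"
    return {cat: sites for cat, sites in directory.items() if stem in cat.lower()}

matches = browse(directory, "weddings")
for category in matches:
    print(category)
```

Only the three wedding-related categories come back; the Iowa travel category is never considered because no editor filed it under weddings.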
Some of the new sites are search engines and some are directories, but all claim
to go a step beyond the existing technology.
--AlltheWeb, which launched in May, aims to access more of the Web than any
other search engine. It was the first to break the 200 million Web-page barrier and claims
that it will access ``all the Web'' -- 800 million pages -- in six months.
--Google of Palo Alto is a search engine that ranks Web sites based on how
popular they are with other Web authors. For example, the more sites that include links to
a particular page like Joe's Home Page, the more likely it is that Joe's Home Page will
pop to the top of Google's search results.
Google's founders say it's a democratic process that lets the Web community
determine which pages are worthy.
It was created by two Stanford University students, Sergey Brin and Larry Page,
who were frustrated by the existing choice of search engines.
``There are a number of other companies like Excite and Infoseek and they have
search components, but primarily, they are media companies,'' said Brin. Google is a
no-frills search engine that aims to do one thing well.
--Direct Hit also uses a ``popularity engine'' to deliver Web sites. When users
key in a search, Direct Hit anonymously monitors which sites they access and how much time
they spend there. The more often a site is accessed and the longer it is used, the higher
its ranking.
--Ask Jeeves of Berkeley lets users pose questions in plain English. Ask Jeeves
then directs users to sites that provide the best answers. (For a complete explanation, go
to http://www.askjeeves.com/, click on ``Popular
Questions'' and then ``What Is Ask Jeeves.'')
Ask Jeeves does a good job answering simple, common questions, such as ``What is
the capital of New Hampshire?'' or ``How high is the Empire State Building?'' But often
it's stumped by more difficult queries.
--LookSmart of San Francisco is a directory that has more than 200 editors who
scour the Web, search out the best sites and add them to a growing directory of more than
800,000 unique pages in 60,000 categories.
``The way we differentiate ourselves is in the actual size and quality of our
site,'' said Val Landi, senior vice president of marketing and media. ``We believe we have
the largest staff of editors, and the assumption is that, as opposed to our robotic
brethren out there, if you have intelligent, professional editors, they can select the
best quality Web sites.''
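The two ``popularity'' approaches described above, counting links (Google) and counting visits and time spent (Direct Hit), can be sketched like this. Neither company's actual formula is given here, so the scoring below is a deliberately simplified assumption; the function names and sample data are hypothetical.

```python
from collections import Counter

def rank_by_inbound_links(links):
    """links: {source page: [pages it links to]}. Most linked-to pages first."""
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each linking page once
            if target != source:      # ignore a page linking to itself
                inbound[target] += 1
    return sorted(inbound, key=lambda page: (-inbound[page], page))

def rank_by_usage(visits):
    """visits: list of (page, seconds spent). Busier, longer-used pages first."""
    clicks = Counter()
    seconds = Counter()
    for page, spent in visits:
        clicks[page] += 1
        seconds[page] += spent
    # Assumed scoring: number of visits first, total time as the tie-breaker.
    return sorted(clicks, key=lambda page: (-clicks[page], -seconds[page], page))

links = {
    "a.example": ["joes-home-page", "b.example"],
    "b.example": ["joes-home-page"],
    "c.example": ["joes-home-page", "a.example"],
    "joes-home-page": ["joes-home-page"],
}
link_ranking = rank_by_inbound_links(links)

visits = [("site-x", 30), ("site-y", 200), ("site-x", 45), ("site-z", 10), ("site-y", 5)]
usage_ranking = rank_by_usage(visits)
```

In this toy data, Joe's Home Page tops the link ranking because three other pages point to it, and site-y tops the usage ranking because it ties site-x on visits but holds visitors longer.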
While there are no scientific studies that prove which type of search engine
produces better results, industry watchers say the Googles and LookSmarts of the world do
a good job.
The competition from these upstarts has not gone unnoticed. In recent weeks,
some of the original search engines have announced planned improvements.
Excite announced plans to access more Web pages -- up to 43 percent of the Web,
compared with about 6 percent now -- and is pairing with LookSmart to provide a directory.
Netscape announced that it would use Google's technology to power its search
function.
AltaVista's new owner, CMGI Inc., vowed to improve the search function and turn
the site into a ``megaportal'' where users can also do things like buy a car and trade
stocks.
Microsoft also plans to announce a major revamping of its MSN.com search site in
early fall.
``We are developing a next-generation search engine that leverages a lot of our
experience in making software easier to use to deliver a search experience that is
everything you never thought search could be,'' said Rob Bennett, director of marketing
for MSN.
Industry watchers say members of the old guard need to improve their search
functions if they want to keep users coming back.
``There is no allegiance out there,'' said John Corcoran, an analyst with
Stephens Inc. in Boston. ``Consumers are saying, `If I can get the information with only
three clicks on Yahoo, fantastic. But if I can get it with two clicks on a newer search
engine, I'll use that one.' ''
Getting repeat customers is crucial if Net companies want to attract advertising
and e-commerce partners.
``Search is a business. It clearly generates revenue because it's one of the
best opportunities to reach people who are looking for something in particular,'' said
Barry Parr, who tracks search engines for International Data Corp.
While some of the older search sites have transformed themselves into portals,
some of the newer ones, including AlltheWeb and Google, plan to make money by licensing
their technology to other search engines and to companies that want a search tool on their
Web sites.
``Our focus is narrow: We want to do core search technology better than anyone
else in the world, then take that technology and sell it to big companies,'' said David
Burns, president and CEO of Fast Search & Transfer, which operates AlltheWeb.com.
With all the choices out there, Web searchers may be more confused than ever.
Here's a piece of basic advice:
``If you don't know where to begin, directories are good places to start,'' said
Danny Sullivan, editor of SearchEngineWatch.com in London.
When the hunt is for something very specific or obscure, like information on a
rare medical condition, then a search engine may be the better choice.
``A search engine is good because you're getting into the nooks and crannies of
the Web,'' Sullivan said.
A recent study published in the journal Nature prompted debate when it revealed
that most search sites access just a tiny fraction of the Web. Northern Light accessed the
most: 16 percent of the Web, compared with just 5.6 percent for Excite. AlltheWeb searches
25 percent but was not included in the study.
But unless you're looking for something rare, chances are you don't need to
access much more than 15 percent of the Web.
``Most people don't say, `I wish I had another 1 million pages to look through,'
'' Sullivan said.
Parr of IDC agreed. ``If you're doing a general search and you get 1 million
pages back instead of 150,000, you're only going to look at the first 20 anyway.''
Eventually, the number of search sites will stop growing and begin to shrink.
``The market will have a shakeout,'' Corcoran said. ``Why do we need 45 search
engines? We don't. The big ones now will continue to get bigger and will continue to spend
large amounts of money growing their subscriber base. And the end result will be a couple
of really big, really good sites.''
TOP WEB SEARCH SITES
There are different ways to search the Web. Search engines, like AltaVista,
crawl the Web and record the text on every Web page they visit. When you make a query,
the search engine looks through that recorded text to find pages containing your keywords.
A directory, such as Yahoo, is an organized selection of categories, such as
Travel and Food. The content within those categories has been hand-picked by humans, who
scour the Web, looking for the best sites. When you submit a query, it pulls up relevant
sites just from the ones that are included in the directory.
Name / Address / Type / How much of the Web it searches / Comments
AltaVista / http://www.altavista.com/ / search engine / 15.5% / One of the oldest search engines, recently acquired by CMGI.
Excite / http://www.excite.com/ / portal / 5.6% / Plans to add more pages to its search and access about 50% of the Web.
AlltheWeb / http://www.alltheweb.com/ / search engine / 25% / Plans to increase to 100% over the next year.
Northern Light / http://www.northernlight.com/ / search engine / 16% / Named for a clipper ship built in Boston in 1853 and known for its technology.
Google / http://www.google.com/ / search engine / 7.8% / Ranks Web sites based on how often they're linked to from other sites.
LookSmart / http://www.looksmart.com/ / directory / N/A / Has more than 60,000 categories of information.
Yahoo / http://www.yahoo.com/ / directory / 7.4% / Most visited search site -- more than 38.9 million visitors.
Lycos / http://www.lycos.com/ / directory / 2.5% / Recently switched from a search engine to a directory model.
Go Network / http://www.go.com/ / portal / 8% / The Mickey Mouse Corp. soon will own all of go.com.
Ask Jeeves / http://www.askjeeves.com/ / search engine / 7 million answers / Lets users make queries in the form of a question.
SIMPLE SEARCH STRATEGIES
Most, but not all, search engines support Boolean searching, an advanced way of
linking keywords that narrows the number of pages returned for your query.
Sometimes, if you look carefully on the search page, you can find an icon or
phrase that says "advanced searches." This will lead you to a page that
accepts full Boolean terms. That page usually includes instructions as well.
--When making a search, analyze your idea and try to state it clearly in a
simple sentence.
WHAT YOU WANT / SEARCH TERMS / WHAT YOU ENTER IN SEARCH BOX / WHAT IT DOES
I want to buy a dog, but I'm not sure of what breed. / AND or (+) / dogs AND buy AND puppies AND breed, or +dogs +buy +puppies +breeds / Finds Web pages containing all keywords.
If I buy a dog, how will I groom and take care of it? / OR / buy OR purchase AND dogs OR puppies AND grooming OR cleaning AND care OR raising / Finds Web pages containing any and all keywords. Using synonyms will increase your chances of finding pages.
"When taking care of your new dog" / "Phrase search" / "when taking care of your new dog" / Use quotation marks to look for a specific phrase.
I want to buy a dog, but I'm not sure what breed. / AND NOT or (-) / dogs AND buy AND breed AND NOT "School of" AND NOT "School", or +dogs +breeds +grooming -"School of" -"School" / Retrieves Web pages containing one keyword but not the other. Helps in reducing the number of pages you don't want. Make sure to use quotation marks if keywords could be a specific phrase.
What kind of dog breed / Asterisk (*) / dog and breed* (will result in breed, breeds, breeding, and breeders of dogs) / By truncating a keyword and placing an asterisk at its end, search engines find variations of the word.
I want Beagles, Dalmatians and Bulldogs / Comma (,) / Beagles, Dalmatians, Bulldogs / A comma is used to separate and search for proper nouns.
I want to find a Web page "Raising and caring for your dog" / Title search (title:) / title: Raising and caring for your dog / Searches the titles of Web pages.
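To see how the operators in the table combine, here is a small sketch that applies them to three made-up pages. It is an assumption-laden illustration: real search engines differ in the exact syntax they accept, and the helper names `term_matches` and `boolean_search` are hypothetical. AND (+) terms go in `all_of`, OR terms in `any_of`, AND NOT (-) terms in `none_of`; a term with spaces is treated as a phrase, and a trailing asterisk truncates.

```python
import re

def term_matches(term, text):
    """True if one term matches the text.

    A trailing * truncates (breed* matches breeds, breeding, breeders);
    a term containing spaces is treated as an exact phrase.
    """
    text = text.lower()
    term = term.lower()
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text) is not None

def boolean_search(pages, all_of=(), any_of=(), none_of=()):
    """Return page addresses that satisfy the AND, OR and AND NOT terms."""
    hits = []
    for url, text in pages.items():
        if not all(term_matches(t, text) for t in all_of):
            continue  # a required keyword is missing
        if any_of and not any(term_matches(t, text) for t in any_of):
            continue  # none of the alternatives appear
        if any(term_matches(t, text) for t in none_of):
            continue  # an excluded keyword or phrase appears
        hits.append(url)
    return hits

# Hypothetical pages for the dog-buying examples.
pages = {
    "example.com/breeders": "Dog breeders: buy puppies of every breed",
    "example.com/school": "School of Dog Grooming: learn to groom dogs of any breed",
    "example.com/care": "When taking care of your new dog, start with grooming",
}

all_keywords = boolean_search(pages, all_of=["dog", "breed*"])
without_school = boolean_search(pages, all_of=["dog*", "breed*"], none_of=["School of"])
phrase = boolean_search(pages, all_of=["when taking care of your new dog"])
either = boolean_search(pages, any_of=["buy", "purchase"])
```

Excluding the phrase "School of" drops the grooming-school page from the results, which is the narrowing effect the AND NOT row describes.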
--Remember to spell keywords properly. Some words are spelled more than one way
on the Web, so try the variants as well.
--Search engines treat keywords typed in all lowercase letters as matching both
uppercase and lowercase letters. If you use initial capital letters or all capital
letters, the search engine will return only pages that match your keyword exactly.
Sources: U.C. Berkeley
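The capitalization tip above can be expressed as a short rule. This sketch assumes an engine behaves exactly as the tip describes, which not every engine does; the function name `case_rule_matches` is hypothetical.

```python
def case_rule_matches(keyword, word):
    """Lowercase keywords match any capitalization; others must match exactly."""
    if keyword == keyword.lower():
        return keyword == word.lower()
    return keyword == word

words = ["beagle", "Beagle", "BEAGLE"]
lower_hits = [w for w in words if case_rule_matches("beagle", w)]
capital_hits = [w for w in words if case_rule_matches("Beagle", w)]
```

Typing `beagle` finds all three spellings, while typing `Beagle` finds only the one with the same capitalization.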