Why is the Web like an iceberg?
By Joel Deane, ZDNN - July 7, 1999 2:55 PM PT
Because it's largely submerged -- no search engine indexes more than
16 percent of sites.
A new study has debunked one mainstream myth about cyberporn -- and confirmed
many a surfer's sneaking suspicions about search engines.
First, the porn myth.
According to a study conducted by Dr. Steve Lawrence and Dr. C. Lee Giles for
the NEC Research Institute, the Web contains about 800 million pages encompassing about 15
terabytes of data and about 180 million images. Contrary to popular opinion that the Web's
a haven for porn, though, the study found that only 1.5 percent of Web sites contain
pornographic content.
"The sex sites were much less than you would have thought," Lawrence
said.
In fact, the study, which will be published in the July 8 issue of Nature
magazine, found that commercial sites have taken over the Web, with 83 percent of sites
containing commercial content and 6 percent containing scientific/educational content.
Lawrence said the study gauged the Web's content by random sampling -- the
researchers manually surveyed and categorized the content of 2,500 sites whose IP addresses
had been randomly selected.
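
For readers curious how far a sample of that size can be trusted, here is a rough sketch in Python. It is not the researchers' code. The helper names (random_ipv4, margin_of_error) are made up for illustration, and it assumes each reported percentage comes from a simple random sample of all 2,500 sites, which the article does not spell out. A real survey would also have to check whether each randomly drawn address actually hosts a Web server; that step is left out here.

```python
import ipaddress
import math
import random


def random_ipv4(rng):
    """Draw one IPv4 address uniformly from the full 32-bit address space."""
    return ipaddress.IPv4Address(rng.getrandbits(32))


def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    p is the observed share, n is the sample size, and z=1.96
    corresponds to roughly 95 percent confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


# A few candidate addresses, seeded so the run is repeatable.
rng = random.Random(0)
candidates = [random_ipv4(rng) for _ in range(5)]
print("Sample addresses:", ", ".join(str(ip) for ip in candidates))

# Shares reported in the article; n = 2,500 is assumed to apply to each.
SAMPLE_SIZE = 2500
reported = [
    ("commercial", 0.83),
    ("scientific/educational", 0.06),
    ("pornographic", 0.015),
]
for label, share in reported:
    moe = margin_of_error(share, SAMPLE_SIZE)
    print(f"{label}: {share:.1%} +/- {moe:.1%}")
```

Under those assumptions, the uncertainty on each figure works out to no more than about two percentage points either way, which is why a few thousand hand-classified sites can stand in for the Web as a whole.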
The trouble with search engines
The study's other key finding won't be news to regular search engine or portal
users. According to the study, search engine coverage of the Web has decreased
substantially since December 1997, with no search engine indexing more than 16 percent of
the Web's indexable sites.
That means, for surfers navigating the Web via search engines, the Web's 15
terabytes of data is more than ever like an iceberg -- largely underwater. And, for
e-commerce sites, not being indexed by the search engines could be the difference between
sinking and swimming.
"That could have a substantial impact on their economic viability,"
Lawrence said.
"Because the situation now is relatively unequal, in the sense that ... the
more well known sites are the ones getting indexed."
Lawrence said the reason for the decreasing coverage of the Web is simple --
the search engines just can't keep up with the explosive growth in indexable pages -- but,
he assures, "that trend is going to reverse."
And why will it reverse? "At the moment you have a lot of information out
there that's not available on the Web," Lawrence said. But, once all that information
is available on the Web, the avalanche of indexable information getting posted on the Web
will slow, allowing the search engines to catch up.
And how long will it take for that information avalanche to ease? Lawrence
hasn't done precise calculations, but hazards an educated guess: "10, 20 years."
"Engines will be able to improve their coverage over time, but the question
is, will they really want to?"
Other findings in the study:
- Search engines are more likely to index sites that have more links to
them (more 'popular' sites).
- They are more likely to index U.S. sites than non-U.S. sites.
- Search engines are more likely to index commercial sites than
educational sites.
- Indexing of new or modified pages by just one of the major search
engines can take months.