Looking beyond the surface – Exploring Deep Web

I have been reading about the Deep Web for around 5 years, and the concept continues to fascinate me. Facebook, Wikipedia and the like form just 4% of the World Wide Web. The remaining 96%, tens of trillions of pages that no search engine can reach, forms the Deep Web, or the Invisible Web. The content ranges from boring statistics to the sale of human organs on the black market. In fact, in October 2013 the FBI shut down Silk Road, a popular online black market where everything from ammunition and drugs to assassins could be bought.

The concept behind the Deep Web is not as dark as it seems. The reason is simple: Google, Bing and other search engines use crawlers to traverse the web. Crawlers follow links from one page to another, so they can collate all the static pages. Pages that are generated on the fly in response to some input, such as a search query or a submitted form, are not captured. Around 54% of websites are databases and are thus not captured.

Other pages are available only on intranets or private networks, so they are not captured either.
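To make the crawler idea concrete, here is a minimal sketch in Python. It assumes a made-up, in-memory "web" (the `TOY_WEB` dictionary and all of its URLs are hypothetical) instead of real HTTP requests, and it is not how any particular search engine is implemented. It only shows that a crawler which follows links never sees pages that nothing links to.

```python
from collections import deque

# Hypothetical toy web: each URL maps to the links found on that page.
TOY_WEB = {
    "/home": ["/about", "/blog"],
    "/about": ["/home"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/home"],
    # Generated only when a user submits a search form; nothing links to it.
    "/search?q=flights": ["/home"],
    # Lives on a private network; nothing on the public site links to it.
    "/intranet/payroll": [],
}


def crawl(web, seed):
    """Breadth-first traversal that follows links starting from the seed page."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = crawl(TOY_WEB, "/home")
missed = set(TOY_WEB) - indexed
print("indexed:", sorted(indexed))
print("missed:", sorted(missed))
```

In this toy setup the search-result page and the intranet page end up in `missed`, because the crawler only discovers what it can reach by following links from the seed.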

Then there is a hidden part of the web that can only be reached through Tor, which requires specialised software to access. People use it so that their web activity cannot be traced. It runs on a relay system that bounces signals among different Tor-enabled computers around the world.
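The relay idea can be sketched as a toy simulation. This is only an illustration under loud assumptions: the relay names, the sender `alice` and the destination `example.org` are hypothetical, and plain nested dictionaries stand in for the layers of encryption that the real Tor software applies. The point is that each relay sees only the hop before it and the hop after it.

```python
# Hypothetical relay names; a real circuit is chosen by the Tor client.
CIRCUIT = ["relay_A", "relay_B", "relay_C"]


def wrap(message, destination, circuit):
    """Build one layer per relay, innermost first.

    Nesting stands in for encryption here; nothing is actually encrypted.
    """
    packet = {"next": destination, "payload": message}
    for relay in reversed(circuit[1:]):
        packet = {"next": relay, "payload": packet}
    return packet


def route(packet, sender, circuit):
    """Pass the packet along the circuit and record what each relay can see."""
    views = []
    previous = sender
    for relay in circuit:
        # A relay learns who handed it the packet and where to send it next.
        views.append((relay, previous, packet["next"]))
        previous, packet = relay, packet["payload"]
    return views, packet


views, delivered = route(wrap("hello", "example.org", CIRCUIT), "alice", CIRCUIT)
for relay, came_from, goes_to in views:
    print(f"{relay}: from {came_from}, to {goes_to}")
print("delivered:", delivered)
```

In this sketch only the first relay sees the sender and only the last relay sees the destination, which is the intuition behind why the activity is hard to trace.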

Well, that’s what the Deep Web is. Let’s look at why it matters.

  • A search engine that could crawl the entire Web could be used for Big Data analysis, providing more accurate information on climate, finances, etc.
  • The Deep Web contains an estimated 550 billion documents, compared to one billion on the surface web.
  • Deep Web content is highly relevant to every information need and market.
  • 95% of the content on the Deep Web is freely accessible information, not subject to fees or subscriptions.

Companies are doing their best to mine this treasure trove of information and are coming up with new search methods to do so.

