September 16, 2007
Examination of the Deep Web
My Adventures in the Deep (Blue) Web
Wha???
Everyone knows Google. Everyone knows how to search Google (basically). Everyone expects Google to give them everything they need to know. What everyone doesn't know is the vast depth of information that the surface internet is unable to capture: information hidden behind queries and databases, within and even behind Google's vast information highway. Welcome to the Deep Web! A place where there is more information than you even knew existed (figuratively and, ironically, literally as well). My goal today is to examine this area of the deep by doing a few relatively simple searches, and perhaps learn a little as well.
Timber Industry California & Equity Research
To dive into this information, I will be conducting two searches (Timber Industry California, Equity Research) on multiple websites (Google, Yahoo Directory, Yahoo Web, Scirus, Google Scholar, UM Library’s Search Tools, & CompletePlanet). So... let's see how things turned out:
Timber Industry California
Complete Planet
Equity Research
As you go through the list of searches (top to bottom), one thing I noticed was a common trend in how specific the results were. Google and Yahoo web searches tended to pump out businesses and websites, whereas the later searches resulted in academic papers, articles, and PDF files. Therefore, the first thing I found interesting through my observations was the inability of top-level search engines to easily grasp & organize specific information from other databases.
So, the latter search engines must be better for academic purposes, right? Well, as is the answer for most of life... it depends. Yes, the information retrieved was far more specific and "deep" in nature; however, the number of responses was also limited. This leads me to the second interesting thing I observed: the top-level search engines, although less specific, are still far more favorable in theoretical breadth. What I mean by this is as follows: the He, Patel, Zhang, & Chang article "Accessing the Deep Web" reports research suggesting that Google and Yahoo web queries have been able to find approximately 32% of the Deep Web. So, percentage-wise, if each of them gives 2 million responses, about 640,000 of those would revolve around Deep Web information (you just have to find it). For the same search, the other search engines were returning about 0.5% as many responses (around 8-10k, but as low as 2k).
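To make that back-of-the-envelope arithmetic concrete, here is a minimal Python sketch. The 2 million figure is the hypothetical result count used above, the 32% is the coverage number cited from the article, and the constant and function names are my own, chosen only for illustration. It applies the percentages the same rough way the paragraph does, so treat it as an estimate, not a measurement.

```python
# Rough check of the numbers in this post (an estimate, not a measurement).
TOTAL_RESULTS = 2_000_000   # hypothetical result count for one Google/Yahoo query
DEEP_WEB_SHARE = 0.32       # coverage figure cited from He, Patel, Zhang, & Chang
SPECIALIZED_SHARE = 0.005   # the ".5%" observed for the specialized engines


def share_of(total, fraction):
    """Return how many results a given fraction of the total represents."""
    return round(total * fraction)


deep_web_hits = share_of(TOTAL_RESULTS, DEEP_WEB_SHARE)
specialized_hits = share_of(TOTAL_RESULTS, SPECIALIZED_SHARE)

print(deep_web_hits)      # 640000
print(specialized_hits)   # 10000
```

The second number lands at the top of the 8-10k range I saw from the specialized engines.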
What does this mean? Mitigation, Mitigation, Mitigation. My third and final point is that those looking for information should use a mitigation strategy when searching, to obtain as many facets of the information as possible. The Deep Web can be easily viewed through search engines such as Scirus & CompletePlanet. Take advantage of this knowledge, but don't forget about the breadth of Deep Web content that the mainstream search engines can provide. If you are able to maximize both the depth and breadth of your Deep Web searching, you will find yourself successful.
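One way to picture this strategy is as combining result lists from a broad engine and a specialized one. The sketch below is only an illustration under my own assumptions: the engine labels and URLs are placeholders, not real search results, and `merge_results` is a hypothetical helper, not part of any search engine's tooling.

```python
def merge_results(results_by_engine):
    """Combine result URLs from several engines.

    Returns a dict mapping each unique URL to the list of engines
    that returned it, so duplicates collapse into a single entry.
    """
    merged = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            merged.setdefault(url, []).append(engine)
    return merged


# Placeholder data standing in for real search results.
sample = {
    "broad_engine": ["example.com/a", "example.com/b"],
    "deep_engine": ["example.com/b", "example.com/c"],
}

combined = merge_results(sample)
for url, engines in combined.items():
    print(url, engines)
```

URLs found by more than one engine show up once with both labels, while the rest show what each engine contributed on its own.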
Posted by grantrob at September 16, 2007 11:58 AM
