The Deep Web: Semantic Search Takes Innovation to New Depths
The Web is fast becoming a titanic, complex entity. By the year 2015, it’s estimated that one zettabyte of content will be added to the web each and every year. Navigating this sea of information presents more and more of a challenge -- particularly when much of that content is not easily accessed by traditional search engines.
The Deep Web
When most of us think of the Web, we think of webpages – from online retailers, to government or organization-sponsored sites, to social media sites, news sites and more – sites we access directly, via links or via common search engines like Google. However, the non-textual files (such as videos and images) and the content stored in tables or databases far exceed this ‘searchable’ content.
The ‘Surface Web’, also known as the ‘Visible’ or ‘Searchable’ web, while significantly large at over 8 billion pages, only scratches the surface when it comes to the size of the Internet. The ‘Deep Web’ (also known as the ‘Invisible Web’, ‘Deepnet’, ‘DarkNet’, ‘Hidden Web’ and ‘Undernet’) refers to the content on the Internet that cannot be indexed by standard search engines, leaving that content ‘hidden’. The Deep Web houses over 96% of the publicly accessible content on the web.
According to CompletePlanet:
- Public information on the Deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web.
- The Deep Web contains 7,500 terabytes of information, compared to 19 terabytes of information in the surface Web.
- The Deep Web contains nearly 550 billion individual documents compared to the 1 billion of the surface Web.
- More than an estimated 200,000 Deep Web sites presently exist. Sixty of the largest Deep Web sites collectively contain about 750 terabytes of information – sufficient by themselves to exceed the size of the surface Web by 40 times.
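The multiples quoted in this list can be checked directly from the raw figures. The following is a minimal sketch in Python that uses only the numbers given above; the variable names are illustrative and not part of any CompletePlanet material.

```python
# Figures as quoted in the CompletePlanet list above.
deep_tb = 7_500                   # terabytes in the Deep Web
surface_tb = 19                   # terabytes in the surface Web
deep_docs = 550_000_000_000       # documents in the Deep Web
surface_docs = 1_000_000_000      # documents in the surface Web
top60_tb = 750                    # terabytes held by the sixty largest Deep Web sites

# Ratios of Deep Web to surface Web.
size_ratio = deep_tb / surface_tb
doc_ratio = deep_docs / surface_docs
top60_ratio = top60_tb / surface_tb

print(f"Storage ratio:         {size_ratio:.0f}x")   # prints 395x
print(f"Document ratio:        {doc_ratio:.0f}x")    # prints 550x
print(f"Top-60 sites vs. surface: {top60_ratio:.0f}x")  # prints 39x
```

The storage figures give roughly 395 times and the document counts give 550 times, which brackets the "400 to 550 times" claim; the sixty largest sites alone come to about 39.5 times the surface Web, which the list rounds to 40.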
For content to be included in the surface web, web crawlers need to be able to find it, which most commonly happens through links. The Deep Web, which contains some of the richest technical content on the Internet, therefore consists of content that crawlers cannot reach or are told to skip: dynamic URLs, form-controlled entry pages, password-protected access pages, hidden pages, geo-tagged pages, content that is too new to have been indexed, and directories that crawlers are told to exclude via robots exclusion files (robots.txt).
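To make one of these mechanisms concrete, here is a minimal sketch of how a well-behaved crawler might consult a robots exclusion file before fetching a page. It uses Python's standard `urllib.robotparser` module; the robots.txt content, the example.com URLs, and the `ExampleBot` user agent are all hypothetical and chosen only for illustration. Real crawlers apply many more rules than this, and the other barriers listed above (forms, passwords, unlinked pages) are not modeled here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the exclusion rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)

urls = [
    "https://example.com/articles/deep-web",   # ordinary linked page
    "https://example.com/private/report.pdf",  # excluded directory
    "https://example.com/search?q=patents",    # dynamic, query-driven URL
]

# Map each URL to whether a compliant crawler would fetch it.
results = {url: is_crawlable(parser, url) for url in urls}

for url, allowed in results.items():
    status = "crawl" if allowed else "skip (stays in the Deep Web)"
    print(f"{url}: {status}")
```

Pages a crawler skips for reasons like these never enter the search index, even though anyone with the direct address or the right form query can still reach them.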
Examples of the types of content stored in the directories and databases that make up the Deep Web include patents, laws, ‘people finders’ such as lists of professionals like engineers and doctors, online catalogs, web stores, digital exhibits, multimedia and graphical files, and more. Typically organized around a particular field, the content tends to be very rich in engineering, scientific, technical, or domain-specific knowledge generated over the years by specialized practitioners.
Tapping the Deep Web to Fuel Innovation
Imagine the knowledge and resources that could be leveraged if researchers were able to harness the information held within the remaining 96% of the Internet. The ability to access this type of content is today one of the most important sources of competitive differentiation and advantage for companies.
Invention Machine is a leader in semantic research technology that unlocks decisions in data. Our patented semantic question-answering engine helps companies accelerate innovation, increase productivity and deliver superior products and services.
Invention Machine’s innovation intelligence platform, Goldfire, is powered by our world-class semantic question-answering technology and proven innovation tools and methods.
With Goldfire, companies access over 3,300 of the richest Deep Web sites containing scientific and technical information from government, academic, commercial, and professional databases that cannot be accessed by conventional web searches. Also included in Goldfire is a semantic index of more than 5.6 million documents from over 1,750 of the best Deep Web sites and a special utility providing access methods to other Deep Web sites.
Goldfire’s patented semantic research capabilities understand the questions being asked, delivering relevant answers instead of simply producing the collection of keyword-related hyperlinks typical of a search engine. Goldfire also has multilingual capabilities in English, French, German, and Japanese, enabling researchers to retrieve answers in their native language even when the content was authored in a language they cannot read.
Combining the power of Goldfire’s semantics with Deep Web searching allows users to make sense of all of this unstructured, previously inaccessible information. It provides precise answers to even the most challenging research questions and lets workers infuse knowledge into their research processes, saving valuable time and money and increasing productivity.
Interested in learning more about Goldfire’s powerful Deep Web and semantic capabilities? Watch this overview video on semantics or see if Goldfire is right for you.