Invention Machine Blog

INFOGRAPHIC: Exploring the Deep Web with Semantic Search

The Web is fast becoming a titanic, complex entity. By 2015, an estimated one zettabyte of new content will be added to the Web every year. Navigating this sea of information is an ever-growing challenge, particularly because much of that content is not easily accessed by traditional search engines.

The Surface Web
When most of us think of the Web, we think of the ‘Surface Web’, also known as the visible web: the webpages we reach directly, by following links, or through common search engines like Google. However, the Surface Web makes up just 4 percent of all the content on the Internet.

The Deep Web
The ‘Deep Web’ or ‘Invisible Web’ is roughly 24 times larger than the Surface Web and represents a staggering 96 percent of the information on the Web. This content includes:

  • Dynamic or scripted content
  • Unlinked content: pages that no other page links to, which can prevent web-crawling programs from discovering them
  • Private or password-protected websites
  • Webpages whose content varies by access context (e.g., the client's IP address range or previous navigation sequence)
  • Limited-access content: sites that technically restrict access to their pages (e.g., through the Robots Exclusion Standard or CAPTCHAs)
  • Non-HTML/text content: textual content encoded in multimedia (image or video) files or in file formats that search engines do not handle
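The unlinked and password-protected categories above follow directly from how link-following crawlers discover pages. The sketch below is a toy illustration under stated assumptions, not how Goldfire or any real search engine works: the site map `LINKS`, the `PROTECTED` set and the `crawl` function are all hypothetical. Real crawlers also use sitemaps, URL submissions and politeness rules.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/widget", "/account"],
    "/products/widget": [],
    "/account": ["/account/orders"],  # behind a login in this toy model
    "/account/orders": [],
    "/reports/2012.pdf": [],          # exists, but nothing links to it
}

# Hypothetical set of pages that require a password to fetch.
PROTECTED = {"/account", "/account/orders"}


def crawl(seed: str) -> set[str]:
    """Breadth-first crawl that discovers pages only by following links.

    An anonymous crawler cannot fetch protected pages, so it never
    sees the links those pages contain either.
    """
    seen = {seed}
    queue = deque([seed])
    visited = set()
    while queue:
        url = queue.popleft()
        if url in PROTECTED:
            continue  # login required: page content and its outlinks stay hidden
        visited.add(url)
        for target in LINKS.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


surface = crawl("/")
deep = set(LINKS) - surface
print(sorted(surface))  # ['/', '/about', '/products', '/products/widget']
print(sorted(deep))     # ['/account', '/account/orders', '/reports/2012.pdf']
```

In this toy model the crawler reaches only four of the seven pages. The unlinked PDF and the two password-protected account pages make up this site's "deep web", even though all seven pages exist on the server.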

This content can only be mined and leveraged using sophisticated search technologies, such as Goldfire's world-class semantic search.

If you aren't searching the Deep Web, check out what you might be missing:

[Infographic: Deep Web]

Comments

On one hand, how do you mine pages that are password protected?

Given that these are excluded from a semantic search, could this be used to find our way around web spam? If these sites are difficult to find, chances are they are not commercial or advertising sites masquerading as real information sites.

This is one area I plan to look into.
Posted @ Thursday, January 17, 2013 8:36 AM by londonskeptic.org.uk
 
 