Deep Web searches
In our work on the coming versions of the kernel, one of our main challenges is OutWit’s ability to explore the hidden Web. We are working on some very exciting features in this area (partly autonomous, partly user-driven explorations). The Deep Web is composed of Web pages and resources that are not indexed by search engines, simply because there are no links to them. One of the interesting functions we are working on is the generation of URLs and queries to reach this dark side of the Web.
We put a very basic implementation of this principle in version 0.8.3.46. A new checkbox in the bottom panel of the ‘Images’ view lets you ask the program to look for possible neighboring images in a series on the same server. In the current version, the program only searches for directly adjacent images in a sequence. We are including this only as a test, and it will probably not remain (at least not in the same form) in later versions. We would be glad to receive your feedback and suggestions on the principle.
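To make the principle concrete, here is a minimal sketch of how directly adjacent image URLs could be guessed from one known URL. This is not OutWit's actual code: the function name `adjacent_urls`, the `radius` parameter, and the example.com address are all assumptions made up for illustration. It simply increments and decrements the last number found in the file name, keeping any zero padding, and ignores query strings.

```python
import re

def adjacent_urls(url, radius=1):
    """Guess neighboring URLs by shifting the last number in the file name.

    Hypothetical helper for illustration only; not part of OutWit.
    """
    head, sep, filename = url.rpartition("/")
    # Set the extension aside so digits in it (e.g. "mp3") are never touched.
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        stem, ext = filename, ""
    matches = list(re.finditer(r"\d+", stem))
    if not matches:
        return []  # no number in the name, so there is no series to follow
    last = matches[-1]
    number = int(last.group())
    width = len(last.group())  # preserve zero padding such as "007"
    candidates = []
    for delta in range(-radius, radius + 1):
        neighbor = number + delta
        if delta == 0 or neighbor < 0:
            continue
        new_stem = stem[:last.start()] + str(neighbor).zfill(width) + stem[last.end():]
        candidates.append(head + sep + new_stem + dot + ext)
    return candidates

print(adjacent_urls("http://example.com/gallery/photo_007.jpg"))
# ['http://example.com/gallery/photo_006.jpg', 'http://example.com/gallery/photo_008.jpg']
```

Each candidate is only a guess. A real tool would still have to request it and check that the server actually returns an image.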
[updated on July 25, 2010]
A more advanced implementation is now used in several parts of OutWit Hub Pro. It is documented in the online help center, which is accessible from the application. Here is a direct link to the page about Query Generation Matrices in OutWit Hub.
by jcc
Tags: deep web, grab images, image extraction, Outwit Hub, OutWit Images, query generation matrices
March 18th, 2010 at 11:46 pm
This is an excellent feature. I would like to see something similar applied to docs (all PDFs in a virtual directory or something).