Thank you for your abundant feedback. We have put version 0.8.9.132 online with a few new functions as well as a list of recently requested features and fixes. We didn’t manage to synchronize this update exactly with the release of FF 3.6, but we have corrected most glitches in the last 48 hours. We may have to release updates a little more frequently this month, as we will try to work through the beta-test feedback and wish list as rapidly as possible in the coming weeks.
… We will try to help with our programs and make your life easier.
The new Kernel has been online for a few weeks now and we seem to have fixed all the regressions (not so many, in fact, after such fundamental architectural changes). I believe we can now advise those who haven’t done so yet to install the updates, as the general feedback is pretty positive.
You will, however, find very few features of the upcoming OutWit Hub Pro in the 0.8.9.x versions. These features will only be included progressively in the 0.9.x updates, as the very last beta versions before we release v 1.0. Your feedback will of course be very much appreciated.
We will be glad to offer the Pro version at half price to all those (official beta testers or not) who have helped us identify or fix bugs, or who have suggested new features that have been implemented or added to our to-do list. So, please, do not hesitate to register on outwit.com and share your comments on the program.
Note about the use of your email address: We have never sent our registered users a single e-mail or newsletter up to now. We really dislike invasive mass e-mailings and the least we can do is respect our own principles. So you can be assured that your privacy is safe with us. A few weeks before the release of the Hub 1.0, we will nevertheless offer the update to our beta testers and users, along with a feedback form for those who have a little time. We promise that this will be a rare exception.
We have been extremely busy in the last weeks with the complete refactoring of the OutWit Kernel, preparing the way for the advanced automation functionalities of OutWit Hub Pro. The coming version 0.8.9 will be the first using our new core library. You will not see very radical changes yet, except for the scraper editor, which should make many of you happy. Here are the changes that you will find:
-The brand new scraper manager and editor
-The big red Stop button that many have been asking for (which, by the way, also lets you abort ‘Apply Scraper to URLs’ processes)
-A few changes in the interface, to prepare the integration of new automators in the following versions.
In this tutorial we are going to learn how to download all the documents (.pdf, .doc, .xls,…) from a webpage with OutWit Hub.
On some webpages, you can find links to different kinds of documents. Looking for each link by hand would be really tiring: with OutWit Hub, you can automatically see all the links to documents, along with their names and extensions, and download them to your hard disk (also see OutWit Docs).
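For readers curious about what this kind of listing involves, here is a minimal Python sketch of the general idea. It is not OutWit Hub's code: the names `DocLinkParser`, `find_documents` and `DOC_EXTENSIONS` are invented for this illustration, and the extension list only covers the three types named above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Assumption: only the three extensions named in the tutorial are of interest.
DOC_EXTENSIONS = {".pdf", ".doc", ".xls"}


class DocLinkParser(HTMLParser):
    """Collects (url, name, extension) for every link that points to a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the address of the page.
        url = urljoin(self.base_url, href)
        filename = posixpath.basename(urlparse(url).path)
        name, ext = posixpath.splitext(filename)
        if ext.lower() in DOC_EXTENSIONS:
            self.documents.append((url, name, ext.lower()))


def find_documents(html, base_url):
    """Return the document links found in an HTML string."""
    parser = DocLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.documents
```

Each URL returned by `find_documents` could then be saved to disk, for instance with `urllib.request.urlretrieve` from the Python standard library.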
In this tutorial, Dale Stokdyk explains how to scrape Search Engine Result Pages (SERPs) with OutWit Hub Data Extractor for Firefox. OutWit Hub is very useful when you are performing an SEO audit.
Step 1 : Create a web scraper for Search Engine Result Pages
I understand it is mean to talk about features that are not implemented in the downloadable versions, but I would like to share my ideas on the purpose behind our experimental semantic features.
The “mechanical” recognition and extraction algorithms used in most views of the Hub are mostly based on a combination of DOM analysis (when dealing with HTML pages) and morphological recognition of objects and strings. These techniques are very efficient for simple scraping of data, but they are not sufficient when we need to discriminately extract data about certain themes or topics. We are currently adding semantic capacities to our extractors (in professional applications only, for now).
At the moment, we are only focusing on statistical analysis of the words and phrases, without performing any syntactic analysis of the texts. However, the results are very promising and seem to confirm our original ideas.
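To give a rough idea of what a purely statistical approach can look like, here is a small Python sketch. It is an illustration only and not the algorithm used in the OutWit Kernel: the functions `tokenize`, `term_frequencies` and `topic_score` are hypothetical, and scoring a text by the share of its words that match a topic vocabulary is just one simple possibility.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a text into lowercase words; no syntactic analysis is performed."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(text, max_n=2):
    """Count single words and phrases of up to max_n consecutive words."""
    words = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def topic_score(text, topic_terms):
    """Occurrences of the topic terms divided by the number of words in the text."""
    total = len(tokenize(text))
    if total == 0:
        return 0.0
    counts = term_frequencies(text)
    return sum(counts[term] for term in topic_terms) / total
```

A text with a high `topic_score` for a given vocabulary would be a candidate for extraction on that theme; choosing the vocabulary and the threshold is left open here.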
At OutWit, we are working on adding intelligence to the Web browser.
The free beta applications that you have been downloading from our site are only parts of what we are developing. They are implementations of some of the recognition and extraction capacities that we are including in the OutWit Kernel. We have been talking about a public API for more than a year now and, although it is definitely still in the pipeline, we have been delaying it (as with the complete help and documentation) until we reach a stable enough version of the kernel and feel comfortable with people starting to write code around it.
We are convinced that the future will prove it was a good idea to add semantic intelligence to the browser itself instead of exclusively focusing on the server side.
OutWit’s collection technology is organized around three simple concepts:
The programs dissect the Web page into data elements and enable users to see only the type of data they are looking for (images, links, email addresses, RSS news…).
They offer a universal collection basket, the « Catch », into which users can manually drag and drop or automatically collect structured or unstructured data, links or media, as they surf the Web.
They also know how to automatically browse through series of pages, allowing users to harvest all sorts of information objects in a single click.
With simple intuitive features as well as sophisticated scraping functions and data structure recognition, the OutWit programs target a broad range of user categories.
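The first of these concepts, dissecting a page into typed data elements, can be sketched in a few lines of Python. This is a simplified illustration and not how the OutWit programs are built: `PageDissector`, `dissect` and the e-mail pattern `EMAIL_RE` are assumptions made for the example, and only three element types are handled.

```python
import re
from html.parser import HTMLParser

# Assumption: a deliberately simple pattern, good enough for an illustration.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class PageDissector(HTMLParser):
    """Sorts the content of a page into links, images and e-mail addresses."""

    def __init__(self):
        super().__init__()
        self.elements = {"links": [], "images": [], "emails": []}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            href = attrs["href"]
            if href.startswith("mailto:"):
                self.elements["emails"].append(href[len("mailto:"):])
            else:
                self.elements["links"].append(href)
        elif tag == "img" and attrs.get("src"):
            self.elements["images"].append(attrs["src"])

    def handle_data(self, data):
        # E-mail addresses can also appear as plain text in the page.
        self.elements["emails"].extend(EMAIL_RE.findall(data))


def dissect(html):
    """Return a dictionary mapping each element type to what was found."""
    dissector = PageDissector()
    dissector.feed(html)
    dissector.close()
    return dissector.elements
```

A user interested only in images would then look at `dissect(html)["images"]` and ignore the rest, which is the spirit of showing only the type of data being looked for.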
In our work for the coming versions of the kernel, one of our main challenges is OutWit’s ability to explore the hidden Web. We are working on some very exciting features in this area (partly autonomous, partly user-driven explorations). The Deep Web is composed of Web pages and resources that are not indexed by search engines, simply because there are no links to them. One of the interesting functions we are working on is the generation of URLs and queries to the dark side of the Web.
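As a very simple illustration of what generating URLs and queries can mean, the Python sketch below builds every combination of a set of query parameters for a given address. It makes no claim about the approach actually used in the kernel: `generate_urls`, the example address and the parameter names are all hypothetical.

```python
from itertools import product
from urllib.parse import urlencode


def generate_urls(base_url, parameters):
    """Yield one URL for each combination of the given parameter values.

    parameters maps a query parameter name to the list of values to try.
    """
    names = list(parameters)
    for combination in product(*(parameters[name] for name in names)):
        yield base_url + "?" + urlencode(dict(zip(names, combination)))


# Hypothetical example: two search terms over two result pages.
for url in generate_urls(
    "http://example.com/search",
    {"q": ["deep web", "scraper"], "page": [1, 2]},
):
    print(url)
```

Pages reached this way have no inbound links, so a crawler following links alone would never find them.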