Major updates
by pr - August 2nd, 2010
Some important fixes were made; please update to OutWit Hub v0.9.3.8 and OutWit Images v0.2.3.8.
Thank you for your abundant feedback. We have put version 0.8.9.132 online with a few new functions as well as a number of recently requested features and fixes. We didn’t manage to synchronize this update exactly with the release of FF 3.6, but we have corrected most glitches in the last 48 hours. We may have to release updates a little more frequently this month, as we will try to work through the beta-test feedback and wish list as rapidly as possible in the coming weeks.
Thank you in advance for your patience.
JC
… We will try to help with our programs and make your life easier.
The new Kernel has been online for a few weeks now and we seem to have fixed all the regressions (not so many, in fact, after such fundamental architectural changes). I believe we can now advise those who haven’t done so yet to install the updates, as the general feedback is pretty positive.
You will, however, find very few features of the upcoming OutWit Hub Pro in the 0.8.9.x versions. These features will only be included progressively in the 0.9.x updates, as the very last beta versions before we release v 1.0. Your feedback will of course be very much appreciated.
We will be glad to offer the Pro version at half price to all those (official beta testers or not) who have helped us identify or fix bugs, or who have suggested new features that have been implemented or added to our to-do list. So please do not hesitate to register on outwit.com and share your comments on the program.
Note about the use of your email address: We have never sent our registered users a single e-mail or newsletter so far. We really dislike invasive mass e-mailings and the least we can do is respect our own principles. So you can be assured that your privacy is safe with us. A few weeks before the release of the Hub 1.0, we will nevertheless offer the update to our beta testers and users, along with a feedback form for those who have a little time. We promise that this will be very exceptional.
Cheers to all.
We have been extremely busy over the last few weeks with the complete refactoring of the OutWit Kernel, preparing the way for the advanced automation features of OutWit Hub Pro. The coming version 0.8.9 will be the first to use our new core library. You will not see very radical changes yet, except for the scraper editor, which should make many of you happy. Here are the changes that you will find:
- The brand new scraper manager and editor
- The big red Stop button that many have been asking for (which, by the way, also lets you abort ‘Apply Scraper to URLs’ processes)
- A few changes in the interface, to prepare the integration of new automators in the following versions.
In this tutorial we are going to learn how to extract links from a webpage with OutWit Hub.
Sometimes it can be useful to extract all links from a given web page. OutWit Hub is the easiest way to achieve this goal.
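For readers curious about what link extraction involves under the hood, here is a minimal sketch in Python using only the standard library. It is not how OutWit Hub does it; the class name `LinkCollector`, the function `extract_links` and the sample URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Anchors without an href (e.g. <a name="top">) are skipped.
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page, not taken from the tutorial.
sample = '<p><a href="/docs/report.pdf">Report</a> <a href="http://example.org/">Home</a></p>'
print(extract_links(sample, "http://example.com/index.html"))
# prints ['http://example.com/docs/report.pdf', 'http://example.org/']
```

Relative links are resolved against the page address, so the result can be reused directly, for instance to visit or download each target.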
In this tutorial we are going to learn how to download all the documents (.pdf, .doc, .xls,…) from a webpage with OutWit Hub.
On some webpages, you can find links to different kinds of documents. Looking for each link by hand would be really tiring: with OutWit Hub, you can automatically see all the links to documents, along with their names and extensions, and download them to your hard disk (see also OutWit Docs).
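As a rough illustration of the idea (not OutWit Hub's actual code), the sketch below takes a list of link URLs and keeps only those that point to documents, returning the file name and extension of each. The function name `document_links`, the set of extensions and the sample URLs are assumptions chosen for the example.

```python
import posixpath
from urllib.parse import urlparse, unquote

# Assumed set of extensions; extend it to match the documents you are after.
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".xls"}


def document_links(links, extensions=DOCUMENT_EXTENSIONS):
    """Return (url, file name, extension) for each link whose path ends in a listed extension."""
    found = []
    for url in links:
        # Only the path is inspected, so query strings such as ?download=1 are ignored.
        path = unquote(urlparse(url).path)
        name = posixpath.basename(path)
        ext = posixpath.splitext(name)[1].lower()
        if ext in extensions:
            found.append((url, name, ext))
    return found


# Hypothetical links, as they might come out of a link extraction step.
links = [
    "http://example.com/files/Report.PDF?download=1",
    "http://example.com/about.html",
    "http://example.com/data/budget.xls",
]
for url, name, ext in document_links(links):
    print(name, ext, url)
```

Each URL that passes the filter could then be saved to disk, for example with `urllib.request.urlretrieve(url, name)` from the standard library.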
In this tutorial, Dale Stokdyk explains how to scrape Search Engine Result Pages (SERPs) with OutWit Hub Data Extractor for Firefox. OutWit Hub is very useful when you are performing an SEO audit.
Step 1: Create a web scraper for Search Engine Result Pages
Step 2: Scrape Search Engine Result Pages
Step 3: Export data to Excel or SQL
I understand it is mean to talk about features that are not implemented in the downloadable versions, but I would like to share my ideas on the purpose behind our experimental semantic features.
The “mechanical” recognition and extraction algorithms used in most views of the Hub are mostly based on a combination of DOM analysis (when dealing with HTML pages) and morphological recognition of objects and strings. These techniques are very efficient for simple scraping of data, but they are not sufficient when we need to selectively extract data about certain themes or topics. We are currently adding semantic capabilities to our extractors (in professional applications only, for now).
At the moment, we are only focusing on statistical analysis of the words and phrases, without performing any syntactic analysis of the texts. However, the results are very promising and seem to confirm our original ideas.
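To give an idea of what a purely statistical look at words and phrases can mean, here is a minimal Python sketch that counts single words and two-word phrases in a text. It is only an assumption about the general kind of analysis involved, not the algorithm used in the OutWit extractors; the function name `term_frequencies` and the sample text are invented for the example.

```python
import re
from collections import Counter


def term_frequencies(text, phrase_length=2):
    """Count words and fixed-length word sequences, with no syntactic analysis."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Phrases are plain sliding windows over the word list; they ignore
    # punctuation, so a phrase may span a sentence boundary.
    phrases = [
        " ".join(words[i:i + phrase_length])
        for i in range(len(words) - phrase_length + 1)
    ]
    return Counter(words), Counter(phrases)


# Hypothetical sample text.
text = "Web scraping tools extract data. Scraping tools save time."
word_counts, phrase_counts = term_frequencies(text)
print(word_counts.most_common(2))
print(phrase_counts.most_common(1))
```

Terms that recur unusually often in a page, compared with ordinary text, are one simple hint about the theme of that page.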
At OutWit, we are working on adding intelligence to the Web browser.
The free beta applications that you have been downloading from our site are only parts of what we are developing. They are implementations of some of the recognition and extraction capacities that we are including in the OutWit Kernel. We have been talking about a public API for more than a year now and, although it is definitely still in the pipeline, we have been delaying it (along with the complete help and documentation) until we reach a stable enough version of the kernel and feel comfortable with people starting to write code around it.
We are convinced that the future will prove it was a good idea to add semantic intelligence to the browser itself instead of exclusively focusing on the server side.
OutWit’s collection technology is organized around three simple concepts:
With simple intuitive features as well as sophisticated scraping functions and data structure recognition, the OutWit programs target a broad range of user categories.