Archive for August, 2008

OutWit Hub and OutWit Kernel 0.8.1.83 Released

Sunday, August 31st, 2008

This update includes a complete refactoring of the Kernel and a new user interface where tabs are replaced by a hierarchical list of views.

New side panel

The improvements in this version are listed here.

Create your First Web Scraper to Extract Data from a Web Page

Friday, August 22nd, 2008

Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials to the application, which are accessible from the Help menu. You should run these to discover the Hub.

Find a simple but more up-to-date version of this tutorial here

This tutorial was created using version 0.8.2. The Scraper Editor interface changed a long time ago: many more features were added and some controls now have new names. The following can still be a good complement for getting acquainted with scrapers. The Scraper Editor can now be found in the ‘Scrapers’ view instead of ‘Source’, but the principle remains fundamentally the same.

In many cases the automatic data extraction functions (tables, lists, guess) will be enough, and you will manage to extract and export the data in just a few clicks.

If, however, the page is too complex, or if your needs are more specific, there is a way to extract data manually: create your own scraper.

Scrapers will be saved to your personal database and you will be able to re-apply them on the same URL or on other URLs starting, for instance, with the same domain name.

A scraper can even be applied to whole lists of URLs.
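
To make the idea of re-applying a scraper concrete, here is a minimal Python sketch of selecting, from a list of URLs, those that start with the address a scraper was saved for. It is only an illustration of the principle, not how OutWit Hub implements it: the function names `scraper_applies` and `select_urls` and the example addresses are hypothetical.

```python
def scraper_applies(scraper_url_prefix: str, page_url: str) -> bool:
    """Return True if page_url starts with the prefix the scraper was saved for.

    Hypothetical helper: a plain string-prefix test, used here only to
    illustrate "URLs starting with the same domain name".
    """
    return page_url.startswith(scraper_url_prefix)


def select_urls(scraper_url_prefix: str, urls: list[str]) -> list[str]:
    """Keep only the URLs from a list that the scraper could be applied to."""
    return [url for url in urls if scraper_applies(scraper_url_prefix, url)]


if __name__ == "__main__":
    # Example addresses are placeholders, not real scraper targets.
    candidates = [
        "http://www.example.com/catalog/page1.html",
        "http://www.example.com/catalog/page2.html",
        "http://www.example.org/other.html",
    ]
    for url in select_urls("http://www.example.com/", candidates):
        print(url)
```

In this sketch, a scraper saved for a whole domain would match every page under it, while one saved for a longer prefix would match only that part of the site.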

You can also export your scrapers and share them with other users.

Let’s get acquainted with this feature by creating a simple one.

(more…)

OutWit Hub Version 0.8.0.34 for Firefox 3 is online

Friday, August 1st, 2008

The Firefox 3 version was released yesterday. It can be downloaded here.

It includes a number of fixes and enhancements: drag and drop of text, links and images to the catch, autocompletion in the address bar, enhanced application of scrapers, better tree behavior, an enhanced image extraction process (more high-resolution images are found by the ‘images’ widget and the slideshow), enhanced navigation link recognition…