OutWit Hub Version 2.1 is available

by - February 1st, 2012

We have solved a few remaining glitches and released version 2.1.1.5 this morning. You may see an alert telling you that the previous version has expired. You just need to download the most recent version from outwit.com. (Sorry for not making this one automatic, but it will be from now on.)

Sorry for the alert

by - January 31st, 2012

Please excuse the expiration alert you are getting when running the Hub between versions 2.1.0.x and 2.1.1.3. If you are using the Firefox add-on, you can update now. For the standalone applications, you will need to download the latest version. Your profile and automators will not be affected.

OutWit Hub 2.0 is online!

by - November 9th, 2011

We have released version 2.0 of OutWit Hub with dozens of new features, dramatically enhanced scrapers and a fun built-in tutorial system. This is a major upgrade both for casual users and data addicts. It can handle very large volumes of data and offers more, simpler ways to build scrapers and apply them discriminately during automatic Web explorations.
I hope you’ll like it. We do.

Version for Aurora (FF7) is ready for download on outwit.com

by - August 13th, 2011

We just uploaded version 1.0.7.23 of OutWit Hub and v.0.3.7.23 of Images and Docs. This is a minor update with a few fixes in scrapers and next page recognition but it does include the latest kernel update for those of you who are using Firefox Aurora.

Beta testing version 1.1

by - July 6th, 2011

We have started limited beta testing of the next version. As it brings enhancements to pro-only features, you need to own a license for OutWit Hub Pro if you wish to test this pre-version. Drop us a line here to receive the link.

Version 1.0.7.10

by - July 6th, 2011

The latest version of OutWit extensions is now compatible with versions 3.6 to 6 of Firefox. You can download it directly from outwit.com. In order to add the new features, we had to discontinue support for Firefox v3.5, as well as for versions of Mac OS before 10.6.

Very large data sets

by - June 13th, 2011

We will soon release v.1.1 of OutWit Hub. This major update includes a complete refactoring of whole parts of the application, particularly for managing and exporting very large datasheets. There are still volume limitations, of course, but they have been raised by a factor of more than 10 in this new version. Depending on the number of columns, the program will now handle extractions of 100,000+ rows of data. (In fact, in the latest tests, we successfully extracted 1.15 million rows straight to the catch, then exported and loaded 1,048,576 of these in Excel, which seems to be the maximum Excel can handle.)

Version 1.0.5.8 for Firefox 5b3

by - June 9th, 2011

Version 1.0.5.8 is online for download on outwit.com with a series of fixes. This version works with the latest beta of Firefox 5.

New versions, automatic updates and major upgrades

by - May 11th, 2011

We have received feedback from users who found that the automatic update frequency was too high. In order to be as reactive as possible to the requests or bug reports we receive, we must usually produce between two and five minor updates each month. That would obviously be too frequent for automatic updates, so we are now trying to limit general, automatic updates to one every 30 to 45 days. In the meantime, if you wish to follow the versions more closely, you can look at the version history page on outwit.com and download the latest manually. Your OutWit configuration will not be altered when you install a new version.

OutWit Hub Pro Users: we release major upgrades several times a year, like the upcoming version 1.1. If you have purchased the pro version of OutWit Hub 1.0 and if your free upgrade plan has not expired, the new version will be free for you and you will simply have to reenter your serial number to unlock it.

Advanced Tips: Hierarchical Scraping

by - April 22nd, 2011

You may need, at times, to extract hierarchical data without losing the structure:

  • Flight #AW345: New York - Paris:
    • Departure time: 04:45 pm (local)
    • Arrival time: 07:15 am (local)
  • Flight #SG45: Paris - Rome:
    • Departure time: 10:05 am (local)
    • Arrival time: 11:55 am (local)
  • Flight #SG46: Rome - Paris:
    • Departure time: 06:25 am (local)
    • Arrival time: 08:55 am (local)
  • Flight #AW346: Paris - New York:
    • Departure time: 09:20 am (local)
    • Arrival time: 08:35 pm (local)

In many of these cases, you will lack significant markers to distinguish the parent elements (the legs of the trip, in this case). If you just grab the time information with a simple scraper, you are likely to lose the leg it belongs to. In cases like this (although we haven’t yet implemented a complete recursive/hierarchical scraping system), you can already often get the result you want using the ‘Separator’ & ‘Labels’ fields in the scraper (pro version only). Making such scrapers often requires a good understanding of complex regular expressions.

The Separator is a delimiter which you can use to split an extracted string into several data fields.
The List of Labels is the series of headers to be used respectively for each destination column, when the result is split into several fields using a Separator.

In the Separator field, you can use either a literal string like “,” or “;”, a tag like “</ul>” or a regular expression. When splitting the result with a Separator, use the List of Labels to assign a field name to each part of the data. Separate the labels with a comma.

The parent block of data is extracted between the Marker Before and the Marker After, and this block is then split into several fields. This way, you will keep the data that belong together in the same row. Separator/Labels are very helpful, in general, when the strings you want to extract are not surrounded by distinctive markers.
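
To make the idea concrete, here is a small Python sketch of the same logic. It is only an illustration of the principle, not the way the Hub works internally: the function name scrape_rows, the HTML markup of the sample page, the markers and the column labels are all assumptions made up for this example.

```python
import re

def scrape_rows(source, marker_before, marker_after, separator, labels):
    """Hypothetical helper: extract each block found between the two markers,
    split it with the separator (a regular expression) and name the parts."""
    block_pattern = re.compile(
        re.escape(marker_before) + r"(.*?)" + re.escape(marker_after),
        re.DOTALL,
    )
    rows = []
    for block in block_pattern.findall(source):
        # Split the parent block, then drop the empty fragments left between tags.
        parts = [p.strip() for p in re.split(separator, block) if p.strip()]
        # Pair each part with its label, in order: one parent block = one row.
        rows.append(dict(zip(labels, parts)))
    return rows

# Assumed markup for the first two legs of the trip (the real page may differ).
page = """
<li>Flight #AW345: New York - Paris:<ul>
<li>Departure time: 04:45 pm (local)</li>
<li>Arrival time: 07:15 am (local)</li></ul></li>
<li>Flight #SG45: Paris - Rome:<ul>
<li>Departure time: 10:05 am (local)</li>
<li>Arrival time: 11:55 am (local)</li></ul></li>
"""

rows = scrape_rows(
    page,
    marker_before="<li>Flight ",
    marker_after="</ul>",
    separator=r"<[^>]+>",          # any tag acts as the separator
    labels=["Leg", "Departure", "Arrival"],
)

for row in rows:
    print(row)
```

Each leg ends up in its own row, with its departure and arrival times in the columns next to it, which is the point of combining the markers with a Separator and a List of Labels.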