Author Archive

General overview of the OutWit programs

Tuesday, July 28th, 2009

OutWit’s collection technology is organized around three simple concepts:

  1. The programs dissect the Web page into data elements and enable users to see only the type of data they are looking for (images, links, email addresses, RSS news…).
  2. They offer a universal collection basket, the "Catch", into which users can manually drag and drop or automatically collect structured or unstructured data, links or media, as they surf the Web.
  3. They also know how to automatically browse through series of pages, allowing users to harvest all sorts of information objects in a single click.

With simple, intuitive features as well as sophisticated scraping functions and data-structure recognition, the OutWit programs target a broad range of users.
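To make the first concept more concrete: OutWit's own code is not public, but the idea of dissecting a page into typed data elements can be sketched in a few lines of Python (standard library only; the HTML sample and class name are invented for the illustration):

    # Illustrative sketch only: not OutWit's actual implementation.
    # Dissect an HTML page into typed elements: links, images, email addresses.
    import re
    from html.parser import HTMLParser

    class ElementDissector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links, self.images, self.text = [], [], []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "a" and "href" in attrs:
                self.links.append(attrs["href"])
            elif tag == "img" and "src" in attrs:
                self.images.append(attrs["src"])

        def handle_data(self, data):
            self.text.append(data)

        def emails(self):
            # Naive pattern, good enough for a demo.
            return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", " ".join(self.text))

    d = ElementDissector()
    d.feed('<a href="https://example.com">Home</a> <img src="logo.png"> info@example.com')
    print(d.links, d.images, d.emails())

Each collector corresponds to one of the Hub's data views, which is essentially what "seeing only the type of data you are looking for" means.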


Deep Web searches

Tuesday, June 23rd, 2009

In our work on the coming versions of the kernel, one of our main challenges is OutWit’s ability to explore the hidden Web. We are working on some very exciting features in this area (partly autonomous, partly user-driven exploration). The Deep Web is composed of Web pages and resources that are not indexed by search engines, simply because no links point to them. One of the interesting functions we are working on is the generation of URLs and queries for this dark side of the Web.
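The post does not detail how this generation works, but the general idea of reaching unlinked pages by enumerating plausible URLs can be sketched as follows (a hypothetical Python illustration; the template and parameter ranges are invented):

    # Hypothetical sketch: generate candidate URLs for pages nothing links to.
    from itertools import product

    def generate_urls(template, **ranges):
        """Expand a URL template over every combination of its parameters."""
        keys = list(ranges)
        for values in product(*(ranges[k] for k in keys)):
            yield template.format(**dict(zip(keys, values)))

    # e.g. probe a report archive that no page links to
    for url in generate_urls("https://example.com/reports/{year}/page{n}.html",
                             year=range(2006, 2009), n=range(1, 3)):
        print(url)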


OutWit Docs beta was released

Sunday, March 29th, 2009

We released the first public version of OutWit Docs over the weekend, as well as updated versions of OutWit Images and OutWit Hub.

OutWit Docs is a simple WebTop Document Finder, based on our Kernel. It lets you search websites and search engines for documents, and it presents the results as an operating system would: either in icon view or as a list of files.

OutWit Docs looks for text files, spreadsheets, and presentations in various formats (including PDF, MS Office, OpenOffice documents, RTF, CSV…).

In this version, the filtering and automatic selection options are somewhat basic (name, file type…), but we will improve them along the way. Since we cannot download all the result files to explore their contents, we are working on a multi-layered filtering process: refine the query, then the selection, and search the content of only the most pertinent files.
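As a rough illustration of that first, cheap filtering layer (hypothetical Python; the extension list and URLs are invented), results can be screened by file type and name without downloading anything:

    # Hypothetical sketch of the first filtering layer: keep only result
    # URLs whose path looks like a document, without downloading anything.
    from urllib.parse import urlparse

    DOC_EXTENSIONS = {".pdf", ".doc", ".docx", ".odt", ".rtf", ".csv",
                      ".xls", ".xlsx", ".ppt", ".pptx", ".txt"}

    def looks_like_document(url, name_contains=None):
        path = urlparse(url).path.lower()
        ext = path[path.rfind("."):] if "." in path else ""
        if ext not in DOC_EXTENSIONS:
            return False
        return name_contains is None or name_contains.lower() in path

    results = ["https://example.com/annual-report-2008.pdf",
               "https://example.com/index.html",
               "https://example.com/budget.xls"]
    print([u for u in results if looks_like_document(u, "report")])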

As with all our products, your suggestions are extremely welcome. In the meantime, we hope that you’ll enjoy this program.

FAQs for OutWit Hub, Sourcer, Images and Docs for Firefox

Tuesday, December 9th, 2008

IMPORTANT! – BEFORE READING ON: 

  • The main FAQ is in the built-in Help of the OutWit Hub and Email Sourcer applications (in the top menu: Help > Frequently Asked Questions). It is much more up to date and is targeted at Hub/Sourcer users, so please refer to it rather than to this static page.
  • Make sure you have the latest version, especially if you just downloaded your program from a third-party site.

The latest version of OutWit Hub is on the download page of outwit.com, and its version history is published there as well. Email Sourcer has its own download and version history pages.

Compatible Versions

General Questions

Technical Questions


OutWit Hub’s New Features

Tuesday, November 25th, 2008

Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials to the application, accessible from the Help menu. You should run these to discover the Hub.

The newest update of the Hub contains some exciting new features. This tutorial explains these new functionalities; for a more detailed explanation of how to use the Hub’s basic features, please refer to the existing list of tutorials.


Creating a Scraper for Multiple URLs Using Regular Expressions

Wednesday, November 19th, 2008

Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials to the application, accessible from the Help menu. You should run these to discover the Hub.

NOTE: This tutorial was created using version 0.8.2. The Scraper Editor interface has changed considerably since then: more features have been added and some controls have been renamed. The following can still be a good complement for getting acquainted with scrapers. The Scraper Editor can now be found in the ‘Scrapers’ view instead of ‘Source’, but the principle remains fundamentally the same.

In this example we’ll redo the scraper from the previous lesson using Regular Expressions. This will allow us to create a more precise scraper, which we can then apply to many URLs. When working with RegExps, you can always consult a list of basic expressions and a tutorial by selecting ‘Help’ in the menu bar.

Recap: For complex web pages or specific needs, when the automatic data extraction functions (table, list, guess) don’t provide exactly what you are looking for, you can extract data manually by creating your own scraper. Scrapers are saved on your computer and can then be reapplied or shared with other users, as desired.
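The full tutorial is not reproduced here, but the technique it teaches (applying one regular expression pattern to a whole series of pages) can be sketched in Python. The pattern and page list below are invented, and OutWit itself does this through the Scraper Editor rather than through code:

    # Hypothetical sketch of a regex-based scraper applied to many URLs.
    import re
    import urllib.request

    # One named capture group per field, playing the role of the
    # "marker before / marker after" pairs in the Scraper Editor.
    ROW_PATTERN = re.compile(
        r'<td class="name">(?P<name>[^<]+)</td>\s*'
        r'<td class="price">(?P<price>[^<]+)</td>')

    def scrape(url):
        with urllib.request.urlopen(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        return [m.groupdict() for m in ROW_PATTERN.finditer(html)]

    sample = '<td class="name">Widget</td><td class="price">9.99</td>'
    print([m.groupdict() for m in ROW_PATTERN.finditer(sample)])

    # The same pattern works on every page of a series:
    # urls = [f"https://example.com/catalog?page={n}" for n in range(1, 4)]
    # rows = [row for url in urls for row in scrape(url)]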


Creating a Scraper for Multiple URLs, Simple Method

Tuesday, November 18th, 2008

Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials to the application, accessible from the Help menu. You should run these to discover the Hub.

This tutorial was created using version 0.8.2. The Scraper Editor interface has changed considerably since then: many more features have been added and some controls have been renamed. The following can still be a good complement for getting acquainted with scrapers. The Scraper Editor can now be found in the ‘Scrapers’ view instead of ‘Source’, but the principle remains fundamentally the same.

Now that we’ve learned how to create a scraper for a single URL, let’s try something a little more advanced. In this lesson we’ll learn how to create a scraper which can be applied to a whole list of URLs, using a simple method suited for beginners. In the next lesson, a more complex scraper using regular expressions will be demonstrated for our tech-savvy users. Geeks, feel free to skip to: Creating a Scraper for Multiple URLs Using Regular Expressions.

Recap: For complex web pages or specific needs, when the automatic data extraction functions (table, list, guess) don’t provide exactly what you are looking for, you can extract data manually by creating your own scraper. Scrapers are saved on your computer and can then be reapplied or shared with other users, as desired.
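For readers who want the flavor of the simple method without opening the program: it boils down to literal "marker before / marker after" extraction, which a few hypothetical lines of Python can mimic (the markers and sample HTML are invented):

    # Hypothetical sketch of the beginner-friendly version: instead of a
    # regular expression, use literal "marker before / marker after" strings.
    def extract_between(html, before, after):
        """Return every substring found between the two literal markers."""
        results, start = [], 0
        while True:
            i = html.find(before, start)
            if i == -1:
                return results
            i += len(before)
            j = html.find(after, i)
            if j == -1:
                return results
            results.append(html[i:j].strip())
            start = j + len(after)

    page = '<td class="name">Widget</td><td class="name">Gadget</td>'
    print(extract_between(page, '<td class="name">', '</td>'))  # ['Widget', 'Gadget']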


OutWit Images Was Released

Monday, November 17th, 2008

The first beta version of OutWit Images was posted on Firefox Add-ons and came out of the experimental zone this weekend. This new tool is an online image browser that lets you not only view Web images as a slideshow or as a wall of thumbnails, but also grab the pictures and save them to your hard disk.
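The "grab and save" part is easy to picture; here is a minimal, hypothetical Python sketch (invented URLs and function name; the real feature is of course built into the extension):

    # Hypothetical sketch of grabbing pictures and saving them to disk.
    import os
    import urllib.request
    from urllib.parse import urljoin, urlparse

    def save_images(page_url, image_srcs, dest="downloads"):
        os.makedirs(dest, exist_ok=True)
        for src in image_srcs:
            absolute = urljoin(page_url, src)  # resolve relative srcs
            name = os.path.basename(urlparse(absolute).path) or "image"
            with urllib.request.urlopen(absolute) as resp, \
                 open(os.path.join(dest, name), "wb") as out:
                out.write(resp.read())

    # save_images("https://example.com/gallery.html", ["/img/photo1.jpg"])

The image URLs would come from an extractor such as the dissector sketched in the overview post above.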

The feedback we are already receiving for this extension is extremely encouraging: with Images, as with the Hub, people are actually managing to do things they simply couldn’t do before. And this really makes us happy…

Download OutWit Images

Version 0.8.1.126 is preparing the way for OutWit Images

Friday, October 24th, 2008

Version 0.8.1.126 was released yesterday. This update adds several features to the Kernel for the forthcoming release of OutWit Images, improving in particular the image extraction process and the slideshow.  The version also includes, among other new features, enhanced bottom panels, with a series of additional criteria to refine your selections and filter the extracted data.

OutWit Hub out of the sandbox on Mozilla Add-ons

Monday, September 22nd, 2008

The Hub has finally come out of the Experimental section of Mozilla Add-ons, after a review kindly performed by Brian King.