Posts Tagged ‘Outwit Hub’

A New Kernel

Monday, November 16th, 2009

We have been extremely busy over the last few weeks with the complete refactoring of the OutWit Kernel, preparing the way for the advanced automation functionalities of OutWit Hub Pro. The upcoming version 0.8.9 will be the first to use our new core library. You will not see very radical changes yet, except for the scraper editor, which should make many of you happy. Here are the changes you will find:

The brand new scraper manager and editor

The big red Stop button that many have been asking for (which, by the way, also lets you abort ‘Apply Scraper to URLs’ processes)

A few changes to the interface, preparing for the integration of new automators in upcoming versions.


How to Extract Links from a Web Page

Thursday, October 8th, 2009

In this tutorial we are going to learn how to extract links from a webpage with OutWit Hub.

Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application which are accessible from the Help menu.
You should run these to discover the Hub.

Sometimes it can be useful to extract all links from a given web page. OutWit Hub is the easiest way to achieve this goal.
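
For readers curious about what this task involves under the hood, here is a minimal sketch in Python of collecting every link on a page, roughly what the Hub’s ‘links’ view gathers for you automatically. It is an illustration only, not OutWit Hub’s internal code; it assumes the third-party requests and beautifulsoup4 packages, and the URL is just an example.

```python
# Minimal sketch: list every link on a page.
# Assumes the third-party packages requests and beautifulsoup4 are installed.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def extract_links(url):
    """Return (anchor text, absolute URL) pairs for every link on the page."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), urljoin(url, a["href"]))
            for a in soup.find_all("a", href=True)]

if __name__ == "__main__":
    for text, href in extract_links("https://www.outwit.com/"):  # example URL
        print(text, href)
```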


Grab Documents With OutWit Hub

Thursday, September 17th, 2009

In this tutorial we are going to learn how to download all the documents (.pdf, .doc, .xls,…) from a webpage with OutWit Hub.

Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application which are accessible from the Help menu.
You should run these to discover the Hub.

On some webpages, you can find links to many different kinds of documents. Hunting for each link by hand would be tedious: with OutWit Hub, you can automatically see all the links to documents, along with their names and extensions, and download them to your hard disk (also see OutWit Docs).
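
If you like to see the task spelled out in code, the short Python sketch below does the same thing by hand: it scans a page for links ending in .pdf, .doc or .xls and saves each file to a local folder. It is only an illustration of what the ‘documents’ view automates, not the Hub’s own code, and it assumes the requests and beautifulsoup4 packages.

```python
# Minimal sketch: download every linked .pdf/.doc/.xls document from a page.
# Illustration only; assumes the requests and beautifulsoup4 packages.
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

DOC_EXTENSIONS = (".pdf", ".doc", ".xls")  # example extensions

def download_documents(url, dest_dir="downloads"):
    os.makedirs(dest_dir, exist_ok=True)
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    for a in soup.find_all("a", href=True):
        target = urljoin(url, a["href"])
        name = os.path.basename(urlparse(target).path)
        if name.lower().endswith(DOC_EXTENSIONS):
            with open(os.path.join(dest_dir, name), "wb") as fh:
                fh.write(requests.get(target, timeout=30).content)
            print("saved", name)
```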


How to scrape Search Engine Result Pages with OutWit Hub for SEO Audit (Video)

Thursday, September 17th, 2009

Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application which are accessible from the Help menu.
You should run these to discover the Hub.

In this tutorial, Dale Stokdyk explains how to scrape Search Engine Result Pages (SERPs) with OutWit Hub Data Extractor for Firefox. OutWit Hub is very useful when you are performing an SEO audit.

Step 1: Create a web scraper for Search Engine Result Pages

Step 2: Scrape Search Engine Result Pages

Step 3: Export the data to Excel or SQL

Read the full tutorial on Marketing2OH
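
As a rough, code-level picture of those three steps, the Python sketch below parses a search-results page that has been saved to disk and exports the result titles and URLs to a CSV file, which Excel opens directly. The CSS selectors are placeholders: real result pages change their markup often, and automated querying may be restricted by a search engine’s terms of use, so treat this purely as an outline of the mechanics.

```python
# Sketch of steps 1-3: "scrape" a saved search-results page and export to CSV.
# The selectors below are placeholders, not the markup of any real engine.
import csv

from bs4 import BeautifulSoup  # third-party: beautifulsoup4

def scrape_serp(html_path, csv_path):
    with open(html_path, encoding="utf-8") as fh:
        soup = BeautifulSoup(fh.read(), "html.parser")
    rows = []
    for result in soup.select("div.result"):      # placeholder result container
        link = result.select_one("a")             # first link = result title
        if link is not None:
            rows.append({"title": link.get_text(strip=True),
                         "url": link.get("href", "")})
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["title", "url"])
        writer.writeheader()
        writer.writerows(rows)

scrape_serp("serp.html", "serp_results.csv")  # example file names
```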

General overview of the OutWit programs

Tuesday, July 28th, 2009

OutWit’s collection technology is organized around three simple concepts:

  1. The programs dissect the Web page into data elements and enable users to see only the type of data they are looking for (images, links, email addresses, RSS news…).
  2. They offer a universal collection basket, the “Catch”, into which users can manually drag and drop or automatically collect structured or unstructured data, links or media, as they surf the Web.
  3. They also know how to automatically browse through a series of pages, allowing users to harvest all sorts of information objects in a single click.

With simple, intuitive features as well as sophisticated scraping functions and data structure recognition, the OutWit programs target a broad range of users.
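
To give a concrete, code-level picture of the third concept, here is a small Python sketch that follows a ‘next’ link from page to page and collects every link it finds along the way. It is an illustration of the principle only, not OutWit code: it assumes the requests and beautifulsoup4 packages and a site that marks its pagination with rel="next" links.

```python
# Sketch of concept 3: browse through a series of pages by following a
# "next" link, collecting links as you go. Assumes requests and beautifulsoup4,
# and a site whose pagination uses rel="next"; illustration only.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def harvest_series(start_url, max_pages=10):
    url, collected = start_url, []
    for _ in range(max_pages):
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        collected += [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]
        nxt = soup.find("a", rel="next")
        if nxt is None or not nxt.get("href"):
            break
        url = urljoin(url, nxt["href"])
    return collected
```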


Deep Web searches

Tuesday, June 23rd, 2009

In our work on the coming versions of the kernel, one of our main challenges is OutWit’s ability to explore the hidden Web. We are working on some very exciting features in this area (partly autonomous, partly user-driven explorations). The Deep Web is composed of Web pages and resources that are not indexed by search engines, simply because there are no links to them. One of the interesting functions we are working on is the generation of URLs and queries to the dark side of the Web.
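
As a simple illustration of the idea behind generated queries, the Python sketch below fills a query template with candidate terms and page numbers, producing URLs for pages that no link may point to. The template and terms are made up for the example; this is not an OutWit Hub feature, just the general principle.

```python
# Sketch: generate candidate URLs from a query template.
# The template and terms are made-up examples.
from itertools import product
from urllib.parse import quote_plus

TEMPLATE = "https://example.com/search?q={term}&page={page}"

def generate_urls(terms, pages):
    for term, page in product(terms, pages):
        yield TEMPLATE.format(term=quote_plus(term), page=page)

for url in generate_urls(["data extraction", "web harvesting"], range(1, 4)):
    print(url)
```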


FAQs for OutWit Hub, Sourcer, Images and Docs for Firefox

Tuesday, December 9th, 2008

IMPORTANT! – BEFORE READING ON: 

  • The main FAQ is in the Help of the OutWit Hub and Email Sourcer applications (in the top menu: Help > Frequently Asked Questions). It is much more up to date and targeted at Hub/Sourcer users. Please refer to it rather than to this static page.
  • Make sure you have the latest version, especially if you just downloaded your program from a third-party site.

The latest version of OutWit Hub is on the download page of outwit.com and the version history is here. For Email Sourcer, the links are respectively download and history.

Compatible Versions

General Questions

Technical Questions


Creating a Scraper for Multiple URLs Using Regular Expressions

Wednesday, November 19th, 2008

Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application which are accessible from the Help menu.
You should run these to discover the Hub.

NOTE: This tutorial was created using version 0.8.2. The Scraper Editor interface changed a long time ago: more features have been added and some controls now have new names. The following can still be a good complement to get acquainted with scrapers. The Scraper Editor can now be found in the ‘Scrapers’ view instead of ‘Source’, but the principle remains fundamentally the same.

In this example we’ll redo the scraper from the previous lesson using Regular Expressions. This will allow us to create a more precise scraper, which we can then apply to many URLs. When working with RegExps, you can always reference a list of basic expressions and a tutorial by selecting ‘Help’ in the menu bar.

Recap: For complex web pages or specific needs, when the automatic data extraction functions (table, list, guess) don’t provide you with exactly what you are looking for, you can extract data manually by creating your own scraper. Scrapers are saved on your computer and can then be reapplied or shared with other users, as desired.
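
To make the idea concrete outside the Hub, here is a small Python sketch in which a ‘scraper’ is simply a set of named regular expressions applied to each page’s source and run over a list of URLs. The patterns and field names are placeholders invented for the example; the Hub’s own scraper format is different, so treat this purely as an illustration of the RegExp approach.

```python
# Sketch: a "scraper" as a dictionary of named regular expressions,
# applied to the source of each URL in a list. Patterns are placeholders.
import re

import requests  # third-party package

SCRAPER = {
    "title": re.compile(r"<title>(.*?)</title>", re.I | re.S),
    "price": re.compile(r'class="price">\s*([\d.,]+)', re.I),  # example field
}

def apply_scraper(urls):
    rows = []
    for url in urls:
        source = requests.get(url, timeout=10).text
        row = {"url": url}
        for field, pattern in SCRAPER.items():
            match = pattern.search(source)
            row[field] = match.group(1).strip() if match else ""
        rows.append(row)
    return rows
```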


Create your First Web Scraper to Extract Data from a Web Page

Friday, August 22nd, 2008

Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application which are accessible from the Help menu.
You should run these to discover the Hub.

Find a simple but more up-to-date version of this tutorial here

This tutorial was created using version 0.8.2. The Scraper Editor interface changed a long time ago: many more features have been added and some controls now have new names. The following can still be a good complement to get acquainted with scrapers. The Scraper Editor can now be found in the ‘Scrapers’ view instead of ‘Source’, but the principle remains fundamentally the same.

In many cases, the automatic data extraction functions (tables, lists, guess) will be enough, and you will manage to extract and export the data in just a few clicks.

If, however, the page is too complex, or if your needs are more specific, there is a way to extract data manually: create your own scraper.

Scrapers will be saved to your personal database and you will be able to re-apply them on the same URL or on other URLs starting, for instance, with the same domain name.

A scraper can even be applied to whole lists of URLs.

You can also export your scrapers and share them with other users.

Let’s get acquainted with this feature by creating a simple one.
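
Before opening the editor, it may help to see the underlying principle in a few lines of Python: each piece of data is located by the text that surrounds it in the page source. This is only a rough illustration of that idea, with a made-up page snippet; it is not OutWit Hub’s scraper format.

```python
# Sketch: locate each field by the text found just before and just after it.
# Illustration of the principle only, on a made-up page snippet.
def scrape(source, fields):
    """fields maps a name to a (marker_before, marker_after) pair."""
    record = {}
    for name, (before, after) in fields.items():
        start = source.find(before)
        if start == -1:
            record[name] = ""
            continue
        start += len(before)
        end = source.find(after, start)
        record[name] = source[start:end].strip() if end != -1 else ""
    return record

page = "<p>Author: <b>Jane Doe</b></p><p>Year: <b>2008</b></p>"
print(scrape(page, {"author": ("Author: <b>", "</b>"),
                    "year": ("Year: <b>", "</b>")}))
```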


OutWit Hub Version 0.8.0.34 for Firefox 3 is online

Friday, August 1st, 2008

The Firefox 3 version was released yesterday. It can be downloaded here.

It includes a number of fixes and enhancements:

  • drag and drop of text, links and images to the Catch
  • autocompletion in the address bar
  • enhanced application of scrapers
  • better tree behavior
  • an enhanced image extraction process (more high-resolution images are found by the ‘images’ widget and the slideshow)
  • enhanced navigation link recognition…