by jcc - August 25th, 2009
I know it is a little unfair to talk about features that are not yet implemented in the downloadable versions, but I would like to share my ideas on the purpose behind our experimental semantic features.
The “mechanical” recognition and extraction algorithms used in most views of the Hub rely on a combination of DOM analysis (for HTML pages) and morphological recognition of objects and strings. These techniques are very efficient for simple data scraping, but they are not sufficient when we need to selectively extract data about specific themes or topics. We are currently adding semantic capabilities to our extractors (in professional applications only, for now).
At the moment, we are only focusing on statistical analysis of the words and phrases, without performing any syntactic analysis of the texts. However, the results are very promising and seem to confirm our original ideas.
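Here is a minimal Python sketch of what purely statistical analysis of words and phrases can look like. It is not the OutWit extractor. The stop-word list, the function names and the idea of scoring a text against a topic vocabulary are all assumptions made for this example. The code counts words and adjacent word pairs without doing any parsing.

```python
import re
from collections import Counter

# Hypothetical stop list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}

def tokenize(text):
    """Lowercase the text, keep runs of word characters, drop stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]

def phrase_counts(text):
    """Count single words and adjacent word pairs (bigrams).

    Purely statistical: no syntactic analysis is performed, so pairs may
    even straddle sentence boundaries."""
    words = tokenize(text)
    return Counter(words), Counter(zip(words, words[1:]))

def topic_score(text, topic_terms):
    """Share of the (non-stop) words that belong to a topic vocabulary."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(1 for w in words if w in topic_terms) / len(words)

# Usage
text = "The deep web is not indexed. Deep web pages need queries."
words, pairs = phrase_counts(text)          # words["deep"] == 2
score = topic_score(text, {"deep", "web", "queries"})  # 5 of 9 words
```

A text whose frequent words and phrases overlap strongly with a topic vocabulary can then be kept, and the others discarded, without ever analyzing sentence structure.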
Tags: semantics
Posted in Uncategorized | Comments Off on Semantic analysis
by jcc - August 11th, 2009
At OutWit, we are working on adding intelligence to the Web browser.
The free beta applications you have been downloading from our site are only part of what we are developing. They implement some of the recognition and extraction capabilities that we are building into the OutWit Kernel. We have been talking about a public API for more than a year now. Although it is definitely still in the pipeline, we have been delaying it (along with the complete help and documentation) until the kernel is stable enough for us to feel comfortable with people starting to write code around it.
We are convinced that the future will prove it was a good idea to add semantic intelligence to the browser itself instead of exclusively focusing on the server side.
Tags: semantics
Posted in Uncategorized | Comments Off on Our mission
by jcc - July 28th, 2009
OutWit’s collection technology is organized around three simple concepts:
- The programs dissect the Web page into data elements and enable users to see only the type of data they are looking for (images, links, email addresses, RSS news…).
- They offer a universal collection basket, the « Catch », into which users can manually drag and drop or automatically collect structured or unstructured data, links or media, as they surf the Web.
- They also know how to automatically browse through series of pages, allowing users to harvest all sorts of information objects in a single click.
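The first concept, dissecting a page into typed data elements, can be sketched with Python's standard library. This is a minimal illustration built on our own assumptions, not the code of the OutWit programs. It walks the HTML tags in a DOM-style pass to collect links and images, and it uses a regular expression to recognize email addresses in the text. `ElementCollector` and `dissect` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Simplified email pattern (an assumption; real addresses can be more varied)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

class ElementCollector(HTMLParser):
    """Sort a page's data elements by type: links, images, email addresses."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links, self.images, self.emails = [], [], set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            href = attrs["href"]
            if href.startswith("mailto:"):
                # Drop any ?subject=... part of the mailto link
                self.emails.add(href[len("mailto:"):].split("?")[0])
            else:
                # Resolve relative links against the page address
                self.links.append(urljoin(self.base_url, href))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))

    def handle_data(self, data):
        # Recognize addresses written as plain text by their shape
        self.emails.update(EMAIL_RE.findall(data))

def dissect(html, base_url):
    """Return the page's links, images and email addresses, by type."""
    parser = ElementCollector(base_url)
    parser.feed(html)
    parser.close()
    return {"links": parser.links, "images": parser.images,
            "emails": sorted(parser.emails)}

# Usage
page = ('<a href="/blog">Blog</a> <img src="img/logo.png"> '
        'Contact: info@example.com <a href="mailto:sales@example.com">Sales</a>')
result = dissect(page, "http://example.com/index.html")
# result["emails"] == ["info@example.com", "sales@example.com"]
```

Showing the user only one of these lists at a time corresponds to the "see only the type of data they are looking for" idea.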
With simple intuitive features as well as sophisticated scraping functions and data structure recognition, the OutWit programs target a broad range of user categories.
Tags: extract data, OutWit Docs, Outwit Hub, OutWit Images
Posted in Uncategorized | Comments Off on General overview of the OutWit programs
by jcc - June 23rd, 2009
In our work on the coming versions of the kernel, one of our main challenges is OutWit’s ability to explore the hidden Web. We are working on some very exciting features in this area (partly autonomous, partly user-driven explorations). The Deep Web is made up of Web pages and resources that are not indexed by search engines, often simply because no links point to them. One of the interesting functions we are working on is the generation of URLs and queries that reach into this dark side of the Web.
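The post does not explain how these URLs are generated. The tag “query generation matrices” suggests one plausible approach, sketched below under that assumption. List candidate values for each parameter of a search form, then enumerate every combination. This produces query URLs that no page necessarily links to. The base URL, the parameters and `generate_urls` are placeholders.

```python
from itertools import product
from urllib.parse import urlencode

def generate_urls(base, params):
    """Yield one query URL per combination of parameter values.

    `params` maps each query parameter to the values to try; the Cartesian
    product of these lists forms the 'matrix' of queries."""
    names = list(params)
    for values in product(*(params[n] for n in names)):
        yield base + "?" + urlencode(dict(zip(names, values)))

# Usage: 2 search terms x 2 result pages = 4 URLs
urls = list(generate_urls("http://example.com/search",
                          {"q": ["pdf", "csv"], "page": [1, 2]}))
# urls[0] == "http://example.com/search?q=pdf&page=1"
```

The number of URLs grows multiplicatively with each parameter, so a real explorer would need to prune the matrix, possibly with user guidance.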
Tags: deep web, grab images, image extraction, Outwit Hub, OutWit Images, query generation matrices
Posted in New releases | 1 Comment »
by jcc - March 29th, 2009
We released the first public version of OutWit Docs this weekend, along with updated versions of OutWit Images and OutWit Hub.
OutWit Docs is a simple WebTop Document Finder, based on our Kernel. It lets you search Websites and search engines for documents and presents the results the way an operating system would, either in icon view or as a list of files.
oW Docs looks for text files, spreadsheets and presentations in various formats (including PDF, MS Office and OpenOffice documents, RTF, CSV…).
In this version, the filtering and automatic selection options are somewhat basic (name, file type…), but we are going to improve them along the way. As we cannot download every result file to explore its contents, we are working on a multi-layered filtering process that refines the query, then the selection, and finally searches the contents of the most pertinent files only.
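A multi-layered filter of this kind could start with checks that cost nothing because they only look at the URL. Downloads are then reserved for the few files that survive. The sketch below is our own illustration, not the filtering done by OutWit Docs. The extension list, the keyword-in-file-name score and the `limit` cut-off are all assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical set of document types to keep
DOC_TYPES = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
             ".odt", ".ods", ".odp", ".rtf", ".csv", ".txt"}

def file_type(url):
    """Lowercased extension of the file named in the URL path."""
    return PurePosixPath(urlparse(url).path).suffix.lower()

def select_for_download(urls, keywords, wanted_types=DOC_TYPES, limit=10):
    """Filter candidate document URLs in cheap layers, before any download.

    Layer 1 keeps only wanted file types; layer 2 ranks the survivors by how
    many keywords appear in the file name and drops those with no match;
    only the top `limit` files would then have their contents searched."""
    docs = [u for u in urls if file_type(u) in wanted_types]

    def score(u):
        name = PurePosixPath(urlparse(u).path).name.lower()
        return sum(k.lower() in name for k in keywords)

    ranked = sorted(docs, key=score, reverse=True)
    return [u for u in ranked if score(u) > 0][:limit]

# Usage
urls = ["http://ex.com/a/annual-report-2008.pdf", "http://ex.com/index.html",
        "http://ex.com/budget.xls", "http://ex.com/report-summary.DOC"]
picked = select_for_download(urls, ["report", "2008"])
# picked == ["http://ex.com/a/annual-report-2008.pdf",
#            "http://ex.com/report-summary.DOC"]
```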
As with all our products, your suggestions will be extremely welcome. In the meantime, we hope you enjoy this program.
Posted in Uncategorized | 6 Comments »
by jcc - December 9th, 2008
IMPORTANT! – BEFORE READING ON:
- The main FAQ is in the Help of the OutWit Hub and Email Sourcer applications (in the top menu: Help>Frequently Asked Questions). It is much more up to date and targeted at Hub/Sourcer users. Please refer to it rather than to this static page.
- Make sure you have the latest version, especially if you just downloaded your program from a third-party site.
The latest version of OutWit Hub is on the download page of outwit.com and the version history is here. For Email Sourcer, the links are respectively download and history.
Compatible Versions
General Questions
Technical Questions
Tags: Outwit Hub, OutWit Images
Posted in Uncategorized | 76 Comments »
by jcc - November 25th, 2008
Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application which are accessible from the Help menu.
You should run these to discover the Hub.
The newest update of the Hub contains some exciting new features. This tutorial explains these new functionalities; for a more detailed explanation of how to use the Hub’s basic features, please refer to the existing list of tutorials.
Tags: Execute and Catch, Select Different, Select Identical, Select Inversion, Select Similar
Posted in Tutorials (Web Scraper) | 11 Comments »
by jcc - November 19th, 2008
Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application which are accessible from the Help menu.
You should run these to discover the Hub.
NOTE: This tutorial was created using version 0.8.2. The Scraper Editor interface changed a long time ago: more features were included and some controls now have a new name. The following can still be a good complement to get acquainted with scrapers. The Scraper Editor can now be found in the ‘Scrapers’ view instead of ‘Source’, but the principle remains fundamentally the same.
In this example we’ll redo the scraper from the previous lesson using Regular Expressions. This will allow us to create a more precise scraper, which we can then apply to many URLs. When working with RegExps, you can always consult a list of basic expressions and a tutorial by selecting ‘Help’ in the menu bar.
Recap: For complex web pages or specific needs, when the automatic data extraction functions (table, list, guess) don’t give you exactly what you are looking for, you can extract data manually by creating your own scraper. Scrapers are saved on your computer and can then be reapplied or shared with other users, as desired.
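Regular expressions add precision by describing the exact shape of each piece of data. A single compiled pattern can then be applied to many pages. The Python sketch below illustrates the idea outside the Hub. The HTML structure, the `ROW_RE` pattern and the `scrape` function are hypothetical and do not reflect OutWit's scraper format.

```python
import re

# Hypothetical pattern for a product listing; each named group is a column.
ROW_RE = re.compile(
    r'<td class="name">(?P<name>.*?)</td>\s*'
    r'<td class="price">(?P<price>[\d.]+)</td>',
    re.S,
)

def scrape(pages, pattern=ROW_RE):
    """Apply one compiled pattern to many pages and collect the rows.

    `pages` maps a URL to its HTML; each match becomes a dict tagged with
    the URL it came from, so the results of all pages form one table."""
    rows = []
    for url, html in pages.items():
        for m in pattern.finditer(html):
            rows.append({"url": url, **m.groupdict()})
    return rows

# Usage
pages = {
    "http://example.com/p1":
        '<tr><td class="name">Lamp</td> <td class="price">12.50</td></tr>',
    "http://example.com/p2":
        '<tr><td class="name">Desk</td><td class="price">99</td></tr>'
        '<tr><td class="name">Chair</td><td class="price">45</td></tr>',
}
rows = scrape(pages)   # 3 rows: Lamp, Desk, Chair
```

The `[\d.]+` group only accepts numeric prices, which is the kind of precision plain text markers cannot express.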
Tags: extract data, Mult URLs, Outwit Hub, RegExp, scraper, Web Harvester
Posted in Tutorials (Web Scraper) | 46 Comments »
by jcc - November 18th, 2008
Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application which are accessible from the Help menu.
You should run these to discover the Hub.
This tutorial was created using version 0.8.2. The Scraper Editor interface changed a long time ago: many more features were included and some controls now have a new name. The following can still be a good complement to get acquainted with scrapers. The Scraper Editor can now be found in the ‘Scrapers’ view instead of ‘Source’, but the principle remains fundamentally the same.
Now that we’ve learned how to create a scraper for a single URL, let’s try something a little more advanced. In this lesson we’ll learn how to create a scraper that can be applied to a whole list of URLs, using a simple method suited for beginners. In the next lesson, a more complex scraper using regular expressions will be demonstrated for our tech-savvy users. Geeks, feel free to skip to: Creating a Scraper for Multiple URLs using Regular Expressions.
Recap: For complex web pages or specific needs, when the automatic data extraction functions (table, list, guess) don’t give you exactly what you are looking for, you can extract data manually by creating your own scraper. Scrapers are saved on your computer and can then be reapplied or shared with other users, as desired.
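One beginner-friendly way to picture a scraper for a list of URLs is a fixed pair of text markers around each piece of data, with the same markers applied to every page. The sketch below reflects our own assumption about such a method and is written in Python outside the Hub. `between`, `scrape_all` and the sample pages are hypothetical.

```python
def between(html, before, after):
    """Return every piece of text found between `before` and `after` markers."""
    if not before or not after:
        raise ValueError("both markers must be non-empty")
    results, start = [], 0
    while True:
        i = html.find(before, start)
        if i == -1:
            break
        i += len(before)
        j = html.find(after, i)
        if j == -1:
            break
        results.append(html[i:j].strip())
        start = j + len(after)
    return results

def scrape_all(pages, fields):
    """Apply the same marker pairs to every page.

    `pages` maps a URL to its HTML; `fields` maps a column name to a
    (before, after) marker pair."""
    return {url: {name: between(html, b, a) for name, (b, a) in fields.items()}
            for url, html in pages.items()}

# Usage
pages = {"http://example.com/a": "<h1>Alpha</h1><p>Phone: 555-0100</p>",
         "http://example.com/b": "<h1>Beta</h1><p>Phone: 555-0199</p>"}
fields = {"title": ("<h1>", "</h1>"), "phone": ("Phone: ", "</p>")}
table = scrape_all(pages, fields)
# table["http://example.com/b"] == {"title": ["Beta"], "phone": ["555-0199"]}
```

Markers are easy to read and write, but they break as soon as the text around the data varies from page to page. Regular expressions address that limitation.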
Tags: Mult URLs, scraper, Web Harvester
Posted in Tutorials (Web Scraper) | 22 Comments »
by jcc - November 17th, 2008
![outwit-images_screenshot](http://blog.outwit.com/wp-content/uploads/2008/11/outwit-images_screenshot-300x187.png)
The first beta version of OutWit Images was posted on Firefox Add-ons and came out of the experimental zone this weekend. This new extension is an online image browser that not only lets you view Web images as a slideshow or as a wall of thumbnails, but also lets you grab the pictures and save them to your hard disk.
The feedback we are already receiving for this extension is extremely encouraging: with Images, as with the Hub, people are actually managing to do things they simply couldn’t do before. And this really makes us happy…
Download OutWit Images
Tags: OutWit Images
Posted in New releases | Comments Off on OutWit Images Was Released