Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application which are accessible from the Help menu.
You should run these to discover the Hub.
Here is an introductory tutorial, which will help you get acquainted with OutWit Hub in minutes. In this first guide, you will mostly learn the function of the main navigation and collection controls found in the Hub’s interface.
If you wish to use OutWit Hub as a Firefox add-on, you will first need to install Firefox.
To download it, click here.
After installing Firefox, you can download OutWit Hub by clicking here.
If you choose to use the standalone version, you just need to follow the install instructions and run the application.
If the installation isn’t performed automatically, find the downloaded .xpi file, open it with Firefox (or drag it to the Firefox window) and follow the instructions.
If you are using the standalone application, double-click on its icon, as you would for any other program.
For the add-on: after Firefox has reloaded, click on the OutWit button in the toolbar.
If the icon is not visible, select Tools -> OutWit -> OutWit Hub in the menu bar.
OutWit Hub will open with the Web page currently loaded in Firefox.
You are now in OutWit Hub. The first view on the left, “Page”, is the Web page itself. All the other views show objects (links, images, data, etc.) that are found in this page.
The application is a browser that dissects pages into data elements. In the “Page” view you are on the Web; in the other views you see a filtered view of the page you are currently visiting.
You only need to understand three simple concepts to use OutWit Hub:
- A shopping basket named the “Catch” sits at the bottom of the window. You can collect whatever you want in it, and it will follow you wherever you surf.
- You can filter the information to see only the type of data you want and “catch” it into your basket.
- You can navigate automatically through large series of pages with just the click of a button and OutWit Hub will recognize and collect data for you.
3. Quick Overview on How to Navigate
Address Bar: Where you type URLs or search queries.
It accepts either the URL to load as the current page or a query that will be forwarded to the preferred search engine.
Here are the main controls at your disposal:
Page: The ‘Page’ view is the Website itself. The current page is the one that is analyzed in the other views, which respectively show its images, links, emails, text, RSS news links, and the data that can be extracted from the page.
Images: Displays a table of all the images found in the Web page currently displayed in the ‘Page’ view. Images can be filtered, sorted, and moved to the Catch. (For more about this view, see the related tutorials.)
Links: Displays a table of all the links found in the Web page currently displayed in the ‘Page’ view. Links can be filtered and sorted to extract specific content and documents, then moved to the Catch.
Emails: Displays a table of all the email addresses found in the Web page currently displayed in the ‘Page’ view. Emails can be filtered, sorted and moved to the Catch or exported to a file.
Text: Displays the current page as simple text.
News: Displays the articles of all RSS feeds found in the page. The news can be filtered, sorted and moved to the Catch or exported.
Data: Gives access to the four extraction modes: tables, lists, guess and scraped.
These Appear as Views of Data:
Tables: Analyzes the page and extracts the data contained in the HTML tables. Results are displayed in a table and can be filtered, sorted and moved to the Catch or exported.
Lists: Analyzes the page and extracts the data contained in the HTML lists. Results are displayed in a table and can be filtered, sorted and moved to the Catch or exported.
Guess: Analyzes the page and tries to infer the data structure from recurring strings, remarkable labels and the recognition of semantic units. Results are displayed in a table and can be filtered, sorted and moved to the Catch or exported.
Scraped: If a scraper was associated with the URL of the current page, it will be applied when clicking the “Scraped” view. Results are displayed in a table and can be filtered, sorted and moved to the Catch or exported.
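To give a feel for what extracting "the data contained in the HTML tables" involves, here is a minimal Python sketch using only the standard library. It is not how OutWit Hub is implemented; the class and function names are made up for illustration, and nested tables are deliberately not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the cell text of every <table> in a page, row by row.

    Hypothetical helper for illustration; nested tables are not supported.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows, each row a list of cells
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(html):
    """Return every table found in `html` as a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


SAMPLE = (
    "<table>"
    "<tr><th>Name</th><th>Price</th></tr>"
    "<tr><td>Widget</td><td> 4.50 </td></tr>"
    "</table>"
)

for row in extract_tables(SAMPLE)[0]:
    print(row)
```

Each extracted row is a plain list of strings, which is the kind of tabular result that can then be filtered, sorted or exported.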
Source: Displays the colorized source code of the current page, emphasizing the text that is actually displayed on the page. It also contains the scraper editor, where a data scraper can be defined for a specific URL.
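The idea of "dissecting" one page into several views (links, images, emails) can be pictured with a short Python sketch. This is only an illustration under simple assumptions, not OutWit Hub's actual logic: the `PageViews` class, the `dissect` function and the email pattern are all hypothetical, and the pattern is intentionally loose.

```python
import re
from html.parser import HTMLParser

# Loose, illustrative email pattern (an assumption, not a full validator).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class PageViews(HTMLParser):
    """Sorts the contents of one HTML page into links, images and emails."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []
        self.emails = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            href = attrs["href"]
            if href.startswith("mailto:"):
                # Keep the address only, dropping any "?subject=..." part.
                self._add_email(href[len("mailto:"):].split("?")[0])
            else:
                self.links.append(href)
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_data(self, data):
        # Also pick up addresses that appear as plain text.
        for address in EMAIL_RE.findall(data):
            self._add_email(address)

    def _add_email(self, address):
        if address and address not in self.emails:
            self.emails.append(address)


def dissect(html):
    """Return the three 'views' of a page as a dictionary of lists."""
    parser = PageViews()
    parser.feed(html)
    parser.close()
    return {"links": parser.links, "images": parser.images, "emails": parser.emails}


SAMPLE = (
    '<p>Contact <a href="mailto:info@example.com">us</a> or sales@example.com.</p>'
    '<a href="/docs/guide.pdf">Guide</a><img src="logo.png">'
)

views = dissect(SAMPLE)
print(views["links"])
print(views["images"])
print(views["emails"])
```

All three lists come from the same page, which mirrors the way the other views are filtered readings of whatever is loaded in “Page”.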
Hide Local: If you check this box, the view will only display the outgoing links or external images of the current Web page. Elements that are local to the Web page will be hidden.
Hide Cache: When this box is checked in “Links”, cache URLs are not displayed (a cache is a saved state of a Web page kept by a search engine).
Documents: When checked in “Links”, only URLs that correspond to documents (pdf, doc, xls, etc.) are displayed.
Script: When this box is checked in the “Images” view, images used or cited in the HTML script tags are displayed; otherwise, they are hidden.
Style: Similar to the “Script” check box, this box, when checked, displays images used or cited in the styles.
Background: Similar to the two check boxes above, this box, when checked, displays the images used as background in the HTML code.
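The “Hide Local” and “Documents” check boxes described above behave like simple filters over the list of links. Here is a rough Python sketch of that idea. It assumes that "local" means "same host as the current page" and that the document types are the three named in the text; both are assumptions for illustration, and `filter_links` is a hypothetical name.

```python
from urllib.parse import urljoin, urlparse

# Taken from the text's "pdf, doc, xls, etc."; the real list is likely longer.
DOCUMENT_EXTENSIONS = (".pdf", ".doc", ".xls")


def filter_links(page_url, links, hide_local=False, documents_only=False):
    """Return absolute URLs from `links` that pass the selected filters."""
    page_host = urlparse(page_url).netloc.lower()
    kept = []
    for link in links:
        absolute = urljoin(page_url, link)  # resolve relative links
        parsed = urlparse(absolute)
        if hide_local and parsed.netloc.lower() == page_host:
            continue  # link stays on the current site
        if documents_only and not parsed.path.lower().endswith(DOCUMENT_EXTENSIONS):
            continue  # not a document file
        kept.append(absolute)
    return kept


PAGE = "https://example.com/news/index.html"
LINKS = [
    "/docs/guide.pdf",
    "about.html",
    "https://other.example.org/report.xls",
    "https://other.example.org/",
]

print(filter_links(PAGE, LINKS, hide_local=True))
print(filter_links(PAGE, LINKS, documents_only=True))
```

Checking both boxes at once corresponds to passing both flags, which leaves only the external document in this sample.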
Next in Series: Loads the next page in a series.
Active when your current page is part of a series, for example the result page for a search engine query that continues over multiple pages. OutWit Hub finds a navigation link to the following page.
Browse: Auto-Browses through all the pages of a series.
Active when your current page is part of a series, for example the result page for a search engine query that continues over multiple pages. The Hub will follow the links until the end of the series. To stop the navigation, press Escape or click the button a second time.
Dig: Automatically explores all the links of the current page.
Active when OutWit finds links in the current page. Clicking on Dig displays a menu to set the depth of the dig. Depth = 0 will browse through all the links of the current page; Depth = 1 will also explore all the links of the pages visited. If the current page is part of a series, the dig will go on as long as a next page is found. Pressing Escape or clicking the button a second time will stop the dig.
Site Home: Loads the home page of the current site.
Active when the current page is not the home page of a site. It goes to the top of the current site’s hierarchy.
Slideshow: Displays the images of the page as a slide show.
Active when OutWit finds images in the current page. The slideshow can be viewed in full screen or in the “Page” view. If the current page is part of a series, the slide show will go on as long as a next page is found.
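The difference between Browse (following a series) and Dig (exploring links down to a given depth) may be easier to see in code. The Python sketch below works on small in-memory dictionaries instead of real Web pages, so nothing is downloaded; `browse_series`, `dig`, `NEXT` and `SITE` are hypothetical names, and the depth rule follows the description above (Depth = 0 visits the links of the current page, Depth = 1 also visits the links of those pages). Combining a dig with a series is not modeled here.

```python
from collections import deque


def browse_series(start, get_next):
    """Follow 'next page' links from `start` until there are no more."""
    pages = [start]
    seen = {start}
    while True:
        following = get_next(pages[-1])
        if following is None or following in seen:
            return pages
        seen.add(following)
        pages.append(following)


def dig(start, get_links, depth):
    """Visit the links of `start`, then their links, down to `depth` levels."""
    seen = {start}
    visited = []
    queue = deque()

    def enqueue(page, level):
        for link in get_links(page):
            if link not in seen:  # never visit the same page twice
                seen.add(link)
                queue.append((link, level))

    enqueue(start, 0)
    while queue:
        page, level = queue.popleft()
        visited.append(page)
        if level < depth:
            enqueue(page, level + 1)
    return visited


# Toy data standing in for real pages (purely illustrative).
NEXT = {"results?page=1": "results?page=2", "results?page=2": "results?page=3"}
SITE = {
    "home": ["a", "b"],
    "a": ["c", "home"],
    "b": ["c", "d"],
    "c": ["e"],
}


def links_of(page):
    return SITE.get(page, [])


print(browse_series("results?page=1", NEXT.get))
print(dig("home", links_of, depth=0))  # ['a', 'b']
print(dig("home", links_of, depth=1))  # ['a', 'b', 'c', 'd']
```

In this toy site, page "e" is only reached at Depth = 2, which shows how quickly a dig can grow as the depth increases.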
Scripts are not yet implemented in the current version. Logs and History implementations are still very basic, but there is more to come.
At the bottom of the OutWit Hub window is a widget called “the Catch.” If you do not see it, select “Catch” in the View list. If it is too small, bring your cursor to the splitter between the widget panel and the Catch; your cursor will become a two-way arrow, allowing you to resize it to your liking. The Catch will remain visible whichever view you select.
So, what’s the Catch?
Not to worry. It’s just a shopping basket you carry around while you surf. When you are using one of the views (pages, images, links, emails, data, etc.), you may want to select some information to work on. The Catch allows you to keep a personalized selection of items, sort them by priority, save them to an Excel spreadsheet, save the related images or files, etc.
To do this, simply drag and drop the item(s) you want into the Catch or, in any view other than “pages”, select the rows you want and click on Catch.
You can also “catch” the selected rows by hitting the Return key.
Save Incoming Files: When checked, if an item moved to the Catch contains a link to a file (image, document, etc.), this file will be saved to your hard disk.
Empty: When this box is checked, whenever you load a new page, the view’s content is deleted. Otherwise, the data is kept between page loads.
Catch Selection: When loading a new page, selected data will automatically be moved to the Catch.
Rating & Priority: Allows you to add a rating to selected rows in the Catch. This can be very useful if your Catch contains a large quantity of items.
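As a way to picture the Catch, think of it as a list of rows that you can rate, sort and export. The Python sketch below is only an analogy: the `Catch` class and its methods are hypothetical, and it writes CSV text (which spreadsheet programs can open) rather than a native Excel file.

```python
import csv
import io


class Catch:
    """A basket of collected rows, each carrying a rating (illustrative only)."""

    def __init__(self):
        self.rows = []

    def add(self, row, rating=0):
        """Move a row (a dict of column -> value) into the basket."""
        self.rows.append({**row, "rating": rating})

    def sorted_by_rating(self):
        """Return the rows with the highest ratings first."""
        return sorted(self.rows, key=lambda r: r["rating"], reverse=True)

    def to_csv(self):
        """Export the rows, best rated first, as CSV text."""
        columns = []
        for row in self.rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.sorted_by_rating())
        return buffer.getvalue()


catch = Catch()
catch.add({"url": "a.pdf"}, rating=1)
catch.add({"url": "b.pdf"}, rating=3)
print(catch.to_csv())
```

Sorting by rating before exporting is what makes ratings useful when the basket holds many items: the rows you care about most end up at the top of the file.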