Grab HTML Tables to Excel Spreadsheets

Important Note: The tutorials you will find on this blog may become outdated with new versions of the program. We have now added a series of built-in tutorials in the application which are accessible from the Help menu.
You should run these to discover the Hub.

While surfing the Web, you may have come across interesting data that you want to use offline. You then faced the tiresome task of copying and pasting all the information row by row, column by column. OutWit Hub’s “Data” views can do this for you automatically.

In this tutorial, we are going to learn how to grab structured data from a Web page with the “Table” view and export it to an Excel spreadsheet.


1. Launch OutWit Hub

If you haven’t installed OutWit Hub yet, please refer to the Getting Started with OutWit Hub tutorial.

Begin by launching OutWit Hub from Firefox. Open Firefox then click on the OutWit Button in the toolbar.

If the icon is not visible, go to the menu bar and select Tools -> OutWit -> OutWit Hub.

OutWit Hub will open, displaying the Web page currently loaded in Firefox.

2. Go to the Desired Web Page

In the address bar, type the URL of the Website. You can also type any search string, and OutWit Hub will look it up using the preferred search engine selected in Firefox.

Today, let’s use this website, which contains detailed information on the 2008 Olympic medals: http://simon.forsyth.net/olympics.html

Go to the “Page” view where you can see that OutWit Hub displays the Web page as it would appear in a traditional browser.

Now, select “Data” from the view list and then select “Table,” the first view of the “Data” section.


In the “Data” section, OutWit Hub displays and structures all the data that it recognizes from the current Web page in the following views: tables, lists, guess and scraper.

If the “Table” view is blank, reload the page.

The “Table” view analyzes the source code of the page and extracts the data contained in the HTML tables.
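To make the idea concrete, here is a minimal Python sketch of what extracting data from HTML tables can look like. It is not OutWit Hub's implementation, only an illustration built on the standard library's `html.parser`. The class name `TableExtractor`, the helper `extract_tables`, and the sample markup are all made up for this example, and the sketch assumes well-formed, non-nested tables with explicit closing tags.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell text.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []    # one list of rows per <table> seen so far
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables found in the given HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Made-up sample markup, not taken from the page used in this tutorial.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Count</th></tr>
  <tr><td>alpha</td><td>3</td></tr>
  <tr><td>beta</td><td>1</td></tr>
</table>
"""

tables = extract_tables(SAMPLE)
for row in tables[0]:
    print(row)
```

Real pages are often messier than this (missing closing tags, tables used for layout, nested tables), which is one reason a view such as “Guess” or a custom scraper can be needed.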

If you do not get the desired results in this view, try clicking “Guess.” OutWit Hub will attempt to recognize the data present in the page even if it is not properly structured in tables. Another option is to create a scraper in the “Source” view; see the tutorial on creating a scraper.

3. Export the Table into an Excel Spreadsheet

The data in the “Table” view can be edited, filtered, sorted and moved to the “Catch” or exported directly into an Excel file.

Let’s export the current table, so we can work directly on the Excel spreadsheet.

Select the rows you want.

If you want to select several rows, hold down the ctrl key (cmd key for Mac users) and click the desired rows. To select all rows, use the shortcut ctrl-A (cmd-A on a Mac).

In the menu bar, select “File” then “Export Selection As…” or use the shortcut ctrl-E (cmd-E on a Mac). Select the destination folder and hit “OK.”

Based on your operating system and your version of Office, when opening the spreadsheet in Excel you may see a window saying the document you are trying to open is different from the one specified by the file extension. This is normal. Hit “OK” to open it.
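For readers who prefer to script this step, the select-and-export idea can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it writes CSV (which Excel can open) rather than a native Excel file, the function name `export_selection` is hypothetical, and the rows are made-up sample data. It writes to an in-memory buffer so it runs anywhere; passing an open file instead would save to disk.

```python
import csv
import io


def export_selection(rows, selected, out):
    """Write the header row plus the selected data rows to `out` as CSV.

    `rows[0]` is treated as the header; `selected` holds row indices.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(rows[0])
    for index in sorted(selected):
        writer.writerow(rows[index])


# Made-up table contents for illustration only.
rows = [
    ["Name", "Count"],
    ["alpha", "3"],
    ["beta", "1"],
    ["gamma", "2"],
]

buffer = io.StringIO()
export_selection(rows, selected={1, 3}, out=buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To save to a real file, something like `open("table.csv", "w", newline="")` can be passed as `out`; the file name here is just a placeholder.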

4. Use the Catch Panel

If you want to save several tables from different Web pages in the same Excel file, you can use the “Catch” to collect the data and then export it directly to Excel. To export the data from the “Catch,” select all, then right-click and select “Export Selection As…”

Table results can be dragged and dropped into the “Catch” or can be caught automatically by selecting “Catch Selection.” Please note that if the “Empty” box is checked, the existing information in the “Catch” will be replaced when you load a new Web page. The tutorial Getting Started With OutWit Hub gives an overview of how to use the “Catch.”
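The effect of the “Empty” box can be pictured with a small Python model. This is purely a conceptual sketch, not how OutWit Hub is built: the `Catch` class, its `empty_on_load` flag, and the method names are hypothetical, and the page data is made up.

```python
class Catch:
    """Toy model of a collection area that accumulates rows across pages."""

    def __init__(self, empty_on_load=False):
        # Mirrors the idea of the "Empty" checkbox (assumed behaviour).
        self.empty_on_load = empty_on_load
        self.rows = []

    def load_page(self):
        """Call when a new page is loaded; clears the rows if the flag is set."""
        if self.empty_on_load:
            self.rows.clear()

    def catch(self, rows):
        """Add the given rows to whatever is already collected."""
        self.rows.extend(rows)


# Made-up rows standing in for tables from two different pages.
page_one = [["alpha", "3"], ["beta", "1"]]
page_two = [["gamma", "2"]]

kept = Catch(empty_on_load=False)
replaced = Catch(empty_on_load=True)

for catch_panel in (kept, replaced):
    for page in (page_one, page_two):
        catch_panel.load_page()
        catch_panel.catch(page)

print(len(kept.rows))      # rows from both pages
print(len(replaced.rows))  # rows from the last page only
```

With the flag off, rows from both pages end up together and can be exported as one file; with it on, only the most recent page's rows remain.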

5. Application Examples

HTML tables are common in Web pages and simple to extract with OutWit Hub’s “Table” view. You can get more acquainted with this feature using the following link:

List of European Union member states in Wikipedia


18 Responses to “Grab HTML Tables to Excel Spreadsheets”

  1. Carlos Says:

    Hi, I’m using Windows XP and Firefox 3.0.3 and I just installed your addon but i could not get it working. Do you have any advice on how to get it working? Many thanks. Regards,

    Carlos

  2. kl Says:

    Do you see the OutWit button on the tool bar? If so, you can click it to launch the OutWit Hub and then use the tutorials to guide you through the navigation. You can also select “Tools” on the menu bar and launch OutWit Hub from there. If you are still having problems, please, don’t hesitate to ask.

  3. Rob Says:

    Apologies if you have covered this topic elsewhere.

    1. Is it possible to “schedule” a catch?

    eg. Let’s say I want to capture share prices at 1 hour intervals throughout the day by catching tables from Yahoo finance.

    2. Is it possible to “batch” or “upload” a series of catch instructions, rather than 1 at a time?

    eg. Let’s say I had a variable list of share prices I wanted to capture every day.

    3. Is it possible to autosave the Excel output of these files every time an update occurs?

    Many thanks…..and many thanks for what looks like a great product….

    Rob

  4. prashanth Says:

    hi there,
    when i launch Outwit Hub, it opens up the add-on page but its BLANK!! it does not recognise any web address I type into its address bar and the “Page” is always empty and no web view either. Am on Firefox v3.0.4

  5. BiAiB Says:

    Why can I only export data to excel format? I don’t have MS Office and I can’t open the files generated, even with MS Excel Viewer.

    CSV and ODT should be supported.

  6. kl Says:

    An update for the Hub was released yesterday and in this version CSV is supported.

  7. Tadhg Says:

    Have tried OutWit hub with a page containing frames and it does not seem to be able to find tables which are in a frame.

    Is this a known bug or is there a solution I’m not aware of?

  8. Steve Titterud Says:

    I downloaded OutWit specifically to harvest Medicare claims data.

    However, the view remains empty when the site’s page with a table is in focus. I have tried selecting the table and every other means of “catching” the data, but OutWit seems never to be aware of the table.

    I note that you say OutWit will harvest from HTML tables. Does that mean literally and only what it says, i.e., that a dynamically generated page, say, via ASP, will not be visible?

    Please let me know if I am missing something obvious here.

    Thanks!

  9. Randhir Says:

    How do I use the tables scraper for multiple pages? I went through the tutorial about building your own scraper for multiple pages and am hoping I can do something similar using the tables feature.

    Thanks for a great product.

  10. pr Says:

    @Steve: An HTML page, even one dynamically generated with ASP or PHP, is still an HTML page. So, if it contains an HTML table, OutWit will find it. If none of the views extract the data you want, you can create a scraper.

    @Randhir: Go to the table view and check “Catch selection” (or uncheck “empty”, as you need). Then, just browse through the URLs you want. As with a scraper, you can select some URLs, right-click, and choose “Browse through selected URLs”.

  11. daniel Says:

    may i export catched information horizontally to excel? your program does it very good, but i need each html data catched to be a row in my excel. let me know thanks

  12. jcc Says:

    Can you please use the bug report form and explain the problem a little more? I am not sure I exactly get your question. If you are asking whether it is possible to transpose columns to rows in the Excel export, then the answer is no, but you can easily do this in Excel (Copy, Paste Special, check Transpose).

  13. Don Juegos Says:

    @jcc, same here, I want the same function as Daniel, to be able to export only a column, vertically, without using other programs.

  14. jcc Says:

    Don Juegos, if this is the function you are looking for, it kind of exists: you cannot export one column yet, but you can copy all cells of a column by selecting all rows and using right click menu item ‘Copy Cell(s)’. You can also move the rows you need to the catch and remove the columns you do not wish to keep before exporting the catch. We also will give more flexibility to exports in a coming version.

    (NOTE: Please, do use the feedback link rather than posting a comment to the blog for support tickets to be followed.)

  15. soda media Says:

    I downloaded this plugin last week. Thank you, it has saved me many hours of copying and pasting into Excel.

  16. sleepy Says:

    This works perfectly, doing just what I need it to do! It did take several restarts of Firefox to get it going though, but no biggie.

  17. Html table to excel Says:

    [...] 27 Times in 19 Posts You can find the tutorial from Extract Data from a website to Excel – Grab HTML tables to Excel spreadsheets | OutWit Technologies … Also, I guess you can select the table in a webpage and drag-drop the table on excel works.. [...]

  18. Extract Data from a website to Excel – Grab HTML tables to Excel spreadsheets | OutWit Technologies Blog « Civet : The Spice of Life Says:

    [...] Extract Data from a website to Excel – Grab HTML tables to Excel spreadsheets | OutWit Technol…. [...]