Posts Tagged ‘harvest web’

Create your First Web Scraper to Extract Data from a Web Page

Friday, August 22nd, 2008

NOTE: This tutorial was created using version 0.8.2. The Scraper Editor interface has changed in version 0.8.9. More features were included and some controls now have a new name. We will update the tutorials as soon as the interface for the Pro version is completely stabilized. We are sorry for the inconvenience. In the meantime, the following should still be a good way to get acquainted with scrapers. The Sraper Editor can now be found in the ‘Scrapers’ view instead of ‘Source’ but the principle remains funamentally the same.

In many cases the automatic data extraction functions: tables, lists, guess, will be enough and you will manage to extract and export the data in just a few clicks.

If, however, the page is too complex, or if your needs are more specific there is a way to extract data manually: Create your own scraper.

Scrapers will be saved to your personal database and you will be able to re-apply them on the same URL or on other URLs starting, for instance, with the same domain name.

A scraper can even be applied to whole lists of URLs.

You can also export your scrapers and share them with other users.

Let’s get acquainted with this feature by creating a simple one.

(more…)