Archive for November, 2008
Creating a Scraper for Multiple URLs Using Regular Expressions
Wednesday, November 19th, 2008
NOTE: This tutorial was created using version 0.8.2. The Scraper Editor interface changed in version 0.8.9: features were added and some controls were renamed. We will update the tutorials as soon as the interface for the Pro version has completely stabilized. We are sorry for the inconvenience. In the meantime, the following should still be a good way to get acquainted with scrapers. The Scraper Editor can now be found in the ‘Scrapers’ view instead of ‘Source’, but the principle remains fundamentally the same.
In this example we’ll redo the scraper from the previous lesson using regular expressions. This will allow us to create a more precise scraper, which we can then apply to many URLs. When working with RegExps you can always consult a list of basic expressions and a tutorial by selecting ‘Help’ in the menu bar.
Recap: For complex web pages or specific needs, when the automatic data extraction functions (table, list, guess) don’t provide you with exactly what you are looking for, you can extract data manually by creating your own scraper. Scrapers are saved on your computer and can then be reapplied or shared with other users, as desired.
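To give a feel for the idea outside the Scraper Editor, here is a minimal sketch in Python of a regular-expression scraper applied to several pages. It is not OutWit’s own implementation: the field names, the patterns, the example.com URLs and the sample page sources are all made up for illustration, and the pages are kept as in-memory strings so that nothing is fetched from the network.

```python
import re

# Hypothetical scraper definition: one regular expression per field,
# each with a single capture group holding the value to extract.
SCRAPER = {
    "title": re.compile(r"<h2[^>]*>(.*?)</h2>", re.S),
    "price": re.compile(r"Price:\s*\$(\d+(?:\.\d{2})?)"),
}

# Stand-in page sources keyed by made-up URLs (assumption: a real run
# would download each page instead of using these strings).
PAGES = {
    "http://example.com/item/1": "<h2>Blue Widget</h2><p>Price: $9.99</p>",
    "http://example.com/item/2": "<h2 class='t'> Red Widget </h2><p>Price: $12</p>",
}


def apply_scraper(scraper, html):
    """Return the first match for each field, or None when a field is absent."""
    row = {}
    for field, pattern in scraper.items():
        match = pattern.search(html)
        row[field] = match.group(1).strip() if match else None
    return row


def scrape_all(scraper, pages):
    """Apply the same scraper to every page and collect one row per URL."""
    return [dict(url=url, **apply_scraper(scraper, html))
            for url, html in pages.items()]


rows = scrape_all(SCRAPER, PAGES)
for row in rows:
    print(row)
```

Because the patterns tolerate small differences between pages (extra attributes on the tag, surrounding whitespace, an optional decimal part), the same definition can be reused across the whole list of URLs.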
Creating a Scraper for Multiple URLs, Simple Method
Tuesday, November 18th, 2008
NOTE: This tutorial was created using version 0.8.2. The Scraper Editor interface changed in version 0.8.9: features were added and some controls were renamed. We will update the tutorials as soon as the interface for the Pro version has completely stabilized. We are sorry for the inconvenience. In the meantime, the following should still be a good way to get acquainted with scrapers. The Scraper Editor can now be found in the ‘Scrapers’ view instead of ‘Source’, but the principle remains fundamentally the same.
Now that we’ve learned how to create a scraper for a single URL, let’s try something a little more advanced. In this lesson we’ll learn how to create a scraper that can be applied to a whole list of URLs, using a simple method suited for beginners. The next lesson demonstrates a more complex scraper that uses regular expressions, for our tech-savvy users. Geeks, feel free to skip to: Creating a Scraper for Multiple URLs Using Regular Expressions.
Recap: For complex web pages or specific needs, when the automatic data extraction functions (table, list, guess) don’t provide you with exactly what you are looking for, you can extract data manually by creating your own scraper. Scrapers are saved on your computer and can then be reapplied or shared with other users, as desired.
OutWit Images Was Released
Monday, November 17th, 2008
The first beta version of OutWit Images was posted on Firefox Add-ons and came out of the experimental zone this weekend. This new outfit is an online image browser that not only allows you to view Web images as a slideshow or as a wall of thumbnails but also to grab the pictures and save them to your hard disk.
The feedback we are already receiving for this extension is extremely encouraging: with Images, as with the Hub, people are actually managing to do things they simply couldn’t do before. And this really makes us happy…