WebScraper 4.2.0 – Scan and output website data as CSV or JSON.

May 2, 2018

WebScraper uses the Integrity v6 engine to scan a website quickly, and can currently output the data as CSV or JSON.
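
To make the CSV/JSON output concrete, here is a minimal Python sketch of exporting chosen columns, in a chosen order and with optionally renamed headings, from a set of scraped pages. It is not WebScraper's code. The row format (one dict per scraped page), the `export_rows` helper and its `headings` parameter are hypothetical, and only the standard `csv` and `json` modules are used.

```python
import csv
import io
import json

# Hypothetical scraped rows: one dict per page (not WebScraper's internal format).
rows = [
    {"url": "https://example.com/", "title": "Home", "h1": "Welcome"},
    {"url": "https://example.com/about", "title": "About", "h1": "About us"},
]

def export_rows(rows, columns, fmt="csv", headings=None):
    """Serialise only the chosen columns, in the given order, as CSV or JSON.

    `headings` optionally maps a column key to the heading shown in the output.
    """
    headings = headings or {}
    names = [headings.get(col, col) for col in columns]
    # Missing values become empty strings so every row has every column.
    selected = [
        {name: row.get(col, "") for col, name in zip(columns, names)}
        for row in rows
    ]
    if fmt == "json":
        return json.dumps(selected, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=names)
        writer.writeheader()
        writer.writerows(selected)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

print(export_rows(rows, ["url", "h1"], "csv"))
print(export_rows(rows, ["title"], "json", headings={"title": "Page title"}))
```

Reordering or deleting a column in this sketch is just a change to the `columns` list.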

Easy to scan a site – just enter the starting URL and press “Go”
Easy to export – choose the columns you want
Plenty of extraction options, including HTML elements with particular classes or IDs, regular expressions, or the entire content in a choice of formats (HTML, plain text, Markdown)
Configuration of various limits on the crawl and the output file size
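
The class/ID and regular-expression extraction options can be approximated with Python's standard library. The sketch below is only an illustration under stated assumptions, not how WebScraper or the Integrity engine work internally: `ClassIdExtractor` and `VOID_TAGS` are hypothetical names, the parser assumes reasonably well-formed markup, and a matching element nested inside another match is captured as part of the outer one.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class ClassIdExtractor(HTMLParser):
    """Collect the text of <tag> elements that have class `cls` or id `id_`."""

    def __init__(self, tag, cls=None, id_=None):
        super().__init__()
        self.tag, self.cls, self.id_ = tag, cls, id_
        self.depth = 0       # > 0 while inside a matching element
        self.current = []    # text chunks of the element being captured
        self.results = []    # one string per matched element

    def _matches(self, tag, attrs):
        if tag != self.tag:
            return False
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        return ((self.cls is not None and self.cls in classes)
                or (self.id_ is not None and attrs.get("id") == self.id_))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1          # nested element inside a match
        elif self._matches(tag, attrs):
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:          # closed the matching element
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)

page = """
<h1 class="title main">Products</h1>
<h2 id="price">Price: $19.99</h2>
<h2>Other</h2>
"""

headings = ClassIdExtractor("h1", cls="title")
headings.feed(page)
print(headings.results)  # ['Products']

price = ClassIdExtractor("h2", id_="price")
price.feed(page)
print(price.results)     # ['Price: $19.99']

# Regular-expression extraction over the raw content.
match = re.search(r"\$(\d+\.\d{2})", page)
print(match.group(1))    # 19.99
```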

What’s New

Version 4.2.0:

Improves the 'output file column builder' table: columns now appear as columns rather than rows, which should make it easier to use. You can drag the columns to reorder them, edit their headings, edit a column's configuration, or delete a column.
Improves the output file filter (formerly called 'information page contains'). It now works like a 'select where' and lets you set up a number of rules combined with AND or OR. Each rule can use a 'contains' partial match or a regex, and has more options, such as contains / doesn't contain, and whether the rule applies to the URL or to the entire content.
Adds a proper links table, which can collect and list all link target URLs discovered along the way, and optionally image URLs too. The list can be filtered to show just links, just images, internal, external, redirected, or PDF documents.
Adds the ability to easily extract headings (h1-h7) with a particular class or ID (previously the class/ID method was limited to div, span, p and dd elements)
Alters the 'count' displayed at the right of the address bar. It now shows the number of pages scraped, which equals the number of rows in the output table. Previously it was a count of pages discovered, which may differ now that you can set up rules that act as an output filter.
Fixes a recently introduced bug that prevented output columns from being saved properly in a saved project
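
The rule-based output filter described above behaves like a 'select where' over the scraped pages. A minimal Python sketch of that idea follows. `Rule` and `passes` are hypothetical names, and the assumption that each page is a dict with `url` and `content` keys is ours, not WebScraper's.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    """One hypothetical filter rule: where to look, how to match, whether to negate."""
    field: str            # "url" or "content"
    pattern: str
    regex: bool = False   # False = plain 'contains' partial match
    negate: bool = False  # True = "doesn't contain" / "doesn't match"

    def test(self, page):
        text = page[self.field]
        if self.regex:
            hit = re.search(self.pattern, text) is not None
        else:
            hit = self.pattern in text
        return hit != self.negate  # flip the result for negated rules

def passes(page, rules, combine="and"):
    """Return True if the page should become a row in the output table."""
    if combine not in ("and", "or"):
        raise ValueError(f"combine must be 'and' or 'or', not {combine!r}")
    if not rules:
        return True  # no rules: keep every page
    results = (rule.test(page) for rule in rules)
    return all(results) if combine == "and" else any(results)

pages = [
    {"url": "https://example.com/products/1", "content": "<h1>Widget</h1> In stock"},
    {"url": "https://example.com/products/2", "content": "<h1>Gadget</h1> Sold out"},
    {"url": "https://example.com/blog/news", "content": "<h1>News</h1> In stock soon"},
]

rules = [
    Rule("url", r"/products/\d+$", regex=True),  # URL matches a regex
    Rule("content", "Sold out", negate=True),    # content doesn't contain "Sold out"
]

kept = [p["url"] for p in pages if passes(p, rules, "and")]
print(kept)       # ['https://example.com/products/1']
print(len(kept))  # 1
```

Because only pages that pass the rules become rows in this sketch, `len(kept)` plays the same role as the page count shown beside the address bar.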

Compatibility

OS X 10.8 or later, 64-bit processor
