WebScraper 4.1.0 – Scan and output website data as CSV or JSON.

April 21, 2018

WebScraper uses the Integrity v6 engine to quickly scan a website and can currently output the data as CSV or JSON.

Easy to scan a site – just enter the starting URL and press “Go”
Easy to export – choose the columns you want
Plenty of extraction options, including HTML elements with certain classes or IDs, regular expressions, or the entire content in a number of formats (HTML, plain text, Markdown)
Configuration of various limits on the crawl and the output file size

What’s New

Version 4.1.0:

Adds the ability to download images to a folder during the scan. See Complex setup > Output file columns > Also download images to folder.
- Images can optionally be downloaded only if they match a pattern, either a partial URL or a regex match. Leave the box blank to download all images discovered.
Adds an option to filter the output file, i.e. to include only data from certain pages (e.g. information pages or product pages). This is done by matching the URL of the page being scraped, either by partial URL (e.g. /product/) or by a regex match.
Fixes an issue with saving a project. Note that saving a project saves only settings and configuration, not data. Save data separately using Export from the Results screen or File > Export.
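
The partial-URL and regex matching used by the image and output filters above can be illustrated with a short Python sketch. This is not WebScraper's own code: the function name `url_matches`, the `use_regex` flag, and the example URLs are all hypothetical, and the details (case-sensitive matching, regex searched anywhere in the URL) are assumptions.

```python
import re


def url_matches(url: str, pattern: str, use_regex: bool = False) -> bool:
    """Return True if `url` passes the filter (hypothetical helper).

    An empty pattern accepts every URL, mirroring the
    "leave box blank to download all images" behaviour.
    """
    if not pattern:
        return True
    if use_regex:
        # Assumption: the regex may match anywhere in the URL.
        return re.search(pattern, url) is not None
    # Partial-URL match: plain, case-sensitive substring test.
    return pattern in url


# Example URLs are made up for illustration.
pages = [
    "https://example.com/product/widget",
    "https://example.com/about",
    "https://example.com/product/gadget?ref=home",
]

# Keep only product pages, as in the /product/ example.
product_pages = [u for u in pages if url_matches(u, "/product/")]
```

A regex filter would be called the same way, for instance `url_matches(image_url, r"\.(png|jpe?g)$", use_regex=True)` to accept only PNG and JPEG image URLs.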

Version 4.0.0 (released as beta):

Incorporates the version 8 crawling engine, which has many improvements
Adds 'limit requests to X per minute' control
Updates pre-defined user-agent strings
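
As a rough idea of what a "limit requests to X per minute" control does, here is a minimal Python sketch that spaces requests evenly. How WebScraper actually enforces the limit is not documented here, so the even-spacing approach, the class name `PerMinuteLimiter`, and its parameters are all assumptions.

```python
import time


class PerMinuteLimiter:
    """Space calls so that at most `max_per_minute` happen per minute.

    Hypothetical helper: it enforces a fixed minimum interval of
    60 / max_per_minute seconds between consecutive requests.
    """

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        if max_per_minute <= 0:
            raise ValueError("max_per_minute must be positive")
        self.interval = 60.0 / max_per_minute
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._next_allowed = None

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

A crawler loop would call `limiter.wait()` before each request. With `PerMinuteLimiter(30)`, for example, consecutive requests are kept at least two seconds apart.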

Compatibility

OS X 10.8 or later, 64-bit processor

