WebScraper 4.5.0 – Scan and output website data as CSV or JSON.

October 2, 2018

WebScraper uses the Integrity v6 engine to quickly scan a website, and can output the data (currently) as CSV or JSON.

Easy to scan a site – just enter the starting URL and press “Go”
Easy to export – choose the columns you want
Plenty of extraction options, including HTML elements with certain classes or IDs, regular expressions, or entire content in a number of formats (html, plain text, markdown)
Configuration of various limits on the crawl and the output file size
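
To make the extraction and export options above concrete, here is a minimal Python sketch of the general idea: pull the text of an element with a given class, grab a value with a regular expression, then write only the chosen columns as CSV or JSON. This is not WebScraper's own code or the Integrity engine. The sample page, the "price" class, the column names and all function names are assumptions made for illustration.

```python
import csv
import io
import json
import re
from html.parser import HTMLParser

# Elements that never have a closing tag; skipped so nesting depth stays balanced.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element carrying a given class (hypothetical helper)."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0        # greater than 0 while inside a matching element
        self.matches = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.wanted_class in classes:
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.matches.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


def extract_row(url, html):
    """Builds one output row: a class-based column and a regex-based column."""
    parser = ClassTextExtractor("price")  # assumed class name
    parser.feed(html)
    title = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return {
        "url": url,
        "title": title.group(1).strip() if title else "",
        "price": parser.matches[0] if parser.matches else "",
    }


def to_csv(rows, columns):
    """Writes only the chosen columns, in the chosen order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def to_json(rows, columns):
    return json.dumps([{c: r.get(c, "") for c in columns} for r in rows], indent=2)


SAMPLE = ('<html><head><title>Blue Widget</title></head><body>'
          '<h1>Blue Widget</h1><span class="price big">9.99</span></body></html>')

if __name__ == "__main__":
    rows = [extract_row("https://example.com/widget", SAMPLE)]
    print(to_csv(rows, ["url", "price"]))
    print(to_json(rows, ["url", "title"]))
```

Choosing columns at export time, rather than at extraction time, mirrors the "choose the columns you want" step: the same scanned rows can be written out with different column sets without rescanning.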

What’s New

Version 4.5.0:

Adds an option to add a column for 'h1' through 'h4' (under 'Content' in the 'Add a column' dialog). Useful for extracting information within a heading that doesn't have a class or id.
Enhances the 'list of urls' functionality. If your starting point is a local plain-text list of urls, and they are 'deep links' rather than domains, the 'down but not up' rule applies unless the 'crawl above starting directory' checkbox is ticked.
Corrects a small interface glitch. If a scan was run to completion, small changes were made to the configuration, and Go was pressed again, the scan would proceed and clear the previous data, but the counter within the url / progress field (to the right) would not be reset.
Inherits any recent changes within the Integrity crawling engine
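
As an illustration of the 'down but not up' rule mentioned in the notes above, here is a small Python sketch of one plausible reading: a link is followed only if it is on the same host and sits at or below the directory of the starting url, unless crawling above the starting directory is allowed. This is an assumption about the behaviour, not the Integrity engine's actual logic. The function names and the `crawl_above` parameter are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def starting_directory(start_url):
    """Returns the directory part of the starting url's path, ending in '/'."""
    path = urlsplit(start_url).path or "/"
    if path.endswith("/"):
        return path
    return posixpath.dirname(path).rstrip("/") + "/"


def in_scope(start_url, candidate_url, crawl_above=False):
    """Assumed 'down but not up' check; crawl_above stands in for the checkbox."""
    start, cand = urlsplit(start_url), urlsplit(candidate_url)
    if cand.netloc.lower() != start.netloc.lower():
        return False  # different host: out of scope in this sketch
    if crawl_above:
        return True   # checkbox ticked: anywhere on the same host
    return (cand.path or "/").startswith(starting_directory(start_url))


if __name__ == "__main__":
    start = "https://example.com/docs/guide/intro.html"
    for link in ("https://example.com/docs/guide/setup/install.html",
                 "https://example.com/docs/",
                 "https://other.example.org/docs/guide/"):
        print(link, in_scope(start, link), in_scope(start, link, crawl_above=True))
```

With a bare domain as the starting point, the starting directory is '/', so every page on that host is in scope. The rule only narrows the crawl when the starting url is a deep link.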

Compatibility

OS X 10.8 or later, 64-bit processor

Tags: Dark Mode WebScraper
