WebScraper 4.2.1 – Scan and output website data as CSV or JSON.

WebScraper uses the Integrity v6 engine to scan a website quickly and can currently output the data as CSV or JSON.

  • Easy to scan a site – just enter the starting URL and press “Go”
  • Easy to export – choose the columns you want
  • Plenty of extraction options, including HTML elements with certain classes or IDs, regular expressions, or the entire content in a number of formats (HTML, plain text, Markdown)
  • Configurable limits on the crawl and on the output file size
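
As a rough illustration of the two extraction styles mentioned above (elements with a given class, and regular expressions), here is a minimal Python sketch. It is not WebScraper's own code or the Integrity engine; the ClassTextExtractor class, the sample HTML, and the "price" class name are all hypothetical, and the sketch assumes the matching element is not a void element such as img.

```python
import re
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given class name.

    Hypothetical helper for illustration only.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0       # > 0 while inside a matching element
        self.tag = None      # tag name of the matching element
        self.results = []    # extracted text, one entry per match
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth == 0 and self.class_name in classes:
            self.tag = tag
            self.depth = 1
            self._buffer = []
        elif self.depth > 0 and tag == self.tag:
            # Nested element with the same tag name: track it so that
            # its end tag does not close the match too early.
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


# Made-up page content for the example.
html = '<h2 class="title">Blue Widget</h2><p>Price: <span class="price">9.99</span></p>'

# Extraction by class name.
extractor = ClassTextExtractor("price")
extractor.feed(html)
by_class = extractor.results

# Extraction by regular expression: any number with two decimal places.
by_regex = re.findall(r"\d+\.\d{2}", html)
```

Both approaches pull the same value out of this sample page; the class-based one depends on the page's markup, while the regex one depends only on the shape of the text.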


What’s New

Version 4.2.1:

  • Improves the 'output file column builder' table: output columns are now laid out as columns rather than as rows, which should be easier to use. You can drag the columns to reorder them, edit their headings, edit a column's configuration, or delete a column.
  • Improves the output file filter (formerly called 'information page contains'). It now works like a 'select where': you can set up a number of rules, combined with AND or OR. Each rule can be a 'contains' partial match or a regex, can be set to contains / doesn't contain, and can apply to the URL or to the entire content.
  • Adds a proper links table. It can be used to collect and list all link target URLs discovered during the crawl, and optionally image URLs too. The list can be filtered to show just links, just images, internal, external, redirected, or PDF documents.
  • Adds the ability to easily extract headings (h1-h7) with a particular class or id (previously the class or id method was limited to divs, spans, p's and dd's).
  • Changes the 'count' displayed at the right of the address bar. It now shows the number of pages scraped, which equals the number of rows in the output table. Previously it was a count of pages discovered, which may differ now that rules can act as an output filter.
  • Fixes a recently introduced bug that prevented output columns from being saved properly in a saved project.
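
The 'select where' behaviour of the output file filter can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not WebScraper's implementation: the Rule class, the page_passes function, and the sample pages are hypothetical names and data invented for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One filter rule (hypothetical structure, for illustration)."""
    pattern: str              # text or regular expression to look for
    target: str = "content"   # "url" or "content"
    use_regex: bool = False   # False means a plain 'contains' partial match
    negate: bool = False      # True means "doesn't contain"

    def matches(self, url: str, content: str) -> bool:
        text = url if self.target == "url" else content
        if self.use_regex:
            found = re.search(self.pattern, text) is not None
        else:
            found = self.pattern in text
        # A negated rule passes exactly when the pattern is absent.
        return found != self.negate


def page_passes(rules, url, content, combine="and"):
    """Return True if the page should become a row in the output."""
    if not rules:
        return True
    results = (rule.matches(url, content) for rule in rules)
    return all(results) if combine == "and" else any(results)


# Keep product pages that do not mention "out of stock".
rules = [
    Rule("/products/", target="url"),
    Rule(r"\bout of stock\b", use_regex=True, negate=True),
]

# Made-up (url, content) pairs standing in for crawled pages.
pages = [
    ("https://example.com/products/1", "Widget, in stock"),
    ("https://example.com/products/2", "Gadget, out of stock"),
    ("https://example.com/about", "About us"),
]

kept = [url for url, content in pages if page_passes(rules, url, content)]
```

In this sketch three pages are discovered but only one passes the filter, which is why a count of pages scraped (rows in the output) can differ from a count of pages discovered.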


Compatibility

OS X 10.8 or later, 64-bit processor

