Scraping guides: Values, separated by commas

Francis Irving — Thu, 25 Aug 2011 12:50:30 +0000

When we revamped our documentation a while ago, we promised guides to specific scraper libraries, such as lxml, Nokogiri and so on.

We’re now starting to roll those out. The first one is simple, but a good one. Go to the documentation page and you’ll find a new section called “scraping guides”.

The CSV scraping guide is available in Ruby, Python and PHP. Just as with all documentation, you can choose which language at the top right of the page.

“CSV” stands for “comma-separated values”. It’s a basic but quite common format for publishing spreadsheet files. Take a look at the scrapers tagged CSV on ScraperWiki. You’ll find lots of ministerial meetings and government spending records.

The CSV scraping guide shows you how to download and parse a CSV file, and how to easily save it into the datastore.
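To give a flavour of those three steps, here is a minimal Python sketch. It is not the code from the guide itself: the sample data is made up and held in a string instead of being downloaded, the `meetings` table and the `parse_csv` and `save_rows` helpers are hypothetical names, and a local SQLite database stands in for the ScraperWiki datastore.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for a downloaded CSV file (an assumption,
# not real ministerial data).
SAMPLE_CSV = """date,minister,organisation
2011-03-01,A. Minister,Example Ltd
2011-03-02,B. Minister,Sample Trust
"""


def parse_csv(text):
    """Return one dict per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def save_rows(conn, rows):
    """Store rows in a local SQLite table (a stand-in for the datastore)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meetings "
        "(date TEXT, minister TEXT, organisation TEXT, "
        "PRIMARY KEY (date, minister))"
    )
    # INSERT OR REPLACE means re-running the scraper updates rows
    # instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO meetings "
        "VALUES (:date, :minister, :organisation)",
        rows,
    )
    conn.commit()


rows = parse_csv(SAMPLE_CSV)
conn = sqlite3.connect(":memory:")
save_rows(conn, rows)
saved = conn.execute("SELECT COUNT(*) FROM meetings").fetchone()[0]
print(saved, "rows saved")
```

Choosing a unique key (here the date plus the minister) is the main design decision: it lets you run the scraper again and again without piling up duplicate rows.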

Why write a scraper just to load a CSV file in? Firstly, there are quirks you’ll inevitably find: inconsistencies in fields, extra rows and columns, dates that need reformatting and so on. Secondly, you might want to load in multiple files and merge them together. Finally, you get the data in ScraperWiki, giving you a JSON API and letting you make views.
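As a rough sketch of the first two reasons, the Python below tidies two made-up files that disagree on layout and then merges them. Everything in it is an assumption for illustration: the sample data, the `clean` and `parse_date` helpers, the column names, and the guess that slash-separated dates are day-first.

```python
import csv
import io
from datetime import datetime

# Made-up file with a title row above the header, an empty row,
# day-first dates and a thousands separator in the amount.
FILE_A = """Spending report,,
Date,Supplier,Amount
01/03/2011,Example Ltd,"1,200.50"
,,
02/03/2011,Sample Trust,300
"""

# Made-up second file with lower-case headers and ISO dates.
FILE_B = """date,supplier,amount
2011-04-05,Example Ltd,75.00
"""

# Assumption: slash-separated dates are day/month/year.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")


def parse_date(value):
    """Try each known format and return the date as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError("unrecognised date: %r" % value)


def clean(text, skip_rows=0):
    """Parse one file, skipping junk rows and normalising each field."""
    lines = io.StringIO(text)
    for _ in range(skip_rows):
        next(lines)  # discard title rows above the real header
    cleaned = []
    for raw in csv.DictReader(lines):
        # Normalise header case and whitespace; drop unnamed extra columns.
        row = {k.strip().lower(): (v or "").strip()
               for k, v in raw.items() if k is not None}
        if not any(row.values()):
            continue  # skip rows that are entirely empty
        cleaned.append({
            "date": parse_date(row["date"]),
            "supplier": row["supplier"],
            "amount": float(row["amount"].replace(",", "")),
        })
    return cleaned


merged = clean(FILE_A, skip_rows=1) + clean(FILE_B)
for record in merged:
    print(record)
```

Once every file has been pushed through the same cleaning step, merging is just a matter of joining the lists together, and the combined rows can be saved in one go.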

You can do quite funky things with even just CSV scraping. For example, see Nicola’s @scraper_no10 Twitter bot.

Next time, Excel files.
