Anna Powell-Smith – ScraperWiki
Extract tables from PDFs and scrape the web
Tue, 09 Aug 2016 06:10:13 +0000

What could a journalist do with ScraperWiki? A quick guide
Fri, 16 Jul 2010 11:17:25 +0000

For non-programmers, a first look at ScraperWiki’s code could be a bit scary, but we want journalists and researchers to make use of the site, so we’ve set up a variety of initiatives to help them get started.

Firstly, we’re setting up a number of Hacks and Hackers Days around the UK, with Liverpool as our first stop outside London. You can follow this blog or visit our Eventbrite page to find out more details.

Secondly, our programmers are teaching ScraperWiki workshops and classes around the UK.

Anna Powell-Smith took ScraperWiki to the Midlands, and taught Paul Bradshaw’s MA students at Birmingham City University the basics. Paul has written up some notes at this link.

Julian Todd ran a ‘Scraping 101’ session at the Centre for Investigative Journalism summer school last weekend. He ran through the basics of ScraperWiki and showed how he was using it to map and track offshore oil wells in the UK.

You can see his slides at this link.

Julian explained just why ScraperWiki is useful…

Your options for webscraping

1. Do the coding yourself

2. Get someone else to code it for you

3. Have it done already!

Number 3 is where ScraperWiki, a place for sharing scrapers, comes in.

Last month, ScraperWiki spoke and also manned a stall at the news:rewired event. You can read a write-up of Francis Irving’s presentation here by Rachel McAthy:

The presentations were concluded by Francis Irving, developer for ScraperWiki, who outlined how they can help journalists transform confusing data into a newsworthy story. He showed two examples of datasets the company can ‘scrape’ data from, producing more accessible tables or even visualisations such as maps, saving journalists’ time.

(Some more general points from the session can be read here)

Meanwhile, Jon Jacob from the BBC College of Journalism caught Francis on video…

If you have any questions about ScraperWiki or our Hacks and Hackers events please contact Aine McGuire: aine [at] scraperwiki [dot] com.

Government data release: what’s still out there
Mon, 07 Jun 2010 11:54:00 +0000
James Ball

Last week saw big steps forward in public data: on Monday, Prime Minister David Cameron wrote to all government departments, setting out a timetable for the release of a swathe of official datasets.

On Wednesday, the first two (senior civil service pay and MRSA infection rates) appeared – but the real meat came on Friday with the release of millions of rows of data from the official Treasury database, COINS – which has already been packaged into a usable format by the Open Knowledge Foundation.

A big step forward – but a new dataset over at ScraperWiki reveals there’s still a very long way to go. Developer Anna Powell-Smith has built a scraper for the Information Asset Register (IAR).

The IAR is a register of unpublished datasets held by government departments – and it has more than 2,100 entries. The database shows which department holds the information, and should include a short description of what’s in there.

The data shows how far there is still to go for open information: for one, David Cameron’s release last week covers fewer than ten datasets – important ones, beyond a doubt, but only a scratch on the surface.

But this is just a small part of the problem, as anyone looking at the full data in Powell-Smith’s scrape can see: even in this register of government data, quality is low.

More than half of the records in the IAR are missing details – often details as basic as a description of the record’s contents. Some departments have submitted hundreds of datasets, while others appear to have merely carried out a cursory search and listed a handful. Some didn’t even bother to do that.

A first step for the government’s new Transparency Board should doubtless be to update the register and bring it up to scratch.

Cameron warned that the data would initially be patchy. Given the poor state of even this simple document, it seems he wasn’t kidding. The culture of government might be changing, but developers and journalists alike will need to keep up the pressure if data good enough to be of use to anyone is going to come out.

Get the data here.

Done something with this data? Let us know – @scraperwiki on Twitter or