Government data release: what’s still out there

James Ball

Last week saw big steps forward in public data: on Monday, Prime Minister David Cameron wrote to all government departments, setting out a timetable for the release of a swathe of official datasets.

On Wednesday, the first two (senior civil service pay and MRSA infection rates) appeared. The real meat came on Friday, with the release of millions of rows of data from COINS, the official Treasury database, which has already been packaged into a usable format by the Open Knowledge Foundation.

These are significant steps, but a new dataset over at ScraperWiki shows there is still a very long way to go. Developer Anna Powell-Smith has built a scraper for the Information Asset Register (IAR).

The IAR is a register of unpublished datasets held by government departments – and it has more than 2,100 entries. The database shows which department holds the information, and should include a short description of what’s in there.

The data shows how far open information still has to go. For one thing, David Cameron's release last week covers fewer than ten datasets: important ones, beyond a doubt, but they barely scratch the surface.

But this is just a small part of the problem, as anyone looking at the full data in Powell-Smith’s scrape can see: even in this register of government data, quality is low.

More than half of the records in the IAR are missing details – often details as basic as a description of the record’s contents. Some departments have submitted hundreds of datasets, while others appear to have merely carried out a cursory search and listed a handful. Some didn’t even bother to do that.

A first step for the government’s new Transparency Board should doubtless be to update the register and bring it up to scratch.

Cameron warned that the data would initially be patchy. Given the poor state of even this simple document, it seems he wasn't kidding. The culture of government might be changing, but developers and journalists alike will need to keep up the pressure if data good enough to be of use to anyone is going to come out.

Get the data here.

Done something with this data? Let us know – @scraperwiki on Twitter or james@scraperwiki.com.

