AlphaGov – ScraperWiki
Extract tables from PDFs and scrape the web
Tue, 09 Aug 2016 06:10:13 +0000

Why the Government scraped itself
Mon, 13 Jun 2011 13:36:00 +0000

We wrote last month about Alphagov, the Cabinet Office’s prototype for a more usable, central Government website. It made extensive use of ScraperWiki.

The question everyone asks – why was the Government scraping its own sites? Let’s take a look.

In total 56 scrapers were used. You can find them tagged “alphagov” on the ScraperWiki website. There are a few more not yet in use, making 66 in total. They were written by 14 different developers from both inside and outside the Alphagov team – more on that process another day.

The bulk of the scrapers were there to migrate and combine content, such as transcripts of ministerial speeches and details of government consultations. These were then imported into sections of the Alphagov site, one for speeches and one for consultations.

This is the first time, that I know of, that the Government has organised a cross-government view of speeches and consultations (although third parties like TellThemWhatYouThink have covered similar ground before). This is vital to citizens who don’t fall into particular departmental categories, but want to track things based on topics that matter to them.

The rest of the scrapers were there to turn content into datasets. You need a dataset to make something more usable.

Two examples:

1. The list of DVLA driving test centres has been turned into the beginnings of a simple app to book a driving test. Compare it to the original DfT site.

2. The UK Bank Holiday data that ScraperWiki user Aubergene scraped last year was improved and used for the Bank Holiday page.
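
To give a feel for what "turning content into a dataset" means, here is a minimal sketch in Python. It is not one of the Alphagov scrapers: the page markup, the centre names and the helper names (TableToRows, scrape_table) are all made up for illustration, and it uses only the standard library rather than the ScraperWiki API. It reads an HTML table and returns one dict per row, keyed by the header cells.

```python
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page; the centre names and postcodes are placeholders.
SAMPLE_HTML = """
<table>
  <tr><th>Centre</th><th>Postcode</th></tr>
  <tr><td>Example Town</td><td>AB1 2CD</td></tr>
  <tr><td>Sample City</td><td>EF3 4GH</td></tr>
</table>
"""

centres = scrape_table(SAMPLE_HTML)
```

Once the rows are plain dicts like this, they can be saved, searched or fed into an app, which is much harder to do with the original web page.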

It seems strange at first for a Government to scrape its own websites. It isn’t though. It lets them move quickly (agile!), and concentrate first on the important part – making the experience for citizens as good as possible.

And now, thanks to Alphagov using ScraperWiki, you can download and use all the data yourself – or repurpose the scraping scripts for something else.

Let us know if you do something with it!

Access government in a way that makes sense to you? Surely not!
Wed, 11 May 2011 15:07:22 +0000

Alphagov uses ScraperWiki, a cutting-edge data-gathering tool, to deliver the results that citizens want. And radically for government, rather than tossing a finished product out onto the web with a team of defenders, this is an experiment in customer engagement.

If you’re looking to renew your passport, find out about student loans or work out how to complete a tax return, it’s usually easier to use Google than to navigate through government sites. That was the insight of Tom Loosemore, director of the Alphagov project, and his team of developers. This is a government project run by government. Alphagov is not a traditional website; it’s a developer-led but citizen-focused experiment in engaging with government information.

It abandons the approach of forcing you to think the way they thought. Instead, it provides a simple “ask me a question” interface and learns from customer journeys, starting with the first 80 of the most popular searches that led to a government website.

But how would they get information from all those Government websites into the new system?

I spoke to Tom about the challenges behind the information architecture of the site and he noted that: “Without the dynamic approach that ScraperWiki offers we would have had to rely on writing lots of redundant code to scrape the websites and munch together the different datasets. Normally that would have taken our developers a significant amount of time, would have been a lot of hassle and would have been hard to maintain. Hence we were delighted to use ScraperWiki, it was the perfect tool for what we needed. It avoided a huge headache.”

Our very own ScraperWiki CEO Francis Irving says: “It’s fantastic to see Government changing its use of the web to make it less hassle for citizens. Just as developers need data in an organised form to make new applications from it, so does Government itself. ScraperWiki is an efficient way to maintain lots of converters for data from diverse places, such as Alphagov have here from many Government departments. This kind of data integration is a pattern we’re seeing, meeting people’s expectations for web applications oriented around the user getting what they want done fast. I look forward to seeing the project rolling out fully – if only so I can renew my passport without having to read lots of text first!”

Just check out the AlphaGov tag. Because government sites weren’t built to speak to one another, there’s no way their data would be compatible enough to cut and paste into a new site. So this is another cog in the ScraperWiki machine: merging content from systems that cannot talk to each other.

Alphagov is an experimental prototype (an ‘alpha’) of a new, single website for UK Government, developed in line with the recommendations of Martha Lane Fox’s Review. The site is a demonstration, and whilst it’s public it’s not permanent and is not replacing any other website. It’s been built in three months by a small team in the Government Digital Service, part of the Cabinet Office.
