GDS – ScraperWiki
https://blog.scraperwiki.com
Extract tables from PDFs and scrape the web

The best data opens itself on UK Gov’s Performance Platform
https://blog.scraperwiki.com/2014/01/the-performance-platform-3-open-data/
Mon, 20 Jan 2014 17:00:40 +0000

This is the third in a series of posts about the UK Government’s Performance Platform, cross-posted on the OKFN blog as it is about open data. Part 1 introduced why the platform is exciting, and part 2 described how it worked inside.

The best data opens itself.

No need to make Freedom of Information requests to pry the information out of the state.

No need to build massive directories as checklists for civil servants to track what they’re releasing.

Instead, the data is just there. The code just opens it up naturally as part of what it does.

One of the unspoken exciting things about the UK Government’s Performance Platform is that it is releasing a whole bunch of open data.

Here are two examples.

Pet shop licences

1. Licensing performance

This is a graph (with data underneath, of course!) of pet shop licences applied for over time in various counties. It’s part of a larger system which will eventually cover all the different types of licence across the country. You can already find alcohol, food, busking… lots of topics.

As always with open data, there’ll be many unpredictable uses. Most users will use it quietly; you will never know they did. Perhaps a manager at Pets at Home will spot changing pet shop market conditions, or a musician will carefully examine the busking licence data…

2. Tax disc for vehicles

Tax disc applications

Basic data about transactional services can potentially tell you a lot about the economy. For example, the graph on the right of vehicle tax disc applications. This could tell an auto dealer – or a hedge fund! – information about car ownership.

It is constantly updated, so you’re getting much fresher data than any current national statistics. If you need it, the current number of users online is updated in real time. As the performance platform expands, I’d expect it to offer breakdowns by location and type of vehicle.

A charity can learn about digital inclusion from this open data. How many people are applying online as opposed to at a post office?
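That channel split is exactly the kind of question the open data answers directly. Here’s a minimal sketch of the analysis in Python, assuming the transaction records have already been fetched from the platform as JSON; the `channel` and `count` field names are illustrative assumptions, not the platform’s actual schema.

```python
# Tally transactions by channel, e.g. online vs post office.
# The "channel" and "count" field names are assumptions for
# illustration, not the Performance Platform's real schema.

def channel_breakdown(records):
    """Sum transaction counts per channel from a list of dicts."""
    totals = {}
    for record in records:
        channel = record.get("channel", "unknown")
        totals[channel] = totals.get(channel, 0) + record.get("count", 0)
    return totals

sample = [
    {"channel": "online", "count": 120},
    {"channel": "post-office", "count": 45},
    {"channel": "online", "count": 30},
]
```

With the sample records above, `channel_breakdown(sample)` gives `{'online': 150, 'post-office': 45}` – enough to start answering the digital inclusion question.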

The future

Already, with the performance platform only in its alpha phase, numerous datasets are being released as a side effect. This will grow for several reasons:

  • GDS aspire to have hundreds of services covered, across the whole range of Government.
  • Service managers in departments can get extra visualisations they need, extending the diversity of data.
  • At some point politicians will start asking for more things to be measured.
  • Maybe in the end activists will make pull requests to improve the data released.

This is great for businesses, charities, citizens, and the Government itself.

A fundamentally new kind of open data – that which transactional services can spit out automatically.

Making things open makes things better.

What data are you looking forward to the performance platform accidentally releasing for you?

Data Science (and ScraperWiki) comes to the Cabinet Office
https://blog.scraperwiki.com/2013/12/data-science-and-scraperwiki-comes-to-the-cabinet-office/
Thu, 05 Dec 2013 09:40:25 +0000

The Cabinet Office is one of the most vital institutions in British government, acting as the backbone to all decision making and supporting the Prime Minister and Deputy Prime Minister in their running of the United Kingdom. On the 19th of November, I was given an opportunity to attend an event run by this important institution, where I would be mentoring young people from across London in Data Science and Open Data.

OpenDataHackathon

The event was held at the headquarters of the UK Treasury, which occupies a palatial corner of Westminster overlooking St James’s Park, just a stone’s throw away from Buckingham Palace. Also in attendance were programmers, data scientists, project managers and statisticians from the likes of BT, Experian, data.gov, the Department for Education and the Foreign and Commonwealth Office, as well as my colleague Aine McGuire from ScraperWiki.

After a spot of chatting and ‘getting to know you’, the mentors and mentees split off into small groups where they’d start working on interesting ways they could use open government data; in particular data from the Department for Education.

Despite only having a day to work on their projects, each of the teams produced something incredible. Here’s what they made:

Edumapp

Students from a sixth-form college in Hammersmith and from the University of Greenwich chose to put together mapping technologies and open data to make it easy for parents to find good schools in their area.

EduMap

They even managed to create a tablet-ready demonstration product built using Unity 3D, which displayed a number of schools in England and Wales, with data about the academic performance of the school being displayed opposite. Despite the crippling time constraints of the day, they managed to create something that worked quite well and ended up winning the award for ‘best use of Open Data’.

Neetx

In British parlance, NEET is someone who is Not in Education, Employment or Training. It’s a huge problem in the UK, wasting huge amounts of human potential and money.

NeetX

But what if you could use Open Data to inspire young people to challenge themselves and take advantage of opportunities related to their interests? And what if that came packaged in a nice, accessible phone app? That’s what one of the teams in attendance did, resulting in Neetx.

Cherry Picker

The explosion in speciality colleges (confusingly, these are almost all high-schools) under the Labour government has made it easy for pupils with very specific interests to choose a school that works for them.

But what if you wanted a bit more detail? What if you wanted to send your child to a school that was really, really good at sciences? What if you wanted to cherry pick (see what I did there?) schools based upon their performance in certain key areas? Cherry Picker makes it easy to do just that.

University Aggregator

Finding the right university can be hard. There’s so much to be taken into consideration, and there’s so much information out there. What if someone gathered it all, and merged it into a single source where parents and prospective students could make an informed decision?

 

That’s what one of the teams attending proposed. They suggested that in addition to information from the National Student Survey and government data, they could also use information from Which?, The Telegraph and The Guardian’s university league tables. This idea also got a great reception from the mentors and judges in attendance, and is one idea I would love to see become a reality.

Conclusion

I left the Cabinet Office impressed with the quality of the mentorship on offer, the quality of the ideas produced, and the calibre of the students attending. The Cabinet Office really ought to be commended for putting on such an amazing event.

Were you in attendance? Let me know what you thought about it in the comments box below.

Live-graphing the UK Government’s agile auction
https://blog.scraperwiki.com/2013/11/live-graphing-the-uk-governments-agile-auction/
Thu, 28 Nov 2013 11:22:50 +0000

Hold your nerve! Hold your nerve! Stay at 49! I really think it’s over this time.

It had been intense all day, an adrenalin rush. Tens of thousands of pounds potentially at stake. Watching carefully in shifts with no more than a minute of distraction. Luckily Aidan held his nerve, the auction did close this time, with a happy result for us.

Livestock Auction Held Every Friday at the Rifle Sales-Yard, 10/1972

No, not a new board game the ScraperWiki offices have got addicted to, but an online auction for the Government to – at last! – get the best price for IT services that it buys. And agile-developed services to boot, delivered by small businesses as well as large.

This new purchasing system is called the Digital Services Framework. It is well written up by GDS. In each category, the 50 best-priced companies make it onto the framework.

As a businessman, I obviously don’t like having to bid down to the market price – I want to maximise our revenue; we’re a startup, and we’ve a product to cross-fund the development of.

But as a citizen, oh as a citizen, I love it!

Bids, which are for a day rate for work in various skill categories, can only go down, in multiples of £5. We all started with varying initial bids we’d filled in on some forms months before.

On the day, the auction gives out spartan information – simply your own ranking. Being a company full of data scientists, we naturally logged and analysed all the data, to turn the limited information into as much value as we could find.

This is a graph (make things open: it makes things better!) of our ranking for one of the auctions, from when it started at 10am until closing just in time for afternoon tea at 4pm (click it for a bigger version).

Auction ranking graph

The narrative is roughly like this:

  • We opened in 11th place. Only the top 50 at the end get through, out of some 120 total. Had we opened too low?
  • I’ve convinced myself the first upward curve that then nearly levels out is a sigmoid function. I don’t know why 20 people in those first 2 hours wanted to push their rank above us. I’m guessing due to later marketing advantage of being able to say you were the cheapest ranked.
  • The big drop at 1pm is the only time we changed our bid. Cautious about being in the 40s, we dropped our bid by £20, and our ranking changed by 25 places. Yes, to our astonishment, half the pack were bunched together in just four bid slots.
  • It took two hours for us to go from rank 20 back to 20 again – with our drop in bid of £20 in the middle. That means the auction as a whole was moving downwards at just £10 an hour. We were worried for a bit it would go on for days…
  • We passed through the same sigmoid again. Then there’s a massive discontinuous spike in the 40s. Lots of people very tightly bunched together, avoiding the slightly risky 49/50 positions.
  • And then we stopped at 49, we know not why. Held our breath (as in the opening quote). There was then a message about a change in the auction rules that would mean it definitely ended at the end of the day (more on that below). And after that there were no more bids.
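The £10-an-hour inference above is simple enough to write down. Here’s the back-of-envelope arithmetic as a tiny sketch (our own reasoning, nothing from the auction software):

```python
def pack_rate(bid_drop, hours_to_return):
    """If dropping your bid by bid_drop pounds returns you to your
    old rank after hours_to_return hours, the pack as a whole must
    be moving down at bid_drop / hours_to_return pounds per hour."""
    return bid_drop / hours_to_return

# Our £20 drop at 1pm, back at rank 20 two hours later:
pack_rate(20, 2)  # 10.0 pounds per hour
```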

We ended up in a good place. Mind you, I feel like it is also a fair place for everyone. In another auction we ended up 23rd, and from the above I’m pretty sure the difference from being 49th was only about £20, perhaps less, due to the higher densities in the high 40s.

There’s a complexity to the auction I haven’t mentioned yet.

Every time somebody bids, the countdown clock resets to 5 minutes. It’s as if you’re in Run Lola Run, or, more like it, Groundhog Day. If nobody bids for 5 minutes (300 seconds), the auction ends. If someone bids… the clock resets.

This graph is of how long people waited before bidding.

The data isn’t as accurate as our ranking changes above, as we were sometimes too time-pressured to type it in. We hadn’t made a scraper, as we didn’t want to risk doing any harm to the auction process or software.

Time waited in auction

This graph is hard to interpret, but clearly does have information content. One important thing to know is that there were several auctions going on at once, some of which we weren’t even signed up to. The clock would reset if anyone bid in any of them.
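For the record, the wait times in that graph are just the gaps between consecutive bid timestamps, bounded above by the 300-second countdown. A sketch of how they can be computed from a hand-typed log (timestamps in seconds are illustrative; as noted, we recorded countdown readings rather than a clean bid log):

```python
COUNTDOWN = 300  # the 5-minute clock: a gap this long ends the auction

def wait_times(bid_times):
    """Seconds waited before each bid, from sorted bid timestamps."""
    return [b - a for a, b in zip(bid_times, bid_times[1:])]

# e.g. bids at t = 0s, 12s, 45s and 250s into some window:
wait_times([0, 12, 45, 250])  # [12, 33, 205]
```

Any gap in the output that reached `COUNTDOWN` would mean the auction had closed.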

This is the story:

  • The high values are where everyone held their nerve, and for a brief period nobody bid. To start with, that was rarely for more than a minute; later it got longer.
  • The reason it is so spiky is this. There was a regular pattern throughout where the clock would tick down a long way (a minute early on, 4 minutes later on), then somebody would bid. That bid caused ranking changes which, we assumed, set off a flurry of quite fast bidding.
  • There’s a patch starting about 13:15 where the wait broke 200 seconds for the first time, and everything slowed down for a while. I wonder if that was the same as between the two bumps in our ranking graph, or if it is some random effect of the multiple auctions combining.
  • In the last part of the day, a minority of bidders were waiting until there was less than 30 seconds left and then bidding. In theory this gives you a tiny bit more information. In practice it is tedious and risky – what happens if your Internet connection breaks?

I was hoping to spot trends and patterns to be able to estimate, for example, when the auction would end.

This was pre-empted by the late afternoon rules change – they said they were going to get rid of the 5 minute resetting element at 5pm, and declared that the auction would definitely end at 6pm. This is quite a different beast, arguably something where everyone does a sealed bid at the end.

Either way, after 2pm very few people were bidding at all. In theory one bid could cause the whole pack to shuffle round again. In practice, the rate of bidding seemed to decay and it never started up again.

So there it is: red-in-tooth-and-claw capitalism competing to help deliver user-centric public services, for a Government that (touch wood) is finally getting it.

I raise a standard deviation to that! Cheers!

If you work for a UK government agency, looking for help with data migration, analysis or visualisation, our place in the top 50 means you can get in touch now to start work.
