Data Science (and ScraperWiki) comes to the Cabinet Office
https://blog.scraperwiki.com/2013/12/data-science-and-scraperwiki-comes-to-the-cabinet-office/ Thu, 05 Dec 2013

The Cabinet Office is one of the most vital institutions in British government, acting as the backbone of all decision making and supporting the Prime Minister and Deputy Prime Minister in their running of the United Kingdom. On the 19th of November, I was given the opportunity to attend an event run by this important institution, mentoring young people from across London in Data Science and Open Data.

OpenDataHackathon

The event was held at the headquarters of the UK Treasury, which occupies a palatial corner of Westminster overlooking St James’s Park, just a stone’s throw away from Buckingham Palace. Also in attendance were programmers, data scientists, project managers and statisticians from the likes of BT, Experian, data.gov, the Department for Education and the Foreign and Commonwealth Office, as well as my colleague Aine McGuire from ScraperWiki.

After a spot of chatting and ‘getting to know you’, the mentors and mentees split off into small groups where they’d start working on interesting ways they could use open government data; in particular data from the Department for Education.

Despite only having a day to work on their projects, each of the teams produced something incredible. Here’s what they made:

Edumapp

Students from a sixth-form college in Hammersmith and from the University of Greenwich chose to put together mapping technologies and open data to make it easy for parents to find good schools in their area.

EduMap

They even managed to create a tablet-ready demonstration product built using Unity 3D, which displayed a number of schools in England and Wales, with data about the academic performance of the school being displayed opposite. Despite the crippling time constraints of the day, they managed to create something that worked quite well and ended up winning the award for ‘best use of Open Data’.

Neetx

In British parlance, a NEET is someone who is Not in Education, Employment or Training. It’s a huge problem in the UK, wasting vast amounts of human potential and money.

NeetX

But what if you could use Open Data to inspire young people to challenge themselves and take advantage of opportunities related to their interests? And what if that came packaged in a nice, accessible phone app? That’s what one of the teams in attendance did, resulting in Neetx.

Cherry Picker

The explosion in speciality colleges (confusingly, these are almost all high-schools) under the Labour government has made it easy for pupils with very specific interests to choose a school that works for them.

But what if you wanted a bit more detail? What if you wanted to send your child to a school that was really, really good at sciences? What if you wanted to cherry pick (see what I did there?) schools based upon their performance in certain key areas? Cherry Picker makes it easy to do just that.

University Aggregator

Finding the right university can be hard. There’s so much to be taken into consideration, and there’s so much information out there. What if someone gathered it all, and merged it into a single source where parents and prospective students could make an informed decision?


That’s what one of the teams attending proposed. They suggested that in addition to information from the National Student Survey and government data, they could also use information from Which?, The Telegraph and The Guardian’s university league tables. This idea also got a great reception from the mentors and judges in attendance, and is one idea I would love to see become a reality.

Conclusion

I left the Cabinet Office impressed with the quality of the mentorship, the strength of the ideas, and the calibre of the students attending. The Cabinet Office really ought to be commended for putting on such an amazing event.

Were you in attendance? Let me know what you thought about it in the comments box below.

Live-graphing the UK Government’s agile auction
https://blog.scraperwiki.com/2013/11/live-graphing-the-uk-governments-agile-auction/ Thu, 28 Nov 2013

Hold your nerve! Hold your nerve! Stay at 49! I really think it’s over this time.

It had been intense all day, an adrenalin rush. Tens of thousands of pounds potentially at stake. Watching carefully in shifts with no more than a minute of distraction. Luckily Aidan held his nerve, the auction did close this time, with a happy result for us.

Livestock Auction Held Every Friday at the Rifle Sales-Yard, 10/1972

No, not a new board game the ScraperWiki offices have got addicted to, but an online auction for the Government to – at last! – get the best price for IT services that it buys. And agilely-developed services to boot, delivered by small businesses as well as large.

This new purchasing system is called the Digital Services Framework. It is well written up by GDS. In each category, the 50 best-priced companies make it onto the framework.

As a businessman, I obviously don’t like having to bid down to the market price – I want to maximise our revenue; we’re a startup with a product whose development we need to cross-fund.

But as a citizen, oh as a citizen, I love it!

Bids, which are for a day rate for work in various skill categories, can only go down, in multiples of £5. We all started with varying initial bids we’d filled in on some forms months before.
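Those bid rules are simple enough to express as a check – a minimal sketch, where the function name and figures are our own, not anything from the actual auction software:

```python
# Sketch of the bid rules as we understood them: a new bid can only
# be lower than your current one, and only by a multiple of £5.
BID_STEP = 5

def is_valid_bid(current_bid: int, new_bid: int) -> bool:
    """A new bid must be lower and a whole number of £5 steps below."""
    return new_bid < current_bid and (current_bid - new_bid) % BID_STEP == 0

print(is_valid_bid(500, 480))  # True: £20 lower, a multiple of £5
print(is_valid_bid(500, 505))  # False: bids can only go down
print(is_valid_bid(500, 497))  # False: £3 is not a multiple of £5
```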

On the day, the auction gives out spartan information – simply your own ranking. Being a company full of data scientists, we naturally logged and analysed all the data. To turn the limited information into as much value as we could find.
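The logging itself was nothing fancy – something along these lines, where the filename and format are illustrative choices of ours rather than anything the auction provided:

```python
# Whoever was on watch typed in the current ranking; we appended a
# timestamped row to a CSV file for analysis afterwards.
import csv
from datetime import datetime

def log_rank(rank, path="auction_log.csv"):
    """Append a (timestamp, rank) row to the log file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(), rank])

log_rank(11)  # we opened in 11th place
```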

This is a graph (make things open: it makes things better!) of our ranking for one of the auctions, from when it started at 10am until closing just in time for afternoon tea at 4pm (click it for a bigger version).

Auction ranking graph

The narrative is roughly like this:

  • We opened in 11th place. Only the top 50 at the end get through, out of some 120 total. Had we opened too low?
  • I’ve convinced myself the first upward curve that then nearly levels out is a sigmoid function. I don’t know why 20 people in those first 2 hours wanted to push their rank above us. I’m guessing it was for the later marketing advantage of being able to say you were ranked cheapest.
  • The big drop at 1pm is the only time we changed our bid. Cautious about being in the 40s, we dropped our bid by £20, and our ranking changed 25 places. Yes, to our astonishment, half the pack were bunched together in just four bid slots.
  • It took two hours for us to go from rank 20 back to 20 again – with our drop in bid of £20 in the middle. That means the auction as a whole was moving downwards at just £10 an hour. We were worried for a bit it would go on for days…
  • We passed through the same sigmoid again. Then there’s a massive discontinuous spike in the 40s. Lots of people very tightly bunched together, avoiding the slightly risky 49/50 positions.
  • And then we stopped at 49, we know not why. Held our breath (as in the opening quote). There was then a message about a change in the auction rules that would mean it definitely ended at the end of the day (more on that below). And after that there were no more bids.
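The £10-an-hour figure in the bullets above is simple arithmetic – a back-of-envelope sketch:

```python
# Our £20 bid drop was absorbed by the rest of the pack in about
# two hours, so the pack as a whole was descending at £10 an hour.
bid_drop_pounds = 20
hours_to_recover_rank = 2
pack_rate_per_hour = bid_drop_pounds / hours_to_recover_rank
print(pack_rate_per_hour)  # 10.0 pounds per hour
```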

We ended up in a good place. Mind you, I feel like it is also a fair place for everyone. In another auction we ended up 23rd, and from the above I’m pretty sure the gap between that and 49th was only about £20, perhaps less given the higher densities in the high 40s.

There’s a complexity to the auction I haven’t mentioned yet.

Every time somebody bids, the countdown clock resets to 5 minutes. It’s as if you’re in Run Lola Run, or, more like it, Groundhog Day. If nobody bids for 5 minutes (300 seconds), the auction ends. If someone bids, the clock resets.
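That closing rule is easy to model – a toy simulation, with made-up timestamps and names of our own, not the real auction mechanics:

```python
# Each bid resets a 300-second clock; the auction ends the moment
# the clock runs out. Timestamps are seconds since the auction opened.
RESET_SECONDS = 300

def closing_time(bid_times):
    """Given sorted bid timestamps, return when the auction closes."""
    clock_expiry = RESET_SECONDS  # the clock starts at the open
    for t in bid_times:
        if t >= clock_expiry:
            break  # the clock had already run out before this bid
        clock_expiry = t + RESET_SECONDS
    return clock_expiry

# Bids at 60s, 200s and 450s: the last bid resets the clock to 750s,
# and with no further bids the auction closes then.
print(closing_time([60, 200, 450]))  # 750
```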

This graph is of how long people waited before bidding.

The data isn’t as accurate as our ranking changes above, as we were sometimes too time pressured to type it in. We hadn’t made a scraper, as we didn’t want to risk doing any harm to the auction process or software.

Time waited in auction

This graph is hard to interpret, but clearly does have information content. One important thing to know is that there were several auctions going on at once, some of which we weren’t even signed up to. The clock would reset if anyone bid in any of them.

This is the story:

  • The high values are where everyone held their nerve, and for a brief period nobody bid. To start with that was rarely for more than a minute, later it got longer.
  • The reason it is so spiky is this. There was a regular pattern throughout where the clock would tick down a long way (a minute early on, 4 minutes later on), then somebody would bid. Then, as a result (we assumed) of the ranking changes that bid caused, there was a flurry of quite fast bidding.
  • There’s a patch starting about 13:15 where the wait broke 200 seconds for the first time, and everything slowed down for a while. I wonder if that was the same as between the two bumps in our ranking graph, or if it is some random effect of the multiple auctions combining.
  • In the last part of the day, a minority of bidders were waiting until there was less than 30 seconds left and then bidding. In theory this gives you a tiny bit more information. In practice it is tedious and risky – what happens if your Internet connection breaks?
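Deriving the “time waited” series from the logged timestamps is a one-liner – a sketch with illustrative numbers:

```python
# Each wait is just the gap to the previous bid. Since any bid in any
# of the simultaneous auctions reset the shared clock, all bids count.
def waits_between_bids(bid_times):
    """Return the gap before each bid, given sorted timestamps in seconds."""
    return [b - a for a, b in zip(bid_times, bid_times[1:])]

print(waits_between_bids([0, 40, 55, 290, 300]))  # [40, 15, 235, 10]
```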

I was hoping to spot trends and patterns to be able to estimate, for example, when the auction would end.

This was pre-empted by the late afternoon rules change – they said they were going to get rid of the 5 minute resetting element at 5pm, and declared that the auction would definitely end at 6pm. This is quite a different beast, arguably something where everyone does a sealed bid at the end.

Whatever way, after 2pm very few people were bidding at all. In theory one bid could cause the whole pack to shuffle round again. In practice, the rates of bidding seemed to decay and it never started up again.

So there it is: red-in-tooth-and-claw capitalism competing to help deliver user-centric public services, for a Government that (touch wood) is finally getting it.

I raise a standard deviation to that! Cheers!

If you work for a UK government agency, looking for help with data migration, analysis or visualisation, our place in the top 50 means you can get in touch now to start work.
