Events – ScraperWiki: Extract tables from PDFs and scrape the web

Open Data Camp 2 Tue, 13 Oct 2015 15:38:20 +0000

I'm back from Open Data Camp 2, and I'm finding it difficult to make a coherent whole of it all.

Perhaps it's the unstructured nature of an unconference. Maybe the different stakeholders in the open data community, at various levels of the hierarchy, share a common aim but hold different levers: the minister with the will to make changes; the digital civil servants with great expectations but not great budgets; the hacker who consumes open data in their spare time and creates new standards and systems for data in their day job.

There seemed to be a few themes which echoed through the conference:


There’s a recognition that improving people’s skills and recognising the skills people already have is critical, whether it’s people crafting Linked Data in Microsoft Word and wondering why it doesn’t work or getting local authorities to search internally for their invisible data ninjas.

Sometimes those difficulties occur due to differences in the assumed culture around different types of data. It seems everyone working in GIS would know what is meant by an .asc file and how to process it, but this is far from obvious to someone fresh to the data. Is there a need for improved documentation, linked to from the datasets themselves? Or for the ability to ask other people interested in the same datasets about interpretation and processing, in comments?
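For the curious: an .asc file is typically an Esri ASCII grid, a handful of "keyword value" header lines followed by rows of cell values. A minimal sketch of reading one in Python (the sample grid here is invented):

```python
def read_asc(text):
    """Parse an Esri ASCII grid (.asc): header lines, then rows of values."""
    header, rows = {}, []
    for line in text.strip().splitlines():
        parts = line.split()
        # Header lines are "keyword value" pairs like "ncols 3";
        # data rows consist entirely of numbers.
        if len(parts) == 2 and not parts[0].lstrip("-").replace(".", "").isdigit():
            header[parts[0].lower()] = float(parts[1])
        else:
            rows.append([float(v) for v in parts])
    return header, rows

sample = """ncols 3
nrows 2
xllcorner 0.0
yllcorner 0.0
cellsize 50.0
NODATA_value -9999
10 20 30
40 -9999 60"""

header, grid = read_asc(sample)
```

A real reader would also honour the `NODATA_value` (here -9999) by masking those cells, but even this sketch shows there's nothing mystical about the format.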


How do you know if your data is useful to people? Blogs have a useful feature called pingback – the referencing blog sends a message to the linked blog to let them know they’ve been linked to. There was quite a bit of discussion as to whether this functionality would be useful: particularly for informing people if breaking changes to the data might occur.
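A "pingback for data" could work much as it does for blogs: consumers register their use of a dataset, and the publisher notifies them when something changes, especially a breaking change. A toy in-memory sketch (all names here are illustrative, not any existing API):

```python
class DatasetRegistry:
    """Toy pingback registry: consumers subscribe to a dataset,
    the publisher notifies them of (possibly breaking) changes."""

    def __init__(self):
        self.subscribers = {}  # dataset id -> list of callbacks

    def subscribe(self, dataset_id, callback):
        self.subscribers.setdefault(dataset_id, []).append(callback)

    def announce_change(self, dataset_id, message, breaking=False):
        notified = 0
        for callback in self.subscribers.get(dataset_id, []):
            callback({"dataset": dataset_id, "message": message, "breaking": breaking})
            notified += 1
        return notified

registry = DatasetRegistry()
received = []
registry.subscribe("bin-collections", received.append)
count = registry.announce_change(
    "bin-collections",
    "column 'reason' renamed to 'reason_code'",
    breaking=True,
)
```

In the web version the callback would be an HTTP POST to a URL the consumer registered, but the shape of the idea is the same.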

Also, when data sits around not being used, people don’t notice problems with it. When things break noisily and publicly — like taking down a cafeteria’s menu system — it’s a bit embarrassing, but it does get the problem fixed quickly!

Core Reference Data

One of the highlights of the weekend was a talk on the Address Wars: the financial value of addresses, the fight to monetise them and their locations, and the problems caused for the 2001 census by not being able to afford a product from the Royal Mail and Ordnance Survey, both of which were wholly government owned at the time.

It highlighted how much core reference data — lists of names and IDs of things — is critical as the glue which allows different data to be joined and understood. Apparently there are 20 different definitions of 'Scotland' and 13 different ways of encoding gender (almost all of which are male or female). There's no definitive list of hospitals, and seven people claim to be in charge of the canonical list of business names and addresses. Hence there's a big push from GDS at the moment to create single canonical registers.

But there are other items that need standardised encodings. The DCLG has been working on standardised reasons for why bins don't get emptied – one of the most common interactions people have with their council. There's a lot more work to be done across the myriad things government does, and it's not quite clear where it should happen: councils are looking for leadership from central government, while central government wants councils to work together on it, possibly with the Local Government Association. This only gets more complicated when dealing with devolved matters or finding appropriate international standards to use.

Meeting people

I'm also really happy to have met Chris Gutteridge, who was showing off some of the things he's been working on: a service which brings together equipment held by various UK universities in a federated, discoverable fashion by using well-known URLs to point to well-formatted data on each individual website. Each organisation is in control of its data and is the authoritative source for it, and the whole builds upon having a single place to start discovering linked data about an organisation. It's the first time I've actually seen linked data in the wild, joining across the web like Tim Berners-Lee intended!

On a more frivolous level, using the OpenDefra LIDAR data to turn Ventnor sea-front into a Minecraft level is inspired, and the hand-crafted version looks stunning as well!

The Royal Statistical Society Conference – Exeter 2015 Fri, 11 Sep 2015 10:23:55 +0000

ScraperWiki have been off to the Royal Statistical Society Conference in Exeter to discuss our wares with the delegates. The conference was very friendly, with senior RSS staff coming to see how we were doing through the week.

We shared the exhibitor space in the fine atrium of The Forum at Exeter University with Wiley, the Oxford University Press, the Cambridge University Press, the ONS, ATASS sports, Phastar, Taylor and Francis and DataKind, alongside the Royal Statistical Society’s own stand.

We talked to a wide range of people, some with whom we have done business already, such as the Q-Step programme and the people from the Lancaster University Data Science MSc; we had interns from these two programmes over the summer. We've also done business with the ONS, who were there both as delegates and to try out the new ONS website on an expert audience. Other people we had met before on Twitter, such as Scott Keir – Head of Education and Statistical Literacy at the Royal Statistical Society – and Kathryn Torney, who won a data journalism award for her work on suicide in Northern Ireland.

Other people just dropped by for a chat, our ScraperWiki stickers are very popular with the parents of young children!

Our story is that we productionise the analysis and presentation of data. The story starts with PDFTables, a tool which accurately extracts the tabular content of PDFs to Excel spreadsheets as an online service (and an API). DataBaker can then be used to convert an Excel spreadsheet in human-readable form, with boilerplate, pretty headings and so forth, into a format more amenable to onward machine processing. DataBaker is a tool we developed for the ONS to help it transform the output of some of its dataflows where re-engineering the original software required more money or will than was available. The process is driven by short recipes, which we trained the staff at the ONS to write; naturally, we can write them for clients if preferred. The final stage is data presentation, and here we use an example from our contract software development: the Civil Service People Survey website. ORC International are contracted to run the actual survey, asking the questions and collating the results. For the 2014 survey the Civil Service took us on to provide the data exploration tool to be used by survey managers and, ultimately, all Civil Servants. The website uses the new GOV.UK styling and, through in-memory processing, is able to provide very flexible querying, very fast.
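The DataBaker step in that pipeline is essentially an unpivot: turning a table laid out for human eyes into one record per cell for machines. A simplified sketch of the idea in Python (this shows only the shape of the transformation, with invented data, not DataBaker's actual recipe syntax):

```python
def unpivot(sheet):
    """Flatten a human-readable table (title row, header row, row labels)
    into machine-friendly (row, column, value) records."""
    header = sheet[1][1:]          # column headings, skipping the corner cell
    records = []
    for row in sheet[2:]:
        label, values = row[0], row[1:]
        for column, value in zip(header, values):
            records.append({"row": label, "column": column, "value": value})
    return records

sheet = [
    ["Widget sales 2014", "", ""],   # decorative title row
    ["", "Q1", "Q2"],                # column headings
    ["North", 10, 20],
    ["South", 30, 40],
]
records = unpivot(sheet)
```

A real recipe also has to cope with merged cells, multi-level headings and footnotes, which is exactly why a recipe language is worth having.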

I've frequently been an academic delegate at conferences, but this was my first time as an exhibitor. I have to say I commend the experience to my former academic colleagues! As an exhibitor I had a chair, a table for my lunch, and people came and talked to us about what we did with little prompting. Furthermore, I did not experience the dread pressure of trying to work out which of multiple parallel sessions I should attend!

As it was, Aine and I went to a number of presentations, including Dame Julia Slingo's on uncertainty in climate and weather prediction, Andrew Hudson-Smith's talk on urban informatics, and Scott Zeger's talk on Statistical Problems in Individualized Health.

We liked Exeter, and the conference venue at the University. It was a short walk, up an admittedly steep hill, from the railway station and another short walk into town. The opening reception was held in the Royal Albert Memorial Museum, which is a very fine venue.

I joined the Royal Statistical Society having decided that they were the closest thing data scientists had to a professional body in the UK, and in general they seemed like My Sort of People!

All in all a very interesting and worthwhile trip, we hope to continue and strengthen our relationship with the Royal Statistical Society and its members.

Summary – Big Data Value Association June Summit (Madrid) Tue, 21 Jul 2015 10:13:39 +0000

In late June, 375 Europeans + 1 attended the Big Data Value Association (BDVA) Summit in Madrid. The BDVA is the private part of the Big Data Public Private Partnership (PPP); the public part is the European Commission. The delivery mechanism is Horizon 2020 and €500m of funding. The PPP commenced in 2015 and runs to 2020.

Whilst the conference title included the word 'Big', the content did not discriminate. The programme was designed to focus on concrete outcomes. A key instrument of the PPP is the concept of a 'lighthouse' project, and the summit had arranged tracks focused on identifying such projects: large scale, and within candidate areas like manufacturing, personalised medicine and energy.

What proved most valuable was meeting the European corporate representatives who ran the vertical market streams. Telecom Italia, Orange and Nokia shared a platform to discuss their sector. Philips drove a discussion around health and wellbeing. Jesus Ruiz, Director of Open Innovation in Santander Bank Corporate Technology, led the finance industry track. He tried to get people to think about 'innovation' in the layer above traditional banking services. I suspect he meant the space where companies like Transferwise (cheaper foreign currency conversion) play. These services improve the speed and reduce the cost of transactions; however, the innovating company never 'owns' an individual or corporate bank account, and as a consequence is not subject to tight financial regulation. It's probably obvious to most, but I was unaware of the distinction.

I had an opportunity to talk to many people from the influential Fraunhofer Institute!  It’s an ‘applied research’ organisation and a significant contributor to Deutschland’s manufacturing success.  Last year it had a revenue stream of €2b.  It was seriously engaged at the event and is active at finding leading edge ‘lighthouse projects’.  We’re in the transport #TIMON consortia with it – Happy Days 🙂

BDVA – You can join!

Networking is the big bonus at events like these, and with representatives from 28 countries and delegates from Palestine and Israel there were many people to meet. The UK was poorly represented; ScraperWiki was the only UK technology company showing its wares. That's a shame given the UK's torch carrying when it comes to data. Maurizio Pilu (@Maurizio_Pilu), Executive Director, Collaborative R&D at Digital Catapult, gave a keynote. The ODI is mentioned in the PPP Factsheet, which is good.

There was a strong sense that the PPP initiative is looking to the long term, and that some of the harder problems of extracting 'value' have not yet been addressed. There was also an acknowledgement of the importance of standards, and a track was run by Phil Archer, Data Activity Lead at the W3C.

Stuart Campbell, Director and CEO at Information Catalyst, and a professional pan-European team managed the proceedings, and it all worked beautifully. We're in FP7 and Horizon 2020 consortia, so we decided to sponsor and actively support #BDVASummit. I'm glad we did!

The next big event is the European Data Forum in Luxembourg (16–17 Nov 2015). We're sponsoring it, and we'll talk about our data science work and DataBaker. The event will be opened by Jean-Claude Juncker, President of the European Commission, and Günther Oettinger, European Commissioner for Digital Economy and Society.

It seems a shame that the mainstream media in the UK focuses so heavily on subjects like #Grexit and #Brexit. Maybe they could devote some of their column inches to the companies and academics that are making a very significant commitment to finding products and services that make the EU more competitive and also a better place to work and to live.

Spreadsheets are code: EuSpRIG conference Thu, 16 Jul 2015 09:35:28 +0000

I'm back from presenting a talk on DataBaker at the EuSpRIG conference. It's amazing to see a completely different world of how people use Excel. I've been busy tearing the data out of spreadsheets for the Office for National Statistics and using macros to open PDF files in Excel directly using PDFTables, so whilst I've been thinking of spreadsheets as sources of raw data, it's easy to forget how everyone else uses them. The conference reminded me particularly of one simple fact about spreadsheets that often gets ignored:

Spreadsheets are code.

And spreadsheets are a way of writing code which hasn't substantially changed since the days of VisiCalc in 1978 (the same year the book which defined the C programming language came out).

Programming languages have changed enormously in this time, promoting higher-level concepts like object orientation, whilst the core of the spreadsheet has remained the same. Certainly, there’s a surprising number of new features in Excel, but few of these help with the core tasks of programming within the spreadsheet.

Structure and style are important: it’s easy to write code which is a nightmare to read. Paul Mireault spoke of his methodology for reducing the complexity of spreadsheets by adhering to a strict set of rules involving copious use of array formulae and named ranges. It also involves working out your model before you start work in Excel, which culminates in a table of parameters, intermediate equations, and outputs.

And at this point I’m silently screaming: STOP! You’re done! You’ve got code!

Sure, there's the small task of identifying which of these formulae are global and which are regional, and adding appropriate markup; but at this stage the hard work is done: converting the model into your language of choice (including Excel) should be straightforward. Excel makes this process overly complicated, but at least Paul's approach gives clear instructions on how best to handle the conversion (although his use of named ranges is as contentious as your choice of football team or, for programmers, editor).
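To labour the point: once a model is written out as parameters, intermediate equations and outputs, the transliteration into a programming language is nearly mechanical. A trivial illustration (the model and all the figures are invented):

```python
# Parameters
unit_price = 25.0
units_sold = 400
unit_cost = 4.0
fixed_costs = 3000.0

# Intermediate equations
revenue = unit_price * units_sold
variable_costs = unit_cost * units_sold

# Outputs
profit = revenue - variable_costs - fixed_costs
```

Each row of the model table becomes one named assignment; the structure of the spreadsheet and the structure of the code are the same thing.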

Tom Grossman’s talk on reusable spreadsheet code was a cry for help: is there a way of being able to reuse components in a reliable way? But Excel hampers us at every turn.

We can copy and paste cells, but there is so much magic involved. We're implicitly writing formulae of the form "the cell three to the left", but we never say that explicitly: instead we read a reference to G3 in cell J3. And we can't easily replace these implicit references when copy-pasting formula snippets; we need to paste into exactly the right cell of the spreadsheet.

In most programming languages, we know exactly what we'll get when we copy and paste within our source code: a character-by-character identical copy. But copy-and-paste programming is considered a bad 'smell': we should be writing reusable functions. Yet without stepping into the realm of macros, each individual invocation of what would be a function needs to be a separate set of cells. It's possible to make this work with custom macro functions or plugins, but many people can't use spreadsheets containing macros, or won't have installed those plugins. This is a feature missing from the very core of Excel, and it makes working in Excel much more difficult and longwinded.
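For contrast, here is what that copy-pasted relative reference looks like once it is an explicit, named, reusable function, the abstraction Excel's grid makes so awkward (a deliberately trivial Python sketch with invented figures):

```python
def margin(revenue, costs):
    # The explicit version of a formula like "=G3-H3" copied down a column:
    # the inputs are named, not implied by where the formula happens to sit.
    return revenue - costs

rows = [(100, 60), (250, 200), (80, 90)]
# One definition applied everywhere, instead of one pasted copy per cell.
margins = [margin(revenue, costs) for revenue, costs in rows]
```

One definition means one place to test and one place to fix, which is precisely the reuse the talk was asking for.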

Not having these abstractions leads to errors. Ray Panko spoke of the errors we never see: how base error rates of a few percent are endemic across all fields of human endeavour. These error rates apply when code is first written, and per instruction. We can hope to reduce them through testing, peer review and pairing. Excel hinders testing and promotes copy-paste repetition, increasing the number of operations and the potential for errors. Improving code reuse would also help enormously: the easiest code to test is the code that isn't there.

A big chunk of the problem is that people think about Excel in the same wrong way they think about Word. In Word it's not a major problem, provided you don't need to edit the document: so long as it looks correct, that might be good enough, even if the slightest change breaks the formatting. That's simply not true of spreadsheets, where a number can look right but be entirely wrong.

Maria Csernoch's presentation of Sprego – Spreadsheet Lego – described an approach to teaching programming through spreadsheets which is designed to get people solving the problems they face methodically, from the inside out, rather than repeatedly trying a 'Trial-and-Error, Wizard-based' approach with minimal understanding.

It’s interesting to note the widespread use of array formulae across a number of the talks – if you’re making spreadsheets and you don’t know about them, it might be worth spending a while learning about them!

In short, Excel is broken. And I strongly suspect it can’t be fixed. Yet it’s ubiquitous and business critical. We need to reinvent the wheel and change all four whilst the car is driving down the motorway — and I don’t know how to do that…

NewsReader World Cup Hack Day Tue, 29 Jul 2014 14:07:08 +0000

Piek Vossen describing the NewsReader project

A long time ago*, in a galaxy far, far away** we ran the NewsReader World Cup Hack Day.

*Actually it was on the 10th of June.

**It was in the Westminster Hub, London.

NewsReader is an EU FP7 project aimed at developing natural language processing and Semantic Web technology to make sense of large streams of news articles. In NewsReader the underlying events and actors are called "instances", whilst the mentions of those events in news articles are called… "mentions". Many mentions can refer to one instance. This is key to the new technology that NewsReader is developing: condensing down millions of news articles into a network representing the original events.
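The instance/mention distinction is easy to sketch: many mentions, one per article, collapse into a single instance of the underlying event. A toy version (the grouping key and data are invented; NewsReader's actual cross-document coreference is far more sophisticated):

```python
from collections import defaultdict

def condense(mentions):
    """Group mentions (one per article) into instances (one per real event),
    keyed here by a simple (event, date) pair."""
    instances = defaultdict(list)
    for mention in mentions:
        instances[(mention["event"], mention["date"])].append(mention["article"])
    return dict(instances)

mentions = [
    {"event": "final", "date": "2014-07-13", "article": "bbc.co.uk/1"},
    {"event": "final", "date": "2014-07-13", "article": "guardian.com/2"},
    {"event": "semi", "date": "2014-07-08", "article": "bbc.co.uk/3"},
]
instances = condense(mentions)
```

Three mentions become two instances; scale that to millions of articles and you have the condensed event network the project is after.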

The project has been running for about 18 months, so we are half way through. We'd always planned to use hack days to showcase the technology we have been building and to guide our future work; the World Cup Hack Day was the first of these events. We wanted to base the day around some timely news of a manageable size, and the football World Cup fitted the bill. Currently the NewsReader technology works as a batch process, so in the couple of months before the Hack Day we processed approximately 300,000 news articles relating to the World Cup. At ScraperWiki we made a Simple API to provide access to the data that the NewsReader technology outputs. We thought this necessary because the raw output is stored in an RDF triplestore and is accessed using SPARQL queries. SPARQL is a query language similar to SQL but not as widely known, and we didn't want people to spend all day trying to get their first SPARQL query to work. Secondly, the dataset the NewsReader technology generates runs to several hundred million triples, so even a "correct" query could easily cause the SPARQL endpoint to appear unresponsive. By making the Simple API we could limit the queries made, and make sure that they were queries which ran in a reasonable time.
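The idea behind the Simple API can be sketched as a function that builds a fixed-shape SPARQL query from validated parameters, so callers never write raw SPARQL and every query carries a result limit. (Illustrative only: the property names, the validation rule and the cap are invented, not the Simple API's internals.)

```python
def actor_events_query(actor, limit=50):
    """Build a fixed-shape SPARQL query from validated parameters."""
    if not actor.replace(" ", "").isalnum():
        raise ValueError("actor must be alphanumeric")
    # Cap the result size so no caller can make the endpoint unresponsive.
    limit = min(int(limit), 1000)
    return (
        'SELECT ?event ?date WHERE { '
        '?event sem:hasActor ?a . ?a rdfs:label "%s" . '
        '?event sem:hasTime ?date } LIMIT %d' % (actor, limit)
    )

q = actor_events_query("Sepp Blatter", limit=5000)
```

The wrapper trades expressiveness for safety: users get one well-behaved query shape per endpoint instead of free-form SPARQL.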

About 40 people turned up on the day; a big contingent from the BBC, a smattering of people from various organisations based in London and the core NewsReader team, enhanced with various colleagues we’d dragged along. After a brief introduction from Piek Vossen, who put some context around the Hack Day, and me – providing some technical details – the participants went into action.

They made a range of applications, most of them focused on extracting particular types of events as defined by FrameNet. Gambling, fighting and commerce were common themes. A rogue group from ScraperWiki ignored the Simple API, bought a 32-CPU, 60GB-memory spot instance and wrote their own high-performance search tool.

At the end of the day the participants presented their work, and prizes were awarded. Jim Johnson-Rollings from the BBC came first with a live demo of a tool which worked out which teams a named football player had played for over the course of his career. Second was Team Fighty, who built a tool to discover which football teams were most commonly associated with violence. Team Fail Fast, Fail Often tried out a wide range of things, culminating in a swish Gephi network visualisation. The prizes were regional foods from around the NewsReader consortium.

This was my first Hack Day, although ScraperWiki has run a number of such events in the past. We’d spent considerable effort in making the Simple API and it was great to see the hackers make such varied use of it. I was impressed how much they managed to do in such a short time.

The Simple API we developed, the presentation slides and some demo visualisations are available in this bundle of links.

The NewsReader team are grateful to all the participants for taking part and helping us with our project.

A big thanks also to the team at The Hub Westminster who gave us so much support on the day.



Happy Hackers working away on the World Cup data at the Westminster Hub

getting,stuff,done Tue, 22 Jul 2014 15:29:42 +0000

In Berlin last week, a bunch of interoperability geeks gathered for the first csv,conf. Yes, that's right: comma-separated value files.

The conference was about getting stuff done. Data in, data out… With an ironic self-recognition that CSVs are weak in lots of ways, but still the best we’ve got.

To give you a taste, here are a few of the speakers:

Jeni Tennison talked about the CSV on the Web Working Group. How can we make a CSV file slightly more useful like a database – types for columns, lightweight standards for metadata?
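The gist of typed columns for CSV can be sketched with a small metadata "sidecar" mapping column names to types, applied as the file is read. (This is the spirit of the idea only; the real CSV on the Web metadata vocabulary is richer and JSON-based. The column names and data below are invented.)

```python
import csv
import io

# A sidecar schema declaring a type for each column.
schema = {"name": str, "founded": int, "turnover": float}

raw = "name,founded,turnover\nScraperWiki,2010,1.5\nExampleCo,1999,20.0\n"
reader = csv.DictReader(io.StringIO(raw))
# Apply the declared type to each cell as it is read.
typed = [{col: schema[col](value) for col, value in row.items()} for row in reader]
```

Without the schema every cell is a string; with it, the CSV behaves a little more like a database table.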

Dragon from ScraperWiki pitched xypath, his Python library for navigating around a grid of cells to turn a spreadsheet into data. You can read our blog post about it.

Felienne Hermans said that spreadsheets are really code. So why don’t we do proper software engineering on them? Unit tests, lint checking and coding standards are all possible in Excel. You can find more on her blog.

Also… Friedrich Lindenberg on precomputing being easier for data-rich websites, Peter Murray-Rust on data mining scientific papers, Ingrid Burrington on her field guide to the hidden infrastructure of our data centres, and Brian Jacobs on the state of collaborative knowledge platforms like DBpedia and Freebase.

And many more that we couldn’t go to because there were three streams!

Overall, csv,conf was a refreshing dose of honesty and action in data. We hear it might be coming to the UK next.

Further reading: Jeni’s blog post 2014: The Year of CSV

World Cup Hack Day, London 10th June – a teaser! Wed, 04 Jun 2014 07:30:24 +0000

With the England team just arrived in Miami for their final preparations for the World Cup, Mohammed Bin Hammam is back in the news for further accusations of corruption.

This is interesting because we saw Hammam’s name on Friday as we were testing out the NewsReader technology in preparation for our Hack Day in London on Tuesday 10th June. NewsReader is an EU project which aims to improve our tools to analyse the news.

And that’s just what it does.

Somewhat shame-faced, we must admit that we are rather ignorant of the comings and goings of football. However, this ignorance illustrates the power of NewsReader nicely. We used our simplified API to the NewsReader technology to search thousands of documents relating to the World Cup. In particular we looked for Sepp Blatter and David Beckham in the news, and who else was likely to appear in events with them. The result of this search can be seen in the chart below, which shows that Mohammed Bin Hammam appears very frequently in events with Sepp Blatter.
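The query behind that chart is simple to express: count who else turns up in the events a target actor appears in. A toy version over invented data:

```python
from collections import Counter

def co_actors(events, target):
    """Count who appears in the same events as a target actor."""
    counts = Counter()
    for actors in events:
        if target in actors:
            counts.update(a for a in actors if a != target)
    return counts

events = [
    {"Sepp Blatter", "Mohammed Bin Hammam"},
    {"Sepp Blatter", "Mohammed Bin Hammam", "David Beckham"},
    {"David Beckham"},
]
counts = co_actors(events, "Sepp Blatter")
```

The real version runs over hundreds of thousands of extracted events rather than three sets, but the counting is the same.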

For us soccer ignoramuses, the simple API also provides a brief biography of bin Hammam from Wikipedia. Part of the NewsReader technology links mentions of names to known individuals and thus to wider data about them. We can make a timeline of events involving bin Hammam, which we show below.


It’s easy to look at the underlying articles behind these events, and discover that bin Hammam’s previous appearances in the news have related to bribery.

Finally, we used Gephi to generate a pretty, but somewhat cryptic, visualisation. The circles represent people we found in the news articles who appeared in events with either Sepp Blatter or David Beckham; Blatter and Beckham themselves are the dots from which many lines emanate. The purple circles represent people who have had interactions with one or other of Blatter and Beckham, the green circles those who have had interactions with both. The size of a circle represents the number of interactions: Bin Hammam appears as the biggest green circle.

You can see an interactive version of the first two visualisations here, and the third one is here.

That’s a little demonstration of what can be done with the Newsreader technology, just imagine what someone with a bit more footballing knowledge could do!

If you want to join the fun, then we are running a Hack Day at the Westminster Hub in central London on Tuesday 10th June where you will be able to try out the NewsReader technology for yourself.

It’s free and you can sign up here, on EventBrite: EventBrite Sign Up

NewsReader – Hack 100,000 World Cup Articles Wed, 16 Apr 2014 13:52:23 +0000

June 10, The Hub Westminster (@NewsReader)

Ian Hopkinson has been telling you about our role in the NewsReader project.  We’re making a thing that crunches large volumes of news articles.  We’re combining natural language processing and semantic web technology.  It’s an FP7 project so we’re working with a bunch of partners across Europe.

We're 18 months into the project and we have something to show off. Please think about joining us for a fun 'hack' event on June 10th at The Hub Westminster, London. There are 100,000 World Cup news articles we need to crunch, and we hope to dig out some new insights from a cacophony of digital noise. There will be light refreshments throughout the day. Like all good hack events there will be an end-of-day reception, at which we'd like you to present your findings and give us some feedback on the experience (the requisite beer and pizza will be provided).

All of our partners will be there: LexisNexis, SynerScope, VU University (Amsterdam), the University of the Basque Country (San Sebastián) and Fondazione Bruno Kessler (Trento). They're a great team, very knowledgeable in this field, and they love what they are doing.

Ian recently made a short video about the project which is a useful introduction.

If you are a journalist, an editor, a linked data enthusiast or data professional we hope you will care about this kind of innovation.

Please sign up here  ‘NewsReader eventbrite invitation’  and tell your friends.


Data Science (and ScraperWiki) comes to the Cabinet Office Thu, 05 Dec 2013 09:40:25 +0000

The Cabinet Office is one of the most vital institutions in British government, acting as the backbone of decision making and supporting the Prime Minister and Deputy Prime Minister in their running of the United Kingdom. On the 19th of November, I was given an opportunity to attend an event run by this important institution, mentoring young people from across London in data science and open data.


The event was held at the headquarters of the UK Treasury, which occupies a palatial corner of Westminster overlooking St James's Park, just a stone's throw from Buckingham Palace. Also in attendance were programmers, data scientists, project managers and statisticians from the likes of BT, Experian, the Department for Education and the Foreign and Commonwealth Office, as well as my colleague Aine McGuire from ScraperWiki.

After a spot of chatting and ‘getting to know you’, the mentors and mentees split off into small groups where they’d start working on interesting ways they could use open government data; in particular data from the Department for Education.

Despite only having a day to work on their projects, each of the teams produced something incredible. Here’s what they made:


Students from a sixth-form college in Hammersmith and from the University of Greenwich chose to put together mapping technologies and open data to make it easy for parents to find good schools in their area.


They even managed to create a tablet-ready demonstration product built using Unity 3D, which displayed a number of schools in England and Wales, with data about the academic performance of the school being displayed opposite. Despite the crippling time constraints of the day, they managed to create something that worked quite well and ended up winning the award for ‘best use of Open Data’.


In British parlance, a NEET is someone who is Not in Education, Employment or Training. It's a huge problem in the UK, wasting vast amounts of human potential and money.


But what if you could use open data to inspire young people to challenge themselves and take advantage of opportunities related to their interests? And what if that came packaged in a nice, accessible phone app? That's what one of the teams in attendance did, resulting in Neetx.

Cherry Picker

The explosion in speciality colleges (confusingly, these are almost all high schools) under the Labour government has made it easy for pupils with very specific interests to choose a school that works for them.

But what if you wanted a bit more detail? What if you wanted to send your child to a school that was really, really good at sciences? What if you wanted to cherry-pick (see what I did there?) schools based upon their performance in certain key areas? Cherry Picker makes it easy to do just that.

University Aggregator

Finding the right university can be hard. There’s so much to be taken into consideration, and there’s so much information out there. What if someone gathered it all, and merged it into a single source where parents and prospective students could make an informed decision?


That’s what one of the teams attending proposed. They suggested that in addition to information from the National Student Survey and government data, they could also use information from Which?, The Telegraph and The Guardian’s university league tables. This idea also got a great reception from the mentors and judges in attendance, and is one idea I would love to see become a reality.


I left the Cabinet Office impressed with the quality of the mentorship offered, the quality of the ideas and the calibre of the students attending. The Cabinet Office really ought to be commended for putting on such an amazing event.

Were you in attendance? Let me know what you thought about it in the comments box below.

Open data – the zeitgeist Fri, 15 Nov 2013 13:04:37 +0000

Open data [1] is becoming a brand: 61 countries are using the brand and many others are expressing interest. The week before last, thousands of delegates from around the world descended on London for a host of open data events that ran throughout the week. There is something of the zeitgeist about open data at the moment, and this is important as it is:

  • becoming a magnet for digital talent
  • a driver for a host of new start-ups
  • pressurising existing businesses to up their game
  • and engendering a positive feeling about what can be done with technology to make a difference to how citizens are served by government

Monday – Github in Government

@FutureGov is a business that provides ‘digital public service design’. Dominic Campbell (@dominiccampbell) hosted an excellent ‘Github in Government’ event at the trendy Shoreditch Works, which included a number of short presentations by Paul Downey (@psd), Chris Thorpe (@jaggeree), Sarah Kendrew (@sarahkendrew), and James Smith (@floppy), amongst others. It was a community event that managed to attract a much wider audience from the local digital crowd.

ODI Summit 2013 – thanks to Brendan Lea – used under CC SA

Tuesday – ODI Annual Summit
On Tuesday @UKODI held its annual summit at the Museum of London to celebrate its 1st birthday. Sir Tim Berners-Lee, Sir Nigel Shadbolt and Gavin Starks were the main hosts and kept up the pace throughout the day with numerous talks and panels. The main theme was the use of open data to build and support business, and a range of exemplars were used to reinforce the theme. Placr’s Transport API, Mastodon C, and OpenCorporates were just a few of those mentioned. ScraperWiki is a proud ODI supporter.

Business is important in this context as open data needs to be used and reused to be sustainable in the longer term. It also needs to be embraced more broadly across corporate business. I recently heard a senior executive from a large US social media company propose that government organisations close what is ‘essential to close’ and publish everything else. His suggestion is daring and logical but likely politically unacceptable.

Kenneth Cukier (@kncukier), Data Editor at The Economist, cited the UK as the world leader in open data and emphasised the big opportunity it presents for the country and its influence internationally.[2] Whilst we dined and networked informally with other delegates, a colourful curved digital display hung over the main dining area, showing live data feeds and ODI acknowledgements.

Wednesday – OGP Civil Society Day
Organised by the OKFN, the Civil Society Day provided an informal opportunity for over 400 civil society actors involved in the OGP to connect, interact, learn and strategise. It provided an opportunity to focus on the conversations needed within civil society to prepare for Thursday and Friday’s summit, and to strengthen national OGP processes for the future.

Thursday and Friday – OGP Summit 2013
The Open Government Partnership Summit was the UK’s opportunity to showcase the work it has been doing around open data and its impact on transparency, business and civil society. A good friend suggested that the ‘metapoint’ was the e-Diplomacy initiative run largely by the foreign offices of participating countries. Francis Maude was the main UK government host, and he had a very professional team around him to ensure that the event was a success. We were in the ‘Festival’ area, a showcase of companies providing technology that supports open government. The event also marked the UK’s handover of the biennial responsibility to Indonesia. We used the opportunity to show off our prototype of Table Xtract, demonstrating it by converting energy price tariffs from each of the energy companies EDF, British Gas and Scottish Power.

…and meanwhile, back up on the 4th floor of the QEII Centre
The US State Department ran a ‘TechCamp’. The purpose of a TechCamp is to provide NGOs with training in low-cost or no-cost new and online technologies. ‘Speed geeking’ was used to introduce Ian Hopkinson from ScraperWiki to OGP delegates from participating countries every 5 minutes; the objective was for him to give examples of successful open data projects, and he showed the UN OCHA project. The US Ambassador to London, Mr Barzun, was also treated to the 5-minute experience. We gave him a ScraperWiki sticker and found out he is a Liverpool fan.

I am frequently reminded that my start-up colleagues have been pushing the open data envelope for about 10 years – who says things happen quickly in our sector!

[1] Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.
[2] Cukier is co-author, with Viktor Mayer-Schönberger, of Big Data: A Revolution That Will Transform How We Live, Work, and Think (2013).