Open Knowledge Foundation – ScraperWiki
Extract tables from PDFs and scrape the web
Tue, 09 Aug 2016 06:10:13 +0000

Seize the Data Day with Open Knowledge Foundation
Thu, 01 Dec 2011 18:35:48 +0000

On December 3rd at the Barbican Centre in London, the Open Knowledge Foundation will be inviting everyone and anyone who can wrangle up a good bit of data, or wrangle the wranglers of data, to a Seize the Data event.

So they could do with some screen scraping, data extracting, coding extraordinaires (i.e. you).

So if you’re free and happen to be in London, please park your diggers outside their door and start turning the scraper cogs towards evil PDFs.

Sign up here.

Growing back to the Future: Allotments in the UK, open data stories and interventions
Mon, 31 Oct 2011 15:34:16 +0000

This is a guest blog post from Farida Vis. She attended EuroHack at the Open Government Data Camp 2011. It consisted of a series of short talks combined with plenty of opportunities for hacking in groups in the second part of the workshop.

On the day, we were given an introduction to data driven journalism by data journalist Nicolas Kayser-Bril, who has recently launched J++, a new media company that builds data journalism applications. Friedrich Lindenberg (OKF) and Aidan McGuire (ScraperWiki) gave a thorough overview of scraping, mainly focusing on the very popular ScraperWiki, with Friedrich highlighting its application to EU spending data. Finally, Chris Taggart (Open Corporates) talked about EU spending data as well as Open Corporates, and gave a hands-on workshop on Google Refine.

My personal interest lies in rather everyday data, related to ‘mundane issues’ that people relate to easily, principally because they feature in their everyday lives. This allows for a rethinking of political participation and civic engagement beyond the rather stale ways in which this is measured traditionally. I’m interested in what Liz Azyan has started calling ‘really useful’ data, which has the ordinary end user firmly in mind. Personally I find huge spending data difficult to get my head round (but I guess I’m not alone in that) and so I’m interested in exploring a more manageable example and seeing how far I can take it. So for some time now, I have been looking at the issue of allotments in the UK. At EuroHack I had not really intended to pitch my project, but after I briefly talked about what I was doing to Aidan McGuire before the start of the workshop, he highlighted it on my behalf and then there was luckily no turning back. I was delighted with people’s interest in the project. The sections below describe what we looked at on the day and what happened next.


What issue did we look at?

An allotment is a small plot of publicly owned land rented from the council for a small annual fee, giving people the possibility to grow their own fruit and vegetables. I have an allotment myself (here’s a picture) and was lucky that when I decided to get one eleven years ago the waiting list was only two months, so my partner and I got one nearly immediately. Since then waiting times have shot up, to the extent that on our site in South Manchester the waiting list is now fifteen years, highlighting a nationwide problem. The last few years have seen a staggering increase in demand, no doubt fuelled by growing environmental concern and awareness, yet there has been no significant increase in the number of allotments to meet this demand. The New Local Government Network reports that during the 1940s there were around 1.4 million allotments in the UK, with only 200,000 today, which partly reflects that ‘growing your own’ goes through cycles of popularity. During a period of complete lack of interest, it is difficult for councils to hold on to this land as allotments that nobody wants. But what do you do when it seems everybody wants one again?

Earlier this year, the Department for Communities and Local Government issued a public consultation on 1294 Statutory Duties pertaining to local authorities to possibly reduce their number. These duties included Section 23 of the Allotments Act from 1908, which ensures local authorities provide allotments (and should take seriously such a request made by at least six tax paying citizens in a council), causing some newspapers to suggest that ‘The Good Life’ was now under threat. The Act remained unchanged however and this summer the government announced that of the 6,103 responses received, nearly half contained a comment on the Allotments Act, suggesting on a ‘straw poll’ level at least that this is an issue people care about.


What were we interested in?

Although it is tempting to simply highlight this problem in a different way, with additional data and accompanying visualisations, I was keen to stress that whilst I do think there is an issue with councils not providing more sites, it is also clear to me that they are hardly in a position to do so given the current economic climate. So whatever we did, it was important to me that we used part of the day to start thinking about alternative solutions to the waiting list crisis, for example by identifying underused plots of land (brownfield sites and others), which could serve as temporary growing spaces (pop-up allotments anyone?). In my attempt to ‘do something about this’ I was joined by Daniela Silva and Pedro Markun from the Sao Paulo-based think-and-do tank Esfera; data journalist Nicolas Kayser-Bril; Python/JS developer and self-described open data fan Anna Powell-Smith; and finally Andrew Mackenzie, who was at the OGD camp to film as part of an ongoing project that records the open data movement.

What data did we have?

Although there is very little allotment data available, as councils rarely publish it, Transition Town West Kirby (TTWK), led by Margaret and Ian Campbell, has for the last three years used the Freedom of Information Act to obtain allotment waiting list data through WhatDoTheyKnow. They publish this data, along with a report, each year and these figures are now widely used in the mainstream media. The reports, however, focus on national averages and do not highlight specific differences between councils or identify councils where problems are particularly severe. My co-researcher at Leicester, Yana Manyukhina, and I had recently put in our own FOI request to build on the TTWK data. Our request focused on rental cost, water charges, and whether discounts were available to plot holders. We also requested the tenancy agreements councils use to manage their allotment sites. An analysis of these agreements may reveal further differences between councils, which could prove to be significant to citizens living in these locations. Because I am Manchester based, we also had a look at allotment location data manually collected by Feeding Manchester, which is interested in sustainable food for Greater Manchester.


What did we do?

After my introduction, Anna decided to work on the FOI data, using Google Fusion Tables. In a UK context, The Guardian Data Store frequently uses these to highlight differences between councils on a specific topic. I had previously standardised the TTWK data so that each council now included a figure for how many people were waiting per 100 allotments (the data set also includes further details about the number of sites and allotments per council). Anna and I decided that we would add data from the FOI request Yana and I had made to the TTWK data, namely: the rental cost, water charges, and discounts given. I need to do further work on standardising the rent charge per council, which is still expressed in a range of different old-fashioned measurements. Allotment sizes were traditionally measured in ‘poles’ and ‘rods’ (from 1908 onwards a standard plot was 10 rods), though many councils now use square yards and metres.
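To make the standardisation described above concrete, here is a minimal sketch in Python. It is not the spreadsheet work actually done on the day: the function names and the sample figures are hypothetical, and it assumes a council reports plot size in rods, poles, square yards or square metres. The only fixed facts it relies on are that a rod (or pole) is 5.0292 metres long, so one square rod is about 25.29 square metres, and that a yard is 0.9144 metres.

```python
# Illustrative sketch only: function names and sample figures are hypothetical.

# One rod (also called a pole) is 5.0292 m long, so as an area measure
# one square rod is 5.0292 ** 2 square metres. One yard is 0.9144 m.
SQ_METRES_PER_UNIT = {
    "rod": 5.0292 ** 2,
    "pole": 5.0292 ** 2,
    "sq_yard": 0.9144 ** 2,
    "sq_metre": 1.0,
}


def waiting_per_100_plots(people_waiting, total_plots):
    """Return how many people are waiting for every 100 plots a council has."""
    if total_plots <= 0:
        raise ValueError("total_plots must be positive")
    return 100.0 * people_waiting / total_plots


def plot_area_sq_metres(size, unit):
    """Convert a plot size in one of the known units to square metres."""
    if unit not in SQ_METRES_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    return size * SQ_METRES_PER_UNIT[unit]


def annual_rent_per_sq_metre(annual_rent, size, unit):
    """Express an annual rent as a charge per square metre of plot."""
    return annual_rent / plot_area_sq_metres(size, unit)


if __name__ == "__main__":
    # Hypothetical council: 30 people waiting for 120 plots,
    # charging 50.00 a year for a traditional 10-rod plot.
    print(waiting_per_100_plots(30, 120))
    print(round(plot_area_sq_metres(10, "rod"), 1))
    print(round(annual_rent_per_sq_metre(50.0, 10, "rod"), 3))
```

Converting everything to a charge per square metre is one way to make councils comparable; a traditional 10-rod plot comes to roughly 253 square metres on this basis.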

Pedro and Nicolas both worked on building a series of scrapers, using ScraperWiki, scraping the Feeding Manchester data, Landshare data (Landshare is an initiative that is already offering alternatives, matching up individuals who have land with those who wish to cultivate it) as well as a number of council sites. Pedro and I also worked with an idea that ScraperWiki’s Julian Todd had given me at an earlier meeting (at OKCON in Berlin), which is to use OpenStreetMap to get people to mark up allotments. In our extended idea (usefully articulated by Andrew Mackenzie on the day), other possible growing spaces, possibly with a newly agreed land use tag, could also be mapped. In the end Pedro built a site that pulled in all the OSM data to show allotment sites in the UK and would update daily, picking up any new allotments marked up on OpenStreetMap.
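The post does not say how Pedro’s site pulled its data from OpenStreetMap, so what follows is only a sketch of one possible approach, not his implementation. It assumes the public Overpass API endpoint at overpass-api.de and the established OSM tag `landuse=allotments`; the function names and the shape of the returned records are hypothetical.

```python
# Illustrative sketch only: one possible way to list allotment sites mapped
# in OpenStreetMap. Function names and record fields are hypothetical.
import json
import urllib.parse
import urllib.request

# Public Overpass API endpoint (assumed here; other instances exist).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_allotments_query(country_code="GB", timeout=120):
    """Build an Overpass QL query for everything tagged landuse=allotments."""
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'area["ISO3166-1"="{country_code}"][admin_level=2]->.country;\n'
        "(\n"
        '  way["landuse"="allotments"](area.country);\n'
        '  relation["landuse"="allotments"](area.country);\n'
        ");\n"
        "out center;\n"
    )


def fetch_allotments(query):
    """Send the query to the Overpass API and return the decoded JSON."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=180) as response:
        return json.load(response)


def extract_sites(overpass_json):
    """Reduce an Overpass JSON response to a simple list of site records."""
    sites = []
    for element in overpass_json.get("elements", []):
        center = element.get("center")
        if center is None:
            continue  # no centre point returned for this element
        tags = element.get("tags", {})
        sites.append(
            {
                "osm_type": element["type"],
                "osm_id": element["id"],
                "name": tags.get("name", "(unnamed)"),
                "lat": center["lat"],
                "lon": center["lon"],
            }
        )
    return sites


if __name__ == "__main__":
    # A whole-country query is heavy; run it sparingly (for example once a day).
    result = fetch_allotments(build_allotments_query("GB"))
    for site in extract_sites(result)[:10]:
        print(site)
```

Running something like this once a day and storing the result would be enough to keep a map current as volunteers add new sites.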

What happened next?

The enthusiasm and the great work we did during the hack day meant that I wanted to reflect this in my presentation at the camp the next day. I wanted to highlight both the problems with current allotment data collection (a lack of ontologies) and the limited access to, or knowledge of, this data, combined with the huge surge in demand from ordinary people wanting to grow their own produce. Going beyond simply a better visualisation of council data obtained via FOIs, I strongly emphasised the possibility of a technological intervention into this growing (pardon the pun) issue, by building stronger ontologies for allotment data (Pedro and I talked about this a lot afterwards), but also by thinking beyond the unproductive ‘councils just need to provide more allotments’ deadlock. Following my presentation I had various offers from people keen to help out with the mapping, but one person on Twitter confirmed my feeling that, if we want a lot of people to map, doing this directly in OpenStreetMap is still quite a daunting prospect for the ordinary end user. I toyed with the idea of filming a simple step-by-step tutorial, but in the end Pedro suggested using a new, more user-friendly interface, one he is currently developing for the Sao Paulo Council in Brazil. This is still under development, but we will hopefully have an update soon.

Anna and I made excellent progress and had a great chat with Lisa Evans from the Guardian Data Store, who was at the camp to present and who expressed an interest in putting the allotment data on the Data Store. I will work with Anna over the next few days to complete the data set and do a short write-up. Hopefully releasing this data through such a well known and respected site might generate some further interest. Daniela also interviewed me for the Esfera blog and she has written up our EuroHack day in Portuguese here.

All this flurry of activity did not go unnoticed and the project has now received official support from the OKF, with Community Coordinator Kat Braybrooke as the key liaison. Although Kat and I had already talked for months about this project, it seemed that it needed the critical mass, collective brainstorming and hacking at EuroHack and afterwards to push the open data part of the project to the next level. Kat and I will soon be meeting with a range of NGOs and interested parties who have expressed an interest in pooling resources and making a joint intervention into this problem. It is hard to express how exciting it was to connect with such amazing people at EuroHack, who all did such a tremendous amount of work on this project, and especially to end up with such a great result. An OKF site highlighting the mapping project will launch shortly and we hope to give you further updates in the not-too-distant future. Watch this (growing) space!

If you would like to get involved or receive further info on the project, feel free to get in touch via email or twitter.

Farida Vis from the University of Leicester in the UK (where she teaches Media and Communication) recently took part in EuroHack, a pre-conference workshop in Warsaw, Poland, on 19 October, at the Open Government Data Camp 2011, organised by the European Journalism Centre and the Open Knowledge Foundation. Farida is very grateful to the EU Commission for supporting her attendance at EuroHack and the OGD Camp with a travel bursary. 

Diggers and Dinosaurs – Scraping at the Mozilla Festival
Mon, 17 Oct 2011 15:40:33 +0000

In a complete paradigm shift of the epic battle between Godzilla and Mothra we are turning our backs on the old claymation medium and embracing the digital age where dinosaurs and diggers (yes, I am aware we are a machine and not a moth) can roam free across the lawless plains of web 2.0.

Both can be found at the Mozilla Festival park in London on 4-6 November. If you’re lucky you might even spot a wily firefox. There will be an enclosure on the Friday from 18:00 where our tamed digger driver, Francis Irving, can give you some driving lessons.

As part of the Data Journalism Workshop on the Saturday, 10:00-17:00, we’ll be hosting a ‘Scraping 101’ session. There will be a host of data trackers to guide you through the web wilderness including Open Knowledge Foundation’s Jonathan Gray and the European Journalism Centre’s Liliana Bounegru. There will be herds of other data/web beasts roaming the plains so we suggest you stay inside or close to your digger.

If you’re interested in a close encounter of the data kind, sign up for the event here.

So watch out Mozilla Festival – you’re being ScraperWikied!

Scraping Government Data for the Open Government Data Camp
Tue, 11 Oct 2011 16:23:40 +0000

Come one, come all and gather ye ’round the fantastical scraping table at the Open Government Data Camp in Warsaw. Here you will see such mythical beasts as the Irishman with the gift of the gab and the German so obsessed with numbers and efficiency that he has become part database.

So head to Soho Factory in Warsaw, Poland on 19th October (register here) for a spectacular #EuroHack. There will be much merriment with prizes galore and jaw dropping performances from Nicolas Kayser-Bril (data journalism show) and Chris Taggart (who will beguile you with Google Refine). There will be magical wifi but please bring your own laptops as the wonderment will titillate your senses from 10:00 – 18:00.

So come and be amazed!

Scraping New Frontiers
Mon, 10 Oct 2011 21:15:25 +0000

Today is Columbus Day in the US (yes, I’m working regardless). So I’ve decided to write a post about discovery. This has been my first full week in America. I have toiled through Heathrow Terminal 5, battled through the baffling New York subway and scaled the mountains of food to find, well, not the promised land.

That’s because what’s promising about our financially collapsing, unemployment riddled and Mother-Earth-telling-us-to-get-lost future is not land. It’s the web. It’s the network we’re plugging ourselves into and upon which we’re uploading pictures of burping pugs, frivolous commentary on our mundane lives and videos of children hurting themselves.

Yes, the sea of code encasing invaluable data is the chasm that must be crossed to reach the new frontiers of ‘Big Data’. In that sense we’re looking to lead some expeditions in search of this promised land. In the US, we’re looking to roll out some big events next year but in the meantime we will be recruiting data navigators for sorties. We’ll be at the Open Government Data Camp from 20-21 October in Warsaw.

If you would like to be part of this brave new world then sign up here and follow our blog for further announcements!

Constructing the Open Data Landscape
Wed, 07 Sep 2011 11:01:38 +0000

In an article in today’s Telegraph regarding Francis Maude’s Public Data Corporation, Michael Cross asks: “What makes the state think it can be at the cutting edge of the knowledge economy”. He writes in terms of market and business share, giving the example of the satnav market, worth over $100bn a year yet based on free data from the US Government’s GPS system.

He credits the internet revolution for transforming public sector data into a ‘cashable proposition’. We, along with many other start-ups, foundations and civic coding groups, are part of this ‘geeky world’ of Open Data. So we’d like to add our piece concerning the Open Data movement.

Michael has the right to ask this question because there is this constant custodial battle being fought every day, every scrape and every script on the web for the rights to data. So let me tell you about the geeks’ take on Open Data.


The idea(l) behind Open Data is to create sustainable Open Data projects with purpose. This has been championed in the last couple of years by civic data projects such as MySociety, Open Knowledge Foundation, Code for America and Open Australia. Open Development Cambodia is following me on Twitter! Older, more established organizations are also being converted to the Open Data ethos. For instance, The World Bank is one major organization turning to Open Data in a big way.

However, much of the public sector data published so far has been pretty much useless. Governments, finally, are beginning to realize that data has little value unless people understand its context and provenance. They are beginning to see that opening up their data can reduce the cost and responsibility of getting it to the end point user, as the Open Declaration on European Public Services clearly says: “The needs of today’s society are too complex to be met by government alone”.

The key to a sustainable Open Data landscape lies not in the organisational heads of government bodies but in the provenance of the data they release and the ways in which it is released. The goal should be to gain the 5 stars of open linked data. For this to be achieved the data needs to be pared down to its raw ingredients. In a research paper entitled “Open Data, Open Society” (see end of post) Marco Fioretti explains:

Public data are really useful only when they are raw, really open and linked … only when data are published online in that way every citizen or organization will be able to automatically analyze and present them in easy to understand forms

This is where ScraperWiki really excels in terms of opening up data. Not only is our data open and accessible through various processes (csv, database, API), even the extraction process is open in the form of a code wiki. In terms of data, we are rawer than raw. If government ordered an open data steak they would order rare, data hubs would order raw, ours would be mooing!

We’re providing some of the heavy machinery needed to construct the Open Data landscape. What it will look like very much depends on the civic cyber-community getting involved. A leader in this community is Chris Taggart, creator of OpenlyLocal and OpenCorporates, and a prolific ScraperWiki user. So I Skyped him to see what he makes of the state thinking it can be at the cutting edge of the knowledge economy:

Speaking of the linked economy, do check out all the links in this post; all the media included here is under a Creative Commons license.

If you are interested in getting more involved in the Open Data scene check out the Open Knowledge Foundation.

Open Data, Open Society

Open Data Events – Meet like minded folk
Fri, 02 Sep 2011 15:37:05 +0000

Just a quick shout out to upcoming Open Data events crying out for participation from well minded coding citizens like yourselves:

Open Australia and ScraperWiki Hackfest:

In sunny down under (Sydney) on 10-11 September. No travel bursaries but it is free! Sign up here. It’s a hacking extravaganza for anyone interested in helping liberate data or hacking on OpenAustralia projects. We look forward to following the tags. If you’re gonna go then sign up for an account now. If their mammals are anything to go by we’re hoping for some wacky uses of open data.

Open Government Data Camp:

This year held in Warsaw and taking place on 20-21 October. Run by the fabulous Open Knowledge Foundation, it boasts two days of talks, code sprints and workshops. The week running up to it will have satellite events including a workshop from our fine selves, so come along and tell us how we can make your scraping life better. Early bird tickets are still available so get them while they’re hot here. There are travel bursaries so check out those too. Get the full programme here.

Open Aid Data Conference and Hackday:

In Berlin, Germany on 28-29 September, this conference offers training on data sources of development cooperation and a hackday for programmers to develop applications of aid data. Register here (PS: it’s in German but the event is in English). Check out the full programme here.

We hope you all can make it and help aid data, government data and Australia get ScraperWikied!

We Eat Data – ScraperWiki talk at Open Knowledge Conference 2011
Mon, 04 Jul 2011 18:48:51 +0000

Our tamed computer programmer, ‘The Julian’, recently gave a rare appearance at the Open Knowledge Conference in Berlin (if you want an appearance pay us or ask us!). The spectacle of such scraping royalty drew more people than the room could accommodate (‘The Julian’ is not related to any royals living or deceased). As such I have included the slides here:

[googleapps domain=”docs” dir=”a/” query=”id=dcfvj9d_360fd4wwzc8″ width=”410″ height=”342″ /]

We were honoured to be amongst an outstanding line-up of speakers. We also ran a workshop the week of the conference and you can see the German data we scraped into ScraperWiki on the OKCon2011 tag.

What was most interesting about the workshop is that we see the same types of data needed for similar projects wherever we go. Tobias Escher wants to do something similar to AlphaGov for Germany called Meine Demokratie. A lot of very simple little scrapers can go a long way, and if there’s anyone looking to play around with scraping and ScraperWiki, or who would like to lend a coding hand to a worthy cause, please click the above link.

‘The Julian’ was also looking for a scraping challenge and the workshop gnomes found Berlin schools data. I showed those in attendance one of my favourite sites made from scrapers:  Schooloscope. So Julian is scraping the data for Berlin schools in various stages and the hope is to get all the data for schools in Germany to make a German schooloscope.

We have one lovely lady very interested in getting this project on its way so if you are willing, if you speak German and if you know where to find them maybe you can scrape German schools data.

So watch out, useful things to know in Germany, including schools – you’re being ScraperWikied!

(As ScraperWiki is being used for better and better things, this will just get harder for me…)

Opening Data at Open Knowledge Conference Berlin
Wed, 22 Jun 2011 13:15:57 +0000

If you haven’t already heard, the Open Knowledge Foundation is hosting a week-long conference in Berlin next week. We’re very excited and will be running workshops on the Monday and Tuesday. We hope to meet some of our German users and open data in the way we know best.

It’s an all-star line-up, a who’s who of open data including Chris Taggart, Rufus Pollock, Paul Bradshaw and our own ‘The Julian’. Workshops running up to the talks cover Open Spending, CKAN and ScraperWiki. It’s an open data festival if ever there was one, and we’re heading to the continent to park our digger and invite you along for the ride. Just check out the packed programme!

So if you can make it, sign up for the conference here. If you’re in town, our workshop is free for anyone so get the details and the tickets here. We look forward to seeing you and your data!