Events – ScraperWiki: Extract tables from PDFs and scrape the web

Software Archaeology and the ScraperWiki Data Challenge at #europython
Fri, 29 Jun 2012 09:24:27 +0000

There’s a term in technical circles called “software archaeology”: spending time studying and reverse-engineering badly documented code in order to make it work, or make it better. Scraper writing involves a lot of this, and ScraperWiki’s data scientists are well accustomed to a bit of archaeology here and there.

But now, we want to do the same thing for the process of writing code-that-does-stuff-with-data. Data Science Archaeology, if you like. Most scrapers or visualisations are pretty self-explanatory (and an open platform like ScraperWiki makes interrogating and understanding other people’s code easier than ever). But working out why the code was written, why the visualisations were made, and who went to all that bother, is a little more difficult.

[Poster: ScraperWiki’s Europython poster explaining the Data Challenge about European fishing boats]

That’s why, this summer, ScraperWiki’s on a quest to meet and collaborate with data science communities around the world. We’ve held journalism hack days in the US, and interviewed R statisticians from all over the place. And now, next week, Julian and Francis are heading out to Florence to meet the European Python community.

We want to know how Python programmers deal with data. What software environments do they use? Which functions? Which libraries? How much code is written just to ‘get data’, and does that code run repeatedly? These people are geniuses, but for some reason nobody shouts about how they do what they do… Until now!

And, to coax the data science rock stars out of the woodwork, we’re setting a Data Challenge for you all…

In 2010 the BBC published an article about the ‘profound’ decline in fish stocks shown in UK records. “Over-fishing,” they argued, “means UK trawlers have to work 17 times as hard for the same fish catch as 120 years ago.” The same thing is happening all across Europe, and it got us ScraperWikians wondering: how do the combined forces of legislation and overfishing affect trawler fleet numbers?

We want you to trawl (ba-dum-tsch) through this EU data set and work out which EU country is losing the most boats as fishermen strive to meet the EU policies and quotas. The data shows you things like each vessel’s licence number, home port, maintenance history and transfer status, and a big “DES” if it’s been destroyed. We’ll be giving away a tasty prize for the most interesting exploration of the data – but most of all, we want to know how you found your answer, what tools you used, and what problems you overcame. So check it out!
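To give a flavour of the kind of exploration we mean, here’s a minimal sketch of one way to start: counting “DES” (destroyed) vessels per country. The column names and sample data below are invented for illustration – they are not the EU register’s actual schema.

```python
import csv
import io
from collections import Counter

def destroyed_per_country(csv_text):
    """Count vessels flagged "DES" (destroyed) for each country.

    The column names ("country", "event_code") are illustrative,
    not the real field names in the EU fleet register.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("event_code") == "DES":
            counts[row.get("country", "unknown")] += 1
    return counts

# A tiny made-up sample in the same invented format
sample = """country,vessel_id,event_code
UK,123,DES
UK,124,MOD
ES,201,DES
ES,202,DES
"""

print(destroyed_per_country(sample).most_common())
# e.g. [('ES', 2), ('UK', 1)]
```

From there you’d want to normalise against fleet size and look at trends over time – the interesting answer is in how you get there, not just the final tally.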


PS: #Europython’s going to be awesome, and if you’re not already signed up, you’re missing out. ScraperWiki is a startup sponsor for the event and we would like to thank the Europython organisers and specifically Lorenzo Mancini for his help in printing out a giant version of the picture above, ready for display at the Poster Session.

“the impact on our industry only begins this weekend” says Susan E McGregor, Professor at the world’s foremost school of journalism
Wed, 01 Feb 2012 18:15:23 +0000

This is a guest blog post by Susan E. McGregor, Assistant Professor at the Tow Center for Digital Journalism, Columbia University.

The Tow Center for Digital Journalism at Columbia University Graduate School of Journalism is proud to be partnering with Knight News Challenge winner ScraperWiki this Friday and Saturday for their first Journalism Data Camp in the U.S. This event provides us with an opportunity to host a wide range of programmers, journalists and educators interested in expanding access to essential data sets, while connecting those communities to one another. We are also looking forward to extending the impact of this weekend’s activities by working in conjunction with our colleagues at the Stabile Center for Investigative Journalism and The New York World to further pursue those stories related to New York accountability issues that may be touched on during this weekend’s data “liberation” activities.

As an online tool, ScraperWiki is an innovative technical platform that allows users to build, test, and execute programmatic “scrapers” that transform web pages and PDFs into more accessible, usable data formats. As an online archive and repository, ScraperWiki helps improve access to scraped data sets by making them collectively available on its website. Finally, as a web-based collaboration space, ScraperWiki helps convene journalists and programmers around projects of shared interest, in addition to fostering peer-to-peer training and support.
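For readers new to the idea, a “scraper” in this sense is just a small program that pulls structured records out of markup. A minimal sketch, using only Python’s standard library (this is an illustration of the technique, not ScraperWiki’s actual API):

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows
        self._row = []        # cells of the row being parsed
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

# A made-up fragment standing in for a fetched web page
html = "<table><tr><td>Boaty</td><td>Liverpool</td></tr></table>"
scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)  # [['Boaty', 'Liverpool']]
```

Once the data is in rows and columns like this, it can be saved, queried, and shared – which is exactly the access-widening step the Tow Center describes above.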

Each of the above features of the ScraperWiki platform resonates closely with the Tow Center’s own priorities for data journalism. Making data available in formats that can be easily parsed, analyzed, and distributed is an essential part of data transparency, and the accountability journalism it serves. Providing a public access point for that data allows both journalists and their audiences to fact-check and elaborate upon the work that their peers have done, leveraging it against future projects and creating more comprehensive resources. And of course, the knowledge sharing and collaboration that takes place between programmers and journalists through ScraperWiki echoes the Tow Center’s mandate to educate and innovate at the intersection of computer science and journalism, both through its own dual-degree program in computer science and journalism, and through such public events as this one.

While we are certain that ScraperWiki will find ready adoption in cities and newsrooms throughout the country in the months to come, we look forward to growing an ongoing relationship with ScraperWiki and its contributors here in the New York area. By hosting this event we hope to introduce many of our students and colleagues to a truly remarkable tool, one whose impact on our industry only begins this weekend.

Announcing The Big Clean, Spring 2011 Wed, 10 Nov 2010 16:51:30 +0000 We’re very excited to announce that we’re helping to organise an international series of events to convert not-very-useful, unstructured, non-machine-readable sources of public information into nice clean structured data.

This will make it much easier for people to reuse the data, whether this is mixing it with other data sources (e.g. different sources of information about the area you live in) or creating new useful services based on the data (like TheyWorkForYou or Where Does My Money Go?). The series of events will be called The Big Clean, and will take place next spring, probably in March.
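The core task at a Big Clean event is exactly this kind of conversion: turning messy free text into records a program can reuse. A minimal sketch, where the input line format and field names are invented purely for illustration:

```python
import re

def parse_entry(line):
    """Turn a messy free-text line such as
    "Smith, John - 42 High St, Liverpool (tel. 0151 000 0000)"
    into a structured record, or return None if it doesn't match.
    The line format here is made up for the example.
    """
    m = re.match(
        r"(?P<surname>[^,]+),\s*(?P<forename>[^-]+?)\s*-\s*"
        r"(?P<address>[^(]+?)\s*\(tel\.\s*(?P<phone>[^)]+)\)",
        line,
    )
    return m.groupdict() if m else None

record = parse_entry("Smith, John - 42 High St, Liverpool (tel. 0151 000 0000)")
print(record)
# {'surname': 'Smith', 'forename': 'John',
#  'address': '42 High St, Liverpool', 'phone': '0151 000 0000'}
```

Once every line is a dictionary like this, it can be loaded into a spreadsheet, a database, or a service like TheyWorkForYou-style sites – which is the whole point of the clean-up.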

The idea was originally floated by Antti Poikola on the OKF’s international open-government list back in September, and since then we’ve been working closely with Antti and Jonathan Gray at OKFN to start planning the events.

Antti and Francis Irving (mySociety) will be running a session on this at the Open Government Data Camp on 18–19 November in London. If you’d like to attend this session, please add your name to the following list:

If you can’t attend but you’re interested in helping to organise an event near you, please add your name/location to the following wiki page:

All planning discussions will take place on the open-government list!

Event: ScraperWiki/LJMU Open Labs Liverpool Hack Day – Hacks Meet Hackers! Tue, 22 Jun 2010 22:10:00 +0000

We’re happy to announce our next Hacks Meet Hackers event, to take place in Liverpool on Friday July 16, 2010 from 9.30am to 8pm at the Arts and Design Academy.

The *free* hack day, sponsored by LJMU Open Labs and Liverpool Daily Post & Liverpool Echo, is for both developers and journalists. For additional sponsorship opportunities please contact aine [at]

Can’t get to Liverpool? Don’t worry – we’ve got more UK hack days in the pipeline: get in touch to find out more about attending or sponsoring one.

So what’s this hack day all about? It’s a practical event at which web developers and designers will pair up with journalists and bloggers to produce a number of projects and stories based on public data.

Who’s it for? We hope to attract hacks and hackers from all different types of backgrounds: people from big media organisations, as well as individual online publishers and freelancers.

What will you get out of it? The aim is to show journalists how to use programming and design techniques to create online news stories and features; and vice versa, to show programmers how to find, develop, and polish stories and features.

How much? NOTHING! It’s free, thanks to our sponsors.

What should participants bring? We would encourage people to come along with ideas for local ‘datasets’ that are of interest. In addition, we’ll present a list of suggested data sets in the introduction on the morning of the event, but flexibility is key.

But what exactly will happen on the day itself? Armed with their laptops and Wi-Fi, journalists and developers will be put into teams of around four to develop ideas, with the aim of producing finished projects that can be published and shared publicly. Each team will then present its project to the whole group. Overall winners will receive a prize at the end of the day. Food and drink will be provided throughout!

Any more questions? Please get in touch via aine[at]