

100 Years of history…and I just hope that we do it justice…

Journalism School Columbia University

Columbia University, arguably the best journalism school in the world, is giving us the opportunity of a lifetime. We are hosting our first ever US event (Journalism Data Camp #jdcny), and it is their first hackathon in a proud 100-year history. It is scary but very exciting! The campus is lovely and welcoming. Located in uptown Manhattan just above Central Park, the ‘J’ School building is one of the oldest in the complex.

Joseph Pulitzer

The entrance hallway has a bust of Joseph Pulitzer, the Hungarian-American publisher whose endowment established Columbia’s school of journalism. We also know him for the Pulitzer Prize, which has been synonymous with excellence in journalism and the arts since 1917.

Our event is designed to bring highly skilled journalists together with data scientists, coders, statisticians and technology! As he was such an innovator, we hope that Joseph would have approved! Our mission is to ‘Liberate Data’ so that professionals can hold power and money to account, act as guardians of freedom of speech and ask hard questions.

Emily Bell and Francis Irving will do a short introduction on their thoughts on digital journalism and the world of data.

Agenda…here it is…

…this is a fairly crude cropped PDF, which we created with the ScraperWiki cropper tool, and it works beautifully.

So what exactly are we going to do tomorrow and Saturday? We think that we have packed the event with stuff!

Project Data Derby

The project data derby is where people will work together in teams to create ‘data driven’ stories and applications. This is a facilitated session where people will learn and understand the various techniques and skills to work with data. We will look to have multiple skill sets in a team and they will be encouraged to follow a process that will give the best outcome at the end of the two days!

Liberate the Data

We have been asking people to nominate data sets for the past few weeks and we already have a list!  We will put these on index cards and ask our ‘Data Liberators’ to dig up the data, get it into a structured format and publish it for the project teams and the world to see and reuse!

Learn to Scrape

We are also running two three-hour tutorials on Friday: Python in the morning and Ruby in the afternoon. Our chief data scientist Julian Todd and data advocate Thomas Levine will run the sessions, with Michelle Koeth (Code for America Fellow) assisting the students. They will cover things like identifying good targets for web scraping and navigating the complexity of different types of web pages. Attendees will create their own scrapers to get and analyse the Department of Labor’s Unionreports.gov (collective bargaining agreement listings). The objective will be to get the data into a structured format and join it with data from the US census in order to establish the number and order of union employees across the US by state. If time allows, we will also encourage people to do further analysis.
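For anyone who wants a feel for the exercise before Friday, here is a minimal sketch in Python of the scrape, structure and join steps. Everything in it is an assumption for illustration: the sample page, the column names (`Employer`, `State`, `Employees`), the population figures and the per-thousand calculation are invented placeholders, not the real Unionreports.gov layout, real census numbers or the method that will be taught. It uses only the standard library.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (such as whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Turn an HTML table into a list of dicts keyed by the header row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def employees_per_thousand(listings, population_by_state):
    """Sum employees per state, join with population, rank highest first."""
    totals = {}
    for row in listings:
        state = row["State"]
        totals[state] = totals.get(state, 0) + int(row["Employees"])
    rates = {
        state: 1000 * total / population_by_state[state]
        for state, total in totals.items()
        if state in population_by_state  # skip states missing from the join
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Invented placeholder data, NOT real listings or census figures.
SAMPLE_HTML = """
<table>
  <tr><th>Employer</th><th>State</th><th>Employees</th></tr>
  <tr><td>Example Co A</td><td>NY</td><td>1200</td></tr>
  <tr><td>Example Co B</td><td>NY</td><td>800</td></tr>
  <tr><td>Example Co C</td><td>MO</td><td>500</td></tr>
</table>
"""
SAMPLE_POPULATION = {"NY": 1_000_000, "MO": 100_000}

listings = scrape_table(SAMPLE_HTML)
ranking = employees_per_thousand(listings, SAMPLE_POPULATION)
print(ranking)
```

A real page will be messier than this (nested tables, pagination, odd encodings), which is exactly the sort of complexity the tutorials are meant to cover.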

We will have prizes for the best projects, the most daring data liberation and the craziest constructed data scrapers. Our judges will be Aron Pilhofer from the New York Times and Susan E McGregor, Assistant Professor of Journalism at Columbia.

And there’s more…

Hear from the pros! On Friday and Saturday we are running some lunchtime lightning sessions in the plenary room. You will hear from Tom Lee of the Sunlight Foundation, who will talk about a joint project between ScraperWiki and Sunlight, and from Jake Porway of Data Without Borders, who will talk about some of the fabulous projects that they are currently running.

A big, big thanks to Tahiat Mahboob and Sam Guzik, our two Digital Media Associates at Columbia University Graduate School of Journalism, who have very generously offered their time to film the event! Hats off to the facilities team at Columbia for all their help with the logistics.

We look forward to seeing everyone who has signed up and a million thanks for supporting us.

