Knight Foundation finance ScraperWiki for journalism

ScraperWiki is the place to work together on data, and it is particularly useful for journalism.

We are therefore very pleased to announce that ScraperWiki has won the Knight News Challenge!

The Knight Foundation are giving us $280,000 over two years to improve ScraperWiki as a platform for journalists, and to run events that bring together journalists and programmers across the United States.

America has trailblazing organisations that do data and journalism well already – for example, both ProPublica and the Chicago Tribune have excellent data centers to support their news content. Our aim is to lower the barrier to entry into data-driven journalism and to create (an order of magnitude) more of this type of success. So come join our campaign for America: Yes We Can (Scrape). PS: We are politically neutral but think open source when it comes to campaign strategy!

What are we going to do to the platform?

As well as polishing ScraperWiki to make it easier to use, and creating journalism-focussed tutorials and screencasts, we’re adding four specific services for journalists:

Data embargo, so journalists can keep their stories secret until going to print, but publish the data in a structured, reusable, public form with the story.
Data on demand service. Journalists often need the right data quickly, so we’re going to create a smooth process for ordering it.
News application hosting. We’ll make it scalable and easier.
Data alerts. Automatically get leads from changing data. For example, watch bridge repair schedules, and get an email when a bridge isn’t being maintained.
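
To give a feel for the data alerts idea, here is a minimal Python sketch. It is not ScraperWiki’s implementation: the record layout, the function names and the two-year threshold are all assumptions invented for this example, and it only builds the alert text rather than sending an email.

```python
from datetime import date

# Hypothetical records, as a scraper might store them: one dict per bridge.
SCHEDULE = [
    {"bridge": "Elm Street", "last_repair": date(2011, 5, 2)},
    {"bridge": "River Road", "last_repair": date(2008, 1, 15)},
]


def overdue_bridges(schedule, today, max_age_days=730):
    """Return the names of bridges whose last repair is older than max_age_days."""
    return [
        row["bridge"]
        for row in schedule
        if (today - row["last_repair"]).days > max_age_days
    ]


def alert_text(names):
    """Build the body of an alert message; an empty string means nothing to report."""
    if not names:
        return ""
    return "Bridges overdue for maintenance: " + ", ".join(names)


if __name__ == "__main__":
    # Run this after each scrape; only send a message if the text is non-empty.
    print(alert_text(overdue_bridges(SCHEDULE, date(2011, 6, 22))))
```

In practice the check would run each time the scraper refreshes its data, and the resulting text would be handed to whatever delivery channel suits the newsroom (email, a Twitter bot, and so on).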

Here are two concrete examples of ScraperWiki being used already in similar ways:

We manually embargoed the MP data that James Ball demanded off us for his Guardian front page.
The Edinburgh planning news application is embedded in the Greener Leith site, and now has alerts in the form of a Twitter bot.

Where in the US are we going to go?

What really matters about ScraperWiki is the people using it. Data is dead if it doesn’t have someone, a journalist or a citizen, analysing it, finding stories in it and making decisions from it.

We’re running Data Journalism Camps in each of a dozen states. These will be similar in format to our hacks and hackers hack days, which we’ve run across the UK and Ireland over the last year.

The camps will have two parts.

Making something. In teams of journalists and coders, using data to dig into a story, or make or prototype a news app, all in one day.
Scraping tutorials. For journalists who want to learn how to code, and programmers who want to know more about scraping and ScraperWiki.

This video of our event in Liverpool gives a flavour of what to expect.

Get in touch if you’d like us to stop near you, or are interested in helping or sponsoring the camps.

Finally…

The project is designed to be financially stable in the long term. While the public version of ScraperWiki will remain free, we will charge for extra services such as keeping data private, and data on demand. We’ll be working with B2B media, as well as consumer media.

As with all Knight-financed projects, the code behind ScraperWiki is open source, so newsrooms won’t be building a dependency on something they can’t control.

For more details you can read our original application (note that financial amounts have changed since then).

Finally, and most importantly, I’d like to congratulate and thank everyone who has worked on, used or supported ScraperWiki. The Knight News Challenge had 1,600 excellent applications, so this is a real validation of what we’re doing, both with data and with journalism.

Tags: alerts, data, embargo, hacks and hackers, journalism, knight, news applications

7 Responses to “Knight Foundation finance ScraperWiki for journalism”

M. Edward Borasky June 25, 2011 at 3:02 pm #

We have a pretty active hacker community in Portland, Oregon, and an active “Civic Apps” community associated with the City of Portland and to a lesser extent a regional government called “Metro”. I know of at least three of those hackers, myself among them, who have used ScraperWiki.

But nearly all of the journalists I know are dead-set against the kind of “coding” that’s required to acquire data via ScraperWiki. They have deadlines, and, let’s face it, HTML parsing, regular expressions and the mechanics of interrogating sites are hard programming. So the uptake of data journalism here in Portland has been slow, even with available hackers like myself. Right now I know of only one journalist here who’s attempted to use ScraperWiki (with the help of two hackers) and it’s been a frustrating experience.

What I think I’m looking for is some kind of “wireframing tool” – an interactive “drag and drop, point and click, WYSIWYG” user interface so that *journalists* can build scrapers. They have this now for the visualization / storytelling part of storytelling with tools like Tableau. They (or at least I) have this now for the data exploration part with tools like R, GGobi and Mondrian (and Excel, of course). But there’s not a tool I know of for rapidly building a scraper. Is there a chance you can use some of the grant money on a more journalist-friendly user interface?
- M. Edward Borasky June 25, 2011 at 3:05 pm #
  
  P.S.: I think Portland, Oregon would *love* a ScraperWiki camp – at least the hackers would.
Francis Irving July 10, 2011 at 11:28 pm #

Hi Edward – sorry for slow reply! I’ve been on holiday and only just saw this.

Our experience is that right now an attempt at a completely automated tool would always end up frustrating. There are quite a few products that try to do it, and they can be very useful – but they can’t magically scrape anything that is scrapable. You’ll end up programming, or doing programming like things, within them anyway (as happens in, say, Refine).

We are, however, doing several things with the grant money that are in the direction of an automatic tool. One of these is a set of tools that write the first pass of code for you – and in simple cases that will be all you need to do. (Others are journalist specific tutorials, and general usability improvements)

Would love to visit Portland!

Trackbacks/Pingbacks

LSDI : ScraperWiki, una ‘’ruspa’’ che scava nelle miniere di dati del web - June 29, 2011
[…] the aim now is to build a series of new services, including, for example, an embargo system that lets journalists create […]
Tow Center/ScraperWiki Datacamp | Tow Center for Digital Journalism - December 27, 2011
[…] & 4th, the Tow Center will be hosting a two-day “DataCamp” along with ScraperWiki, a recent Knight News Challenge winner and innovative platform for collecting and accessing online […]
$1 million to build a data platform | ScraperWiki Data Blog - February 9, 2012
[…] total, provided we hit certain milestones next August, and with the Knight Foundation money, this means we have a cool $1,000,000 […]
$1 million to build a data platform | ScraperWiki Data Blog - May 12, 2013
[…] total, provided we hit certain milestones next August, and with the Knight Foundation money, this means we have a cool $1,000,000 […]