knight – ScraperWiki
https://blog.scraperwiki.com
Extract tables from PDFs and scrape the web
Tue, 09 Aug 2016 06:10:13 +0000

With Tools, Tables and Tours: We’re Looking to Liberate Data Across the US
https://blog.scraperwiki.com/2011/11/with-tools-tables-and-tours-were-looking-to-liberate-data-across-the-us/
Mon, 14 Nov 2011 17:16:30 +0000

As part of our Knight News Challenge entry, we at ScraperWiki said we would roll out Journalism Data Camps across the U.S. We had done what we called “Hacks and Hackers Hack Day” events across the U.K. and Ireland, bringing journalists and coders together. This happened at the same time as HacksHackers in the U.S.: great minds and whatnot!

Now we’re scaling up when it comes to exploring the data prospects of the new world. We are heading across the U.S. on a data liberation front. But where do we start, and where do we go? Well, firstly we want to liberate data. And lots and lots of people can use data. More importantly, we want to bring together anyone who wants to work with data to tell a story, provide insight or build an application.

So how do you go about finding where the right mixes are? Well, I scraped the data, mapped it, and visualized it, of course! I scraped media organizations; R, Ruby, Python, and PHP meetup groups; data conferences; some B2B media; as well as HacksHackers Chapters and the top journalism schools. All in all, almost 13,000 data points were collected from different scrapers. So I put them into Google Fusion Tables and voila! (Please click on the image to be taken to the map)

A heat map gives me the hotspots for the concentrations of data points. These are biased towards the media sector, as there are many more outlets than interest groups and journalism schools. But it’s a good gauge of where we can build interest for the events.

Drilling down through the data using filter and aggregate, I got the breakdown of the proportion of the groups we want to reach for each locale. With some rough and ready image manipulation (I use Gimp as it’s open source), I mashed up a visualization scaling the pie charts so that the pixel radius corresponds to the size of the dataset for that location.
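The filter-and-aggregate step and the pie scaling described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the map: the function names, the sample records and the 60-pixel maximum radius are all assumptions made up for the example. It scales the radius linearly with the number of data points, as the paragraph describes; scaling by the square root instead would make the pie's area, rather than its radius, track the dataset size.

```python
from collections import Counter, defaultdict


def breakdown_by_locale(records):
    """Group (locale, category) pairs and return per-locale totals and shares."""
    counts = defaultdict(Counter)
    for locale, category in records:
        counts[locale][category] += 1

    summary = {}
    for locale, counter in counts.items():
        total = sum(counter.values())
        summary[locale] = {
            "total": total,
            # Each share is the fraction of the pie that category occupies.
            "shares": {cat: n / total for cat, n in counter.items()},
        }
    return summary


def pie_radius(total, largest_total, max_radius_px=60):
    """Scale the radius linearly so the largest locale gets max_radius_px."""
    return max_radius_px * total / largest_total


# Made-up sample records, for illustration only.
records = [
    ("Chicago", "media"), ("Chicago", "media"), ("Chicago", "meetup"),
    ("Chicago", "journalism school"), ("Austin", "media"), ("Austin", "meetup"),
]

summary = breakdown_by_locale(records)
largest = max(info["total"] for info in summary.values())
radii = {loc: pie_radius(info["total"], largest) for loc, info in summary.items()}
```

Here `summary` holds the slice proportions for each pie and `radii` holds the pixel radius to draw it at, so a locale with half as many data points gets a pie half as wide.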

Now, it’s not an exact science nor is it news site-ready. But the speed at which I can get a steer from data now runs on digital time: 13,000 data points collected, cleaned and visualized in half a day. This is now a loose guide but also a tool. And this is the sort of quick thinking, quick gathering and quick analyzing we want to see at our events. So think big data. Think multiple sources. Think multiple tools. And then you can extrapolate for multiple uses!

We haven’t settled on our tour locations yet, so watch this space for details. We’re also getting clues for where to go from the data underground, so don’t think the data is giving everything away. We hope to see you there!

Knight Foundation finance ScraperWiki for journalism
https://blog.scraperwiki.com/2011/06/knight-foundation-finance-scraperwiki-for-journalism/
Wed, 22 Jun 2011 19:22:25 +0000

ScraperWiki is the place to work together on data, and it is particularly useful for journalism.

We are therefore very pleased to announce that ScraperWiki has won the Knight News Challenge!

The Knight Foundation are spending $280,000 over 2 years for us to improve ScraperWiki as a platform for journalists, and to run events to bring together journalists and programmers across the United States.

America has trailblazing organisations that do data and journalism well already – for example, both ProPublica and the Chicago Tribune have excellent data centers to support their news content. Our aim is to lower the barrier to entry into data-driven journalism and to create (an order of magnitude) more of this type of success. So come join our campaign for America: Yes We Can (Scrape). PS: We are politically neutral but think open source when it comes to campaign strategy!

What are we going to do to the platform?

As well as polishing ScraperWiki to make it easier to use, and creating journalism-focussed tutorials and screencasts, we’re adding four specific services for journalists:

  • Data embargo, so journalists can keep their stories secret until going to print, but publish the data in a structured, reusable, public form with the story.
  • Data on demand service. Journalists often need the right data quickly, so we’re going to create a smooth process for ordering it.
  • News application hosting. We’ll make it scalable and easier.
  • Data alerts. Automatically get leads from changing data. For example, watch bridge repair schedules, and email when one isn’t being maintained.
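
To make the data alerts idea concrete, here is a rough sketch of the bridge example in Python. It is not how ScraperWiki implements alerts: the function names, the one-year threshold, the sample bridges and the recipient address are all hypothetical, and the message is only composed, not sent.

```python
from datetime import date
from email.message import EmailMessage


def overdue_bridges(schedule, today, max_gap_days=365):
    """Return the names of bridges whose last repair is older than max_gap_days."""
    return sorted(
        name
        for name, last_repair in schedule.items()
        if (today - last_repair).days > max_gap_days
    )


def build_alert(overdue, recipient):
    """Compose (but do not send) an email listing the overdue bridges."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"Data alert: {len(overdue)} bridge(s) overdue for maintenance"
    msg.set_content("\n".join(overdue))
    return msg


# Made-up scraped data: bridge name -> date of last recorded repair.
schedule = {
    "Example Bridge A": date(2011, 3, 1),
    "Example Bridge B": date(2009, 5, 20),
}

overdue = overdue_bridges(schedule, today=date(2011, 6, 22))
alert = build_alert(overdue, "newsdesk@example.com")
```

Run each time the scraper refreshes the schedule, a check like this turns a changing dataset into a lead: only bridges past the threshold end up in the message.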

Here are two concrete examples of ScraperWiki being used already in similar ways:

Where in the US are we going to go?

What really matters about ScraperWiki is the people using it. Data is dead if it doesn’t have someone, a journalist or a citizen, analysing it, finding stories in it and making decisions from it.

We’re running Data Journalism Camps in each of a dozen states. These will be similar in format to our hacks and hackers hack days, which we’ve run across the UK and Ireland over the last year.

The camps will have two parts.

  • Making something. In teams of journalists and coders, using data to dig into a story, or make or prototype a news app, all in one day.
  • Scraping tutorials. For journalists who want to learn how to code, and programmers who want to know more about scraping and ScraperWiki.

This video of our event in Liverpool gives a flavour of what to expect.

Get in touch if you’d like us to stop near you, or are interested in helping or sponsoring the camps.

Finally…

The project is designed to be financially stable in the long term. While the public version of ScraperWiki will remain free, we will charge for extra services such as keeping data private, and data on demand. We’ll be working with B2B media, as well as consumer media.

As with all Knight-financed projects, the code behind ScraperWiki is open source, so newsrooms won’t be building a dependency on something they can’t control.

For more details you can read our original application (note that financial amounts have changed since then).

Finally, and most importantly, I’d like to congratulate and thank everyone who has worked on, used or supported ScraperWiki. The Knight News Challenge had 1,600 excellent applications, so this is a real validation of what we’re doing, both with data and with journalism.
