International Data Journalism Awards… deadline fast approaching (10th April 2012)
https://blog.scraperwiki.com/2012/03/international-data-journalism-awards-deadline-fast-approaching-10th-april-2012/
Mon, 26 Mar 2012

Everybody is talking about, and trying to do, ‘data journalism’, and the first ever International Data Journalism Awards have been established to recognise the huge effort that people are making in this field. It’s a great opportunity to showcase your work. Backed by Google, the prizes are generous, with €45,000 (over $55,000) shared between six winners, and the process is being managed by the Global Editors Network.

The main objectives are to (a) contribute to setting high standards and highlighting best practice in data journalism, and (b) demonstrate the value of data journalism to editors and media executives.

There are three categories:

  1. Data-driven investigative journalism
  2. Data visualisation & storytelling
  3. Data-driven applications

The competition is open to media companies, non-profit organisations, freelancers and individuals. Applicants are welcome to submit their best data journalism projects before 10 April 2012 at http://datajournalismawards.org/submit-your-work/.

To find out more about the competition and how to apply, check out datajournalismawards.org. If you have any questions about the competition, get in touch with the lovely Liliana Bounegru, DJA Coordinator (bounegru [at] ejc [dot] net). Liliana works at the European Journalism Centre.

Start Talking to Your Data – Literally!
https://blog.scraperwiki.com/2011/09/start-talking-to-your-data-literally/
Fri, 23 Sep 2011

Because ScraperWiki has a SQL database and an API that accepts SQL queries, I can ‘SQL inject’ (haha!) straight into the API URL and use the JSON output.

So what does all that mean? I scraped the CSV files of Special Advisers’ meetings, gifts and hospitality at Number 10. Because I can schedule the scraper to run, the dataset is updated as new data is published, and if the scraper fails to run I get notified by email.
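For context, the scraping side is only a few lines of Python on ScraperWiki. The sketch below is illustrative rather than the real scraper: the CSV URL is a placeholder, and the column names are the ones used in the queries later in this post. The scraperwiki.sqlite.save call stores each row in the scraper’s datastore (the swdata table the API queries), and because it upserts on the unique keys, scheduled re-runs add new rows without duplicating old ones.

import csv
import urllib2

import scraperwiki  # ScraperWiki's datastore helper library

# Placeholder URL -- the real scraper points at the Number 10
# transparency pages, which publish a new CSV each quarter.
CSV_URL = 'http://example.gov.uk/special-advisers-hospitality.csv'

reader = csv.DictReader(urllib2.urlopen(CSV_URL))
for row in reader:
    # Save the row into the scraper's SQLite datastore; the unique
    # keys stop re-runs from duplicating rows already stored.
    scraperwiki.sqlite.save(
        unique_keys=['Name of Special Adviser', 'Date of Hospitality',
                     'Name of Organisation'],
        data=row)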

Now, I’ve written a script that publishes this information, along with data from 4 other scrapers relating to Number 10 Downing Street, to a Twitter account, Scrape_No10. Because I’ve made a Twitter bot, I can tweet out a sentence and control the order and timing of tweets. I can even attach a hashtag, which I can then rescrape to find out what the social media sphere has attached to each data point. This has the potential to make the data go fishing for you as a journalist, but it is not immediately useful to the newsroom.
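The tweeting side of such a bot is essentially a loop over rows pulled from the ScraperWiki API. Here’s a minimal sketch, assuming the tweepy library and placeholder credentials; the hashtag and the exact wording are mine, and the real bot also staggers the tweets and mixes in the other four scrapers:

import json
import urllib
import urllib2

import tweepy  # assumed Twitter client library

# Placeholder credentials for the Scrape_No10 account.
CONSUMER_KEY = 'xxx'
CONSUMER_SECRET = 'xxx'
ACCESS_TOKEN = 'xxx'
ACCESS_SECRET = 'xxx'

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
twitter = tweepy.API(auth)

# Pull the latest rows from the scraper via the ScraperWiki API.
params = urllib.urlencode({
    'format': 'jsondict',
    'name': 'special_advisers_gifts_and_hospitality',
    'query': ('SELECT * FROM swdata '
              'ORDER BY `Date of Hospitality` DESC LIMIT 5'),
})
rows = json.loads(urllib2.urlopen(
    'https://api.scraperwiki.com/api/1.0/datastore/sqlite?' + params).read())

for row in rows:
    # One sentence per data point, plus a hashtag that can be
    # rescraped later to see what the social web attached to it.
    tweet = '%s received %s from %s on %s #scrapeno10' % (
        row['Name of Special Adviser'],
        row['Type of hospitality received'],
        row['Name of Organisation'],
        row['Date of Hospitality'])
    twitter.update_status(tweet[:140])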

So I give you MoJoNewsBot! I have written a script as a module for an IRC chat bot. It takes what I type, drops it into a SQL query sent to the ScraperWiki API, and extracts the answer from the resulting JSON, giving me a written reply in the chat room. For example:

Now I can write the commands in a private chat window with MoJoNewsBot, or I can do it in the room. This means that rooms can be made for the political team in a newsroom, or the environment team, or the education team, and each can have its own bot with modules specific to its data streams. That way, computer-assisted reporting can be collaborative and social. If you’re working on a story that has a political and an educational angle, you pop into both rooms, so both teams can see what you’re asking of the data. In that sense, you’ve got a social, data-driven, virtual newsroom. With that in mind, I’ve added other modules for the modern journalist.

With MoJoNewsBot you can look up Twitter trends, search tweets, look up a user’s latest tweets, get the latest headlines from various news sources and check Google News. The bot also has basic functions like Google search, Wolfram Alpha lookup, Wikipedia lookup, reminder setting and even a weather checker.

Here’s an example of the code needed to query the API and return a string from the JSON:

import json
import urllib
import urllib2

# `userinput` is the adviser's name typed into the chat room, and
# `phenny` is the IRC bot object -- both are passed in by the bot's
# command handler that wraps this snippet.
data_format = 'jsondict'
scraper = 'special_advisers_gifts_and_hospitality'
site = 'https://api.scraperwiki.com/api/1.0/datastore/sqlite?'
query = ('SELECT `Name of Special Adviser`, `Type of hospitality received`, '
         '`Name of Organisation`, `Date of Hospitality` '
         'FROM swdata WHERE `Name of Special Adviser` = "%s" '
         'ORDER BY `Date of Hospitality` DESC' % userinput)

params = {'format': data_format, 'name': scraper, 'query': query}

url = site + urllib.urlencode(params)

jsonurl = urllib2.urlopen(url).read()
swjson = json.loads(jsonurl)

Number = 5  # how many of the most recent entries to say in the channel

for entry in swjson[:Number]:
    ans = ('On ' + entry["Date of Hospitality"] + ' ' + userinput + ' got ' +
           entry["Type of hospitality received"] + ' from ' +
           entry["Name of Organisation"])
    phenny.say(ans)

This is just a prototype and a proof of concept. I would add to the module so the query could cover a specific date range. After that, I could go back to ScraperWiki and write a scraper that pulls in the other 4 Number 10 scrapers and constructs the larger database. Then all I need to do is change the name of the scraper in my module to this new one and I can now query the much larger dataset that includes ministers and permanent secretaries!
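For instance, the date-range extension is mostly a matter of building the WHERE clause from a couple of optional arguments. A rough sketch (the helper name is mine, and it assumes the stored dates sort and compare sensibly):

def build_query(adviser, date_from=None, date_to=None):
    # Build the SQL sent to the ScraperWiki API, optionally limited to
    # a date range; mirrors the interpolation style of the module above.
    sql = ('SELECT `Name of Special Adviser`, `Type of hospitality received`, '
           '`Name of Organisation`, `Date of Hospitality` '
           'FROM swdata WHERE `Name of Special Adviser` = "%s"' % adviser)
    if date_from:
        sql += ' AND `Date of Hospitality` >= "%s"' % date_from
    if date_to:
        sql += ' AND `Date of Hospitality` <= "%s"' % date_to
    return sql + ' ORDER BY `Date of Hospitality` DESC'

# e.g. build_query('Joe Bloggs', '2011-01-01', '2011-06-30')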

Now that’s computer-assisted reporting!

PS: I have fixed the bug in .gn so the links match the headlines.

Knight Foundation finance ScraperWiki for journalism
https://blog.scraperwiki.com/2011/06/knight-foundation-finance-scraperwiki-for-journalism/
Wed, 22 Jun 2011

ScraperWiki is the place to work together on data, and it is particularly useful for journalism.

We are therefore very pleased to announce that ScraperWiki has won the Knight News Challenge!

The Knight Foundation are spending $280,000 over 2 years for us to improve ScraperWiki as a platform for journalists, and to run events to bring together journalists and programmers across the United States.

America already has trailblazing organisations that do data and journalism well – for example, both ProPublica and the Chicago Tribune have excellent data centers to support their news content. Our aim is to lower the barrier to entry into data-driven journalism and to create (an order of magnitude) more of this type of success. So come join our campaign for America: Yes We Can (Scrape). PS: We are politically neutral but think open source when it comes to campaign strategy!

What are we going to do to the platform?

As well as polishing ScraperWiki to make it easier to use, and creating journalism-focussed tutorials and screencasts, we’re adding four specific services for journalists:

  • Data embargo, so journalists can keep their stories secret until going to print, but publish the data in a structured, reusable, public form with the story.
  • Data on demand service. Journalists often need the right data, ordered quickly; we’re going to create a smooth process so they can get that.
  • News application hosting. We’ll make it scalable and easier.
  • Data alerts. Automatically get leads from changing data. For example, watch bridge repair schedules, and email when one isn’t being maintained (a rough sketch of this idea follows below).
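To make the data-alert idea concrete, here is a rough sketch of the bridge example: a scheduled script that queries a scraper’s datastore for rows breaking a simple rule and emails the newsdesk when it finds any. The scraper name, column names and addresses are all assumptions for illustration, not an existing ScraperWiki feature.

import json
import smtplib
import urllib
import urllib2
from email.mime.text import MIMEText

# Hypothetical scraper: a table of bridges with the date each one
# was last inspected.
params = urllib.urlencode({
    'format': 'jsondict',
    'name': 'bridge_repair_schedules',
    'query': ("SELECT bridge, last_inspected FROM swdata "
              "WHERE last_inspected < date('now', '-1 year')"),
})
overdue = json.loads(urllib2.urlopen(
    'https://api.scraperwiki.com/api/1.0/datastore/sqlite?' + params).read())

if overdue:
    body = '\n'.join('%s last inspected %s' % (r['bridge'], r['last_inspected'])
                     for r in overdue)
    msg = MIMEText(body)
    msg['Subject'] = 'Data alert: %d bridges overdue for inspection' % len(overdue)
    msg['From'] = 'alerts@example.org'
    msg['To'] = 'newsdesk@example.org'
    smtplib.SMTP('localhost').sendmail(msg['From'], [msg['To']], msg.as_string())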

Here are two concrete examples of ScraperWiki being used already in similar ways:

Where in the US are we going to go?

What really matters about ScraperWiki is the people using it. Data is dead if it doesn’t have someone, a journalist or a citizen, analysing it, finding stories in it and making decisions from it.

We’re running Data Journalism Camps in each of a dozen states. These will be similar in format to our hacks and hackers hack days, which we’ve run across the UK and Ireland over the last year.

The camps will have two parts.

  • Making something. In teams of journalists and coders, using data to dig into a story, or make or prototype a news app, all in one day.
  • Scraping tutorials. For journalists who want to learn how to code, and programmers who want to know more about scraping and ScraperWiki.

This video of our event in Liverpool gives a flavour of what to expect.

Get in touch if you’d like us to stop near you, or are interested in helping or sponsoring the camps.

Finally…

The project is designed to be financially stable in the long term. While the public version of ScraperWiki will remain free, we will charge for extra services such as keeping data private, and data on demand. We’ll be working with B2B media, as well as consumer media.

As with all Knight-financed projects, the code behind ScraperWiki is open source, so newsrooms won’t be building a dependency on something they can’t control.

For more details you can read our original application (note that financial amounts have changed since then).

Finally, and most importantly, I’d like to congratulate and thank everyone who has worked on, used or supported ScraperWiki. The Knight News Challenge had 1,600 excellent applications, so this is a real validation of what we’re doing, both with data and with journalism.
