james ball – ScraperWiki https://blog.scraperwiki.com Extract tables from PDFs and scrape the web

Read all about it read all about it: “ScraperWiki gets on the Guardian front page…” https://blog.scraperwiki.com/2011/02/read-all-about-it-read-all-about-it-scraperwiki-gets-on-the-guardian-front-page/ Fri, 25 Feb 2011 15:06:51 +0000

A data-driven story by investigative journalist James Ball on lobbyist influence in the UK Parliament has made it onto the front page of the Guardian. What is exciting for us is that James Ball’s story is supported by a ScraperWiki script that takes data from registers across Parliament, which are hosted on different servers, and aggregates them into one source table that can be viewed in a spreadsheet or document. This is now a living source of data that can be automatically updated. http://scraperwiki.com/scrapers/all_party_groups/

For the past year the team at ScraperWiki has been running media events around the country. Our next one is in Cardiff and is fully subscribed; we also have an event at BBC Scotland in Glasgow on 25 March. Throughout the programme we have had the opportunity to meet great journalists and bloggers from the national and local press, so we always thought we would make it to the front page – we just didn’t know when or by whom.

The story demonstrates the potential power of ScraperWiki to help journalists and researchers join the dots efficiently by working collaboratively with data specialists and software systems. Journalists can put down markers that run and update automatically, and they can monitor the data over time with the objective of holding ‘power and money’ to account. The added value of this technique is that, in one step, the data is represented in a uniform structure and linked to its source, which preserves its provenance. The software code that collects the data can be inspected by others in a peer review process to ensure the fidelity of the data.
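
As a rough illustration of the "uniform structure linked to the source" idea, here is a minimal Python sketch. It is not the script behind the story: the field names, the `Record` type, the helper functions, the sample rows and the example.org URLs are all hypothetical, chosen only to show rows from differently shaped registers being mapped onto one table that carries its source URL with every row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One row of the combined table (hypothetical schema)."""
    group: str       # name of the parliamentary group
    donor: str       # organisation named in the register
    source_url: str  # page the row came from, kept for provenance


def normalise(raw_rows, source_url, group_key, donor_key):
    """Map one register's rows onto the shared Record structure."""
    return [
        Record(
            group=row[group_key].strip(),
            donor=row[donor_key].strip(),
            source_url=source_url,
        )
        for row in raw_rows
    ]


def aggregate(*batches):
    """Concatenate batches, dropping exact duplicates, keeping first-seen order."""
    seen = set()
    table = []
    for batch in batches:
        for record in batch:
            if record not in seen:
                seen.add(record)
                table.append(record)
    return table


# Made-up sample rows standing in for two registers with different column names.
register_a = [{"Group": " Aviation ", "Benefactor": "Example Corp"}]
register_b = [{"name": "Rail", "funder": "Example Rail Ltd "}]

table = aggregate(
    normalise(register_a, "https://example.org/register-a", "Group", "Benefactor"),
    normalise(register_b, "https://example.org/register-b", "name", "funder"),
)
```

Because every `Record` keeps its `source_url`, anyone reviewing the combined table can trace a row back to the page it was taken from.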

In addition, because of the collaborative and social nature of the platform, there is also the potential to involve others in the wider technical and data community to continue to improve the data. Since the data is delivered by a scheduled script that runs daily, journalists and interested parties can now subscribe to the data set to be notified of future changes and amendments. So, for example, a journalist interested in any influence by a company, such as Virgin, can now have a specific email alert for donations or other actions by the conglomerate.
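
The alerting idea can be sketched in a few lines of Python, assuming each daily run produces a list of (group, organisation) rows. This is an illustration only, not ScraperWiki's subscription feature: the function names and the sample rows are hypothetical, and actually sending the email is left out.

```python
def new_rows(previous, current):
    """Rows present in the current snapshot but not in the previous one."""
    previous_set = set(previous)
    return [row for row in current if row not in previous_set]


def alerts_for(company, previous, current):
    """New rows whose organisation field mentions the company (case-insensitive)."""
    needle = company.lower()
    return [
        row for row in new_rows(previous, current)
        if needle in row[1].lower()
    ]


# Made-up snapshots from two consecutive daily runs.
yesterday = [("Aviation group", "Example Corp")]
today = [
    ("Aviation group", "Example Corp"),
    ("Rail group", "Example Corp Trains"),
    ("Health group", "Other Ltd"),
]

alerts = alerts_for("example corp", yesterday, today)
```

Only rows that are both new since the last run and mention the watched company end up in `alerts`; these are the rows a subscriber would be emailed about.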

We know and understand that data in the media sector needs to be kept embargoed until the story breaks.  Next month we will be launching an opportunity for data consumers to request and subscribe to specific data feeds.

There is a tsunami of data being published and it’s increasingly hard for investigative journalists to find the time to sift through the masses of information and make sense of it. We believe that ScraperWiki helps to solve some of the ‘hard’ data issues that people in the media face on a daily basis.

Congratulations to James on his front page story and to the fantastic team at the Guardian who do fabulous work on open data and data-driven journalism – long may it continue!

Hacks/Hackers London meetup to discuss Iraq War logs https://blog.scraperwiki.com/2010/10/hackshackerslondon/ Tue, 26 Oct 2010 10:02:50 +0000

ScraperWiki will be supporting the November Hacks/Hackers London meetup at 7pm on Wednesday 24th November 2010 at The Irish Club, 2-4 Tudor Street, EC4Y 0AA, London. A few tickets are still available, but places are filling fast.

Schedule

  • 7.00pm: The data journalism behind the Iraq War Logs (James Ball, Bureau of Investigative Journalism)

James, Development Producer for the Bureau of Investigative Journalism and Chief Data Analyst on the TBIJ/Channel 4 Dispatches investigation into the Iraq War Logs, will explain how data journalism powered the process.

  • 7.30pm: TBC
  • 8pm: Social!