Summary – Big Data Value Association June Summit (Madrid)

In late June, 375 Europeans + 1 attended the Big Data Value Association (BDVA) Summit in Madrid. The BDVA is the private part of the Big Data Public Private Partnership (PPP); the public part is the European Commission. The delivery mechanism is Horizon 2020, with €500m of funding. The PPP commenced in 2015 and runs to […]

Book Review: Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia

Apache Spark is a system for doing data analysis which can be run on a single machine or across a cluster. It is pretty new technology: initial work began in 2009 and Apache adopted it in 2013. There’s a lot of buzz around it, and I have a problem for which it might be […]

Book review: Big data by Viktor Mayer-Schönberger and Kenneth Cukier

We hear a lot about “Big Data” at ScraperWiki. We’ve always been a bit bemused by the tag since it seems to be used indiscriminately. Just what is big data and is there something special I should do with it? Is it even a uniform thing? I’m giving a workshop on data science next week and […]

Hip Data Terms

“Big Data” and “Data Science” tend to be terms whose meaning is defined the moment they are used. They are sometimes meaningful, but their meaning depends on context. From the agendas of many hip and not-so-hip data talks, we can pull together some of the definitions people have in mind, and will try to describe how […]

International Data Journalism Awards: deadline fast approaching (10th April 2012)

Everybody is talking about and trying to do ‘data journalism’, and the first ever International Data Journalism Awards have been established to recognise the huge effort that people are making in this field. It’s a great opportunity to showcase your work. Backed by Google, the prizes are generous at €45,000 (over $55,000) to six winners and […]

Happy New Year and Happy New York!

We are really pleased to announce that we will be hosting our very first US two-day Journalism Data Camp on February 3rd and 4th 2012, in conjunction with the Tow Center for Digital Journalism at Columbia University and supported by the Knight Foundation. We have been working with Emily Bell @emilybell, Director of […]

‘Big Data’ in the Big Apple

My colleague @frabcus captured the main theme of Strata New York #strataconf in his most recent blog post. This was our first official speaking engagement in the USA as a Knight News Challenge 2011 winner. Here is my twopence worth! At first we were a little confused at the way in which the week-long […]

Four data trends to rule them all, the data scientist king to bind them

My favourite soundbite from O’Reilly’s Strata data conference was a definition of big data. John Rauser, Amazon’s main data scientist, said to me that “data is big data when you can’t process it on one machine”. And naturally, small data is data that you can process on one machine. What’s nice about this definition is it […]

ScraperWiki


Tag Archives | big data