ScraperWiki’s values and how to brainstorm yours

“You need to write down your values, they’re deeper than your vision or your mission”, says nearly every cheesy article about how to set the direction of an organisation. We finally decided to have a go one afternoon at tea time last week, with nearly all employees together. It took about half an hour, and was great […]

Face ReKognition

I’ve previously written about social media and the popularity of our Twitter Search and Followers tools. But how can we make Twitter data more useful to our customers? Analysing the profile pictures of Twitter accounts seemed like an interesting thing to do, since they are often the face of the account holder and a face […]

Book review: Hadoop in Action by Chuck Lam

Hadoop in Action by Chuck Lam provides a brief, fairly technical introduction to the Hadoop Big Data ecosystem. Hadoop is an open-source implementation of the MapReduce framework originally developed by Google to process huge quantities of web search data. The name MapReduce refers to dividing up jobs amongst multiple processors (“Mapping”) and then recombining […]
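
The excerpt is cut off, but the split-then-recombine idea behind MapReduce can be sketched in a few lines. In the usual formulation, a map step emits key/value pairs from each chunk of input, the framework groups those pairs by key (the "shuffle"), and a reduce step combines each group into a result. The sketch below is a plain-Python, single-process illustration of that pattern, not Hadoop's API and not code from the book. The word-count task and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are our own illustrative choices.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all the values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, map and reduce calls would be spread across many machines;
    # here they simply run one after another in a single process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because each `map_phase` call looks at only one document and each `reduce_phase` call looks at only one key, both steps can run independently in parallel; only the shuffle needs to see all the intermediate pairs.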

New ScraperWiki tool lets you extract data from reports with complete accuracy

It’s not always possible to automate data gathering, even with scrapers. Often we find customers want to regularly update data in ScraperWiki via spreadsheets. Either they’ve made the spreadsheets via a report from another system (typically one that isn’t on the web), or they gather the data by hand (for example, by phoning someone up […]

Book review: Python for Data Analysis by Wes McKinney

As well as developing scrapers and a data platform, at ScraperWiki we also do data analysis. Sometimes this is just because we’re interested; other times it’s because clients don’t have the tools or the time to do the analysis they want themselves. Often the problem is with the size of the data. Excel is […]

The best data opens itself on UK Gov’s Performance Platform

This is the third in a series of posts about the UK Government’s Performance Platform, cross-posted on the OKFN blog as it is about open data. Part 1 introduced why the platform is exciting, and part 2 described how it works inside. The best data opens itself. No need to make Freedom of Information requests to pry the information […]

Book review: Mining the Social Web by Matthew A. Russell

The Twitter Search and Followers tools are amongst the most popular on the ScraperWiki platform, so we are looking to provide more value in this area. To this end I’ve been reading “Mining the Social Web” by Matthew A. Russell. In the first instance the book looks like a run through the APIs for various […]

Book review: Data Mining – Practical Machine Learning Tools and Techniques by Witten, Frank and Hall

I’ve been doing more reading on machine learning, this time in the form of Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank and Mark A. Hall. This comes on the recommendation of my academic colleagues on the Newsreader project, who rely heavily on machine learning techniques to do natural language […]

The BIG Lottery Data

The UK’s BIG Lottery Fund recently released its grant data going back to 2004 as a set of lovely CSV files: you can get it yourself here or here. I found it a great opportunity to try out some new tricks with Tableau, and have a bit of a poke around another largish dataset from government. The […]
