Hi! We've renamed ScraperWiki.
The product is now QuickCode and the company is The Sensible Code Company.

Blog

Face ReKognition

I’ve previously written about social media and the popularity of our Twitter Search and Followers tools. But how can we make Twitter data more useful to our customers? Analysing the profile pictures of Twitter accounts seemed like an interesting thing to do since they are often the faces of the account holder and a face […]

Hadoop in Action book cover

Book review: Hadoop in Action by Chuck Lam

Hadoop in Action by Chuck Lam provides a brief, fairly technical introduction to the Hadoop Big Data ecosystem. Hadoop is an open source implementation of the MapReduce framework originally developed by Google to process huge quantities of web search data. The name MapReduce refers to dividing up jobs amongst multiple processors (“Mapping”) and then recombining […]

New ScraperWiki tool lets you extract data from reports with complete accuracy

It’s not always possible to automate data gathering, even with scrapers. Often we find customers want to regularly update data in ScraperWiki via spreadsheets. Either they’ve made the spreadsheets via a report from another system (typically one that isn’t on the web), or they gather the data by hand (for example, by phoning someone up […]

Book review: Python for Data Analysis by Wes McKinney

As well as developing scrapers and a data platform, at ScraperWiki we also do data analysis. Some of this is just because we’re interested; other times it’s because clients don’t have the tools or the time to do the analysis they want themselves. Often the problem is with the size of the data. Excel is […]

social media collage

Getting sociable

The Search for Tweets and Get Twitter followers tools are the most popular on our platform. Why is this? In part this is because we’re sociable creatures; platforms like Twitter get a lot of interaction time from a lot of people. A certain section of the population has a data packrat mentality. For them ScraperWiki […]

The best data opens itself on UK Gov’s Performance Platform

This is the third in a series of posts about the UK Government’s Performance Platform, cross-posted on the OKFN blog as it is about open data. Part 1 introduced why the platform is exciting, and part 2 described how it worked inside. The best data opens itself. No need to make Freedom of Information requests to pry the information […]

Data Mining Cover

Book review: Data Mining – Practical Machine Learning Tools and Techniques by Witten, Frank and Hall

I’ve been doing more reading on machine learning, this time in the form of Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank and Mark A. Hall. This comes by recommendation of my academic colleagues on the Newsreader project, who rely heavily on machine learning techniques to do natural language […]

BIG Lottery Fund Logo

The BIG Lottery Data

The UK’s BIG Lottery Fund recently released its grant data since 2004 as a set of lovely CSV files: you can get it yourself here or here. I found it a great opportunity to try out some new tricks with Tableau, and have a bit of a poke around another largish dataset from government. The […]
