Hi! We've renamed ScraperWiki.
The product is now QuickCode and the company is The Sensible Code Company.

Archive | Case Studies

Case study: Enrique Cocero getting political data from PDFs

Political strategy is international now. Enrique Cocero works from Madrid for his consultancy 7-50 Electoral Math, using data to understand voters and candidates in election campaigns across the world. He’s struggled with PDFs for a long time, and recently found PDF Tables via a Google search. He says: I used to have nightmares – I’m […]

Hi, I’m Leonisha

My name is Leonisha Barley, and I am the latest addition to the people fortunate enough to have an internship opportunity at ScraperWiki. I just finished my 2nd year at The University of Manchester studying for a BA(Hons) degree in Sociology and Criminology and I am excited about developing my skills further this summer. There […]

Which plane had the most accidents?

Searching by facets: Last year, ScraperWiki helped migrate lots of specialist datasets to GOV.UK. This afternoon, we happened to notice that the Air Accidents Investigation Branch reports, which we scraped from their old site, are live. The user interface is called Finder Frontend, and is used by GOV.UK wherever the user needs to search for […]

Elasticsearch and elasticity: building a search for government documents

Based in Paris, the OECD is the Organisation for Economic Co-operation and Development. As the name suggests, the OECD’s job is to develop and promote new social and economic policies. One part of their work is researching how open countries trade. Their view is that fewer trade barriers benefit consumers, through lower prices, and companies, […]

MOT

MOT Data Analysis: Progress Along the Fault-Pattern Finding Path

How do data science and data engineering differ? And where do they overlap? I agree to a large extent with the answer given here. A data scientist must be able to ask the right questions – ‘right’ in this context meaning interesting, providing intelligence that can lead to process improvement or greater profitability (you don’t […]

End User Programming at the Office for National Statistics

The Office for National Statistics (ONS) approached us regarding a task which involves transforming data in a spreadsheet. Basically, unpivoting it. Data transformation is quite a general problem, but one with recurring patterns. Marginal variables are usually, well, somewhere in the margin. Cells generally refer to an observation or the name or value of a […]

Pius Okoh

Hi, I’m Pius….

…and I’m the new thing at ScraperWiki. Yes, you heard right: thing, not person or guy or anything human. Since I learnt that real-world entities could be modeled using programming language objects in order to answer questions or make inferences, one weird thing in my brain just interpreted it the other way – that real-world […]

Four specific things “agile” saved us from doing at ONS

There’s lots of both hype and cynicism around “agile”. Instead, look at this part of the original agile declaration: “We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value: … Responding to change over Following a plan”. That is, while there […]

Adventures in Kaggle: Forest Cover Type Prediction

Regular readers of this blog will know I’ve read quite a few machine learning books; now it’s time to put this learning into action. We’ve done some machine learning for clients but I thought it would be good to do something I could share. The Forest Cover Type Prediction challenge on Kaggle seemed to fit the bill. Kaggle […]
