While building the Civil Service People Survey (CSPS) site, ScraperWiki had to deal with the complexities of suppressing data to avoid privacy leaks and building technology to process tens of millions of rows in a fraction of a second. We didn’t have time to spend on basic web design as well. Luckily the Government’s Resources for designers, […]
Which car should I (not) buy? Find out, with the ScraperWiki MOT website…
I am finishing up my MSc Data Science placement at ScraperWiki and, by extension, my MSc Data Science (Computing Specialism) programme at Lancaster University. My project was to build a website to enable users to investigate the MOT data. This week the result of that work, the ScraperWiki MOT website, went live. The aim of […]
Book review: Docker Up & Running by Karl Matthias and Sean P. Kane
This last week I have been reading Docker Up & Running by Karl Matthias and Sean P. Kane, a newly published book on Docker – a container technology which is designed to simplify the process of application testing and deployment. Docker is a very new product, first announced in March 2013, although it is based […]
Book Review: Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia
Apache Spark is a system for doing data analysis which can be run on a single machine or across a cluster. It is pretty new technology: initial work was in 2009 and Apache adopted it in 2013. There’s a lot of buzz around it, and I have a problem for which it might be […]
Scientists and Engineers… of What?
“All scientists are the same, no matter their field.” OK that sounds like a good ‘quotable’ quote, and since I didn’t see it said by anyone else, I can claim it as my own saying. The closest quote to this I saw was “No matter what engineering field you’re in, you learn the same basic […]
Technology Radar Report
Creating a sustainable technology company involves keeping up with technology. Technology changes, so we have to look ahead and invest our time now in things that will be valuable in the future. Or, we could switch to doing SharePoint consultancy for the rest of our lives, but […]
Elasticsearch and elasticity: building a search for government documents
Based in Paris, the OECD is the Organisation for Economic Co-operation and Development. As the name suggests, the OECD’s job is to develop and promote new social and economic policies. One part of their work is researching how open countries trade. Their view is that fewer trade barriers benefit consumers, through lower prices, and companies, […]
Four specific things “agile” saved us from doing at ONS
There’s lots of both hype and cynicism around “agile”. Instead, look at this part of the original agile declaration: “We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value: … Responding to change over Following a plan”. That is, while there […]
Book review: How Linux works by Brian Ward
It has been a while since my last book review because I’ve been coding, rather than reading, on the commute into the ScraperWiki offices in Liverpool. Next up is How Linux Works by Brian Ward. In some senses this book follows on from Data Science at the Command Line by Jeroen Janssens. Data Science was about doing analysis […]
NewsReader – Hack 100,000 World Cup Articles
June 10, The Hub Westminster (@NewsReader). Ian Hopkinson has been telling you about our role in the NewsReader project. We’re making a thing that crunches large volumes of news articles. We’re combining natural language processing and semantic web technology. It’s an FP7 project so we’re working with a bunch of partners across Europe. We’re 18 […]