Book review: Interactive Data Visualization for the Web by Scott Murray

Next in my book reading, I turn to Interactive Data Visualization for the Web by Scott Murray (@alignedleft on Twitter). This book covers D3, the JavaScript library for data visualisation written by Mike Bostock, who was also responsible for the Protovis library. If you’d like a taster of the book’s content, a number of the examples […]

Book review: JavaScript: The Good Parts by Douglas Crockford

This week I’ve been programming in JavaScript, something of a novelty for me. Jealous of the Dear Leader’s automatically summarize tool, I wanted to make something myself; hopefully a future post will describe my timeline visualising tool. A further motivation is that web scraping requires some knowledge of JavaScript, since it is a key browser technology […]

Book Review: Machine Learning in Action by Peter Harrington

Machine learning is about prediction, and prediction is a valuable commodity. This sounds pretty cool and definitely the sort of thing a data scientist should be into, so I picked up Machine Learning in Action by Peter Harrington to get an overview of the area. Amongst the examples covered in this book are: Given that […]

Book Review: Data Visualization: a successful design process by Andy Kirk

My next review is of Andy Kirk’s book Data Visualization: a successful design process. Those of you on Twitter might know him as @visualisingdata, where you can follow his progress around the world as he delivers training. He also blogs at Visualising Data. Previously in this area, I’ve read Tufte’s book The Visual Display of […]

Book Review: R in Action by Robert I. Kabacoff

This is a review of Robert I. Kabacoff’s book R in Action, which is a guided tour around the statistical computing package R. My reasons for reading this book were two-fold: firstly, I’m interested in using R for statistical analysis and visualisation. Previously I’ve used Matlab for this type of work, but R is growing in […]

Tools of the trade

With the experience of a whole week of ScraperWiki, I am starting to appreciate the core tools of the professional Data Scientist. In the past I’ve written scrapers in Matlab, C# and Python. However, the house language for scraping at ScraperWiki is Python. It’s a good choice: a mature but modern language with a wide […]

I am Ian, Ian I am*

I have an 8-year itch: I spent the first 8 years of my career as an academic, ending up as a lecturer in physics at UMIST. Then I was a research scientist at a large “fast moving consumer goods” company for another 8 years. On Monday I started work at ScraperWiki as Senior Data Scientist, […]

Enterprise data analysis and visualization

The topic for today is a paper[1] by members of the Stanford Visualization Group on interviews with data analysts, entitled “Enterprise Data Analysis and Visualization: An Interview Study”. This is clearly relevant to us here at ScraperWiki, and thankfully their analysis fits in with the things we are trying to achieve. The study is compiled from interviews with 35 […]

Scraping the Royal Society membership list

To a data scientist any data is fair game. Through my interest in the history of science I came across the membership records of the Royal Society from 1660 to 2007, which are available as a single PDF file. I’ve scraped the membership list before: the first time around I wrote a C# application which […]
