

Book review: Interactive Data Visualization for the web by Scott Murray

Next in my book reading, I turn to Interactive Data Visualization for the Web by Scott Murray (@alignedleft on Twitter). This book covers the d3 JavaScript library for data visualisation, written by Mike Bostock, who was also responsible for the Protovis library. If you’d like a taster of the book’s content, a number of the examples can also be found on the author’s website.

The book is largely aimed at web designers who are looking to include interactive data visualisations in their work. It includes some introductory material on JavaScript, HTML, and CSS, so it has some value for programmers moving into web visualisation. I quite liked the repetition of this relatively basic material, and the conceptual introduction to the d3 library.

I found the book rather slow: on page 197 – approaching the final fifth of the book – we were still making a bar chart. A smaller effort was expended in that period on scatter graphs. As a data scientist, I expect to have several dozen plot types in that number of pages! This is something of which Scott warns us, though. d3 is a visualisation framework built for explanatory presentation (i.e. you know the story you want to tell) rather than being an exploratory tool (i.e. you want to find out about your data). To be clear: this “slowness” is not a fault of the book, rather a disjunction between the book and my expectations.

From a technical point of view, d3 works by binding data to elements in the DOM for a webpage. It’s possible to do this for any element type, but practically speaking only Scalable Vector Graphics (SVG) elements make real sense. This restriction means that d3 will only work in more recent browsers, which may be a problem for those trapped in some corporate environments. The library contains a lot of helper functions for generating scales, loading up data, selecting and modifying elements, animation and so forth. d3 is a low-level library; there is no PlotBarChart function.
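
To give a flavour of what “low-level” means here, the following is a minimal sketch in plain JavaScript that deliberately does not use d3. It maps each data value to one SVG rect by hand, using a home-made linear scale. The names linearScale and barChartSvg are my own inventions for illustration and are not part of d3; the library's own helpers for scales and for binding data to elements do far more than this.

```javascript
// Hypothetical helper: a hand-rolled linear scale that maps a value from
// an input domain onto an output range. This is NOT d3's implementation.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical helper: build SVG markup with one <rect> per datum.
// Assumes a non-empty array of non-negative numbers with a positive maximum.
function barChartSvg(data, width, height) {
  const barWidth = width / data.length;
  const y = linearScale([0, Math.max(...data)], [0, height]);
  const rects = data.map((d, i) => {
    const h = y(d);
    // SVG's origin is the top-left corner, so bars are drawn downwards
    // from (height - h) to sit on the bottom edge.
    return `<rect x="${i * barWidth}" y="${height - h}" width="${barWidth - 1}" height="${h}"/>`;
  });
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">${rects.join("")}</svg>`;
}

const svg = barChartSvg([5, 10, 15, 20], 200, 100);
console.log(svg);
```

Saving the printed string as an .svg file and opening it in a browser should show four bars of increasing height. Even this toy version has to deal with scaling, positioning and the inverted y-axis explicitly, which is roughly the kind of work the book spends its pages on.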

Achieving the static effects demonstrated in this book using other tools such as R, Matlab, or Python would be a relatively straightforward task. The animations, transitions and interactivity would be more difficult to do. More widely, the d3 library supports the creation of hierarchical visualisations which I would struggle to create using other tools.

This book is quite a basic introduction; you can get a much better overview of what is possible with d3 by looking at the API documentation and the Gallery. Scott lists quite a few other resources, including a wide range for the d3 library itself, systems built on d3, and alternatives to d3 if it is not the library you are looking for.

I can see myself using d3 in the future, perhaps not for building generic tools but for custom visualisations where the data is known and the aim is to best explain that data. Scott quotes Ben Shneiderman on this regarding the structure of such visualisations:

overview first, zoom and filter, then details on demand


2 Responses to “Book review: Interactive Data Visualization for the web by Scott Murray”

  1. Julian Todd May 24, 2013 at 6:26 pm #

    You’re right to suggest that the best visualization tool is not a lot of good if you can’t program it. And programming is as much a process of exploring the data as it is writing the code to produce a particular visualization that — ideally — you haven’t seen before except in your imagination. You just hope it’s going to look good. And if it doesn’t, you need to explore/reprogram around the visualization until it does look good.

    If your exploratory tool, which you have interacted with semi-visually, is able to encode its state (say, in the query string), then you have effectively programmed this visualization.

    This is an easier and more exploratory way to program a visualization than by using R or MatLab. When making visualizations using those tools, one iterates the coding and drawing multiple times until it starts to produce an answer that is right. This is indeed a clunky means of interaction and exploration with the data visualization, but that is what it is. The popularity of something like R is that it makes good guesses at the visualization, and reduces the reprogramming iterations it takes. But there is still a trade-off.

    For example, your visualization in R might be very smart at placing the label on the Y-axis, while the interactive visualization in JavaScript may not be so smart. However, if the JavaScript visualization enables you to drag the label to the place you want to put it, then it’s going to take about the same amount of time to get it right. And it will be more flexible in other positionings, such as where to put the key for the graph, usually in the plotting area where it doesn’t overlay an important part of the curves (i.e. this is a matter of taste).

    Now suppose you had a tool that could automatically render your Javascript visualization into a PNG bitmap, comparable to the static effect that is the output of R or Matlab. Maybe do this through an SVG->PNG converter.

    Then, couldn’t you say that creating visualizations the old fashioned hacky way by coding and rerunning to generate a static image is entirely redundant?

    That is if the platform can produce static bitmaps to be included in static reports? This is the bridge from one paradigm into the other.

  2. Ian Hopkinson May 26, 2013 at 11:56 am #

    Your comment is pretty much a blog post of its own!

    I see the power of R and Matlab being that they have a large range of pre-prepared visualisations which are low effort to apply to your data. Ultimately they are not great at the precise placement of labels and so forth; the serious visualisation people I know all take output from such programs and tidy it up using Adobe Illustrator or Inkscape.

    Where R and Matlab fall down is that they don’t support the “overview first, zoom and filter, then details on demand” visualisation methodology. I see this as the gap that d3 can fill; others have used Processing for this. Tableau is an attempt to make this a user-friendly experience, but it sacrifices the flexibility of a programming environment.

    It would appear to be possible to render a d3 generated SVG visualisation to a bitmap on the server-side, so it would be possible to add this to the platform.
