Hi! We've renamed ScraperWiki.
The product is now QuickCode and the company is The Sensible Code Company.

Blog

Book review: Mastering Gephi Network Visualisation by Ken Cherven

1994_7344OS_Mastering Gephi Network VisualizationA little while ago I reviewed Ken Cherven’s book Network Graph Analysis and Visualisation with Gephi, it’s fair to say I was not very complementary about it. It was rather short, and had quite a lot of screenshots. It’s strength was in introducing every single element of the Gephi interface. This book, Mastering Gephi Network Visualisation by Ken Cherven is a different, and better, book.

Networks in this context are collections of nodes connected by edges, networks are ubiquitous. The nodes may be people in a social network, and the edges their friendships. Or the nodes might be proteins and metabolic products and the edges the reaction pathways between them. Or any other of a multitude of systems. I’ve reviewed a couple of other books in this area including Barabási’s popular account of the pervasiveness of networks, Linked, and van Steen’s undergraduate textbook, Graph Theory and Complex Networks, which cover the maths of network (or graph) theory in some detail.

Mastering Gephi is a practical guide to using the Gephi Network visualisation software, it covers the more theoretical material regarding networks in a peripheral fashion. Gephi is the most popular open source network visualisation system of which I’m aware, it is well-featured and under active development. Many of the network visualisations you see of, for example, twitter social networks, will have been generated using Gephi. It is a pretty complex piece of software, and if you don’t want to rely on information on the web, or taught courses then Cherven’s books are pretty much your only alternative.

The core chapters are on layouts, filters, statistics, segmenting and partitioning, and dynamic networks. Outside this there are some more general chapters, including one on exporting visualisations and an odd one on “network patterns” which introduced diffusion and contagion in networks but then didn’t go much further.

I found the layouts chapter particularly useful, it’s a review of the various layout algorithms available. In most cases there is no “correct” way of drawing a network on a 2D canvas, layout algorithms are designed to distribute nodes and edges on a canvas to enable the viewer to gain understanding of the network they represent.  From this chapter I discovered the directed acyclic graph (DAG) layout which can be downloaded as a Gephi plugin. Tip: I had to go search this plugin out manually in the Gephi Marketplace, it didn’t get installed when I indiscriminately tried to install all plugins. The DAG layout is good for showing tree structures such as organisational diagrams.

I learnt of the “Chinese Whispers” and “Markov clustering” algorithms for identifying clusters within a network in the chapter on segmenting and partitioning. These algorithms are not covered in detail but sufficient information is provided that you can try them out on a network of your choice, and go look up more information on their implementation if desired. The filtering chapter is very much about the mechanics of how to do a thing in Gephi (filter a network to show a subset of nodes), whilst the statistics chapter is more about the range of network statistical measures known in the literature.

I was aware of the ability of Gephi to show dynamic networks, ones that evolved over time, but had never experimented with this functionality. Cherven’s book provides an overview of this functionality using data from baseball as an example. The example datasets are quite appealing, they include social networks in schools, baseball, and jazz musicians. I suspect they are standard examples in the network literature, but this is no bad thing.

The book follows the advice that my old PhD supervisor gave me on giving presentations: tell the audience what you are go to tell them, tell them and then tell them what you told them. This works well for the limited time available in a spoken presentation, repetition helps the audience remember, but it feels a bit like overkill in a book. In a book we can flick back to remind us what was written earlier.

It’s a bit frustrating that the book is printed in black and white, particularly at the point where we are asked to admire the blue and yellow parts of a network visualisation! The referencing is a little erratic with a list of books appearing in the bibliography but references to some of the detail of algorithms only found in the text.

I’m happy to recommend this book as a solid overview of Gephi for those that prefer to learn from dead tree, such as myself. It has good coverage of Gephi features, and some interesting examples. In places it is a little shallow and repetitive.

The publisher sent me this book, free of charge, for review.

Tags: , ,

We're hiring!