graphy theory – ScraperWiki https://blog.scraperwiki.com Extract tables from PDFs and scrape the web Tue, 09 Aug 2016 06:10:13 +0000 en-US hourly 1 https://wordpress.org/?v=4.6 58264007 Book review: Graph Theory and Complex Networks by Maarten van Steen https://blog.scraperwiki.com/2014/10/book-review-graph-theory-and-complex-networks-by-maarten-van-steen/ Mon, 20 Oct 2014 08:28:53 +0000 https://blog.scraperwiki.com/?p=758222399 graph_theoryMy last read, on the Gephi graph visualisation package, was a little disappointing but gave me an enthusiasm for Graph Theory. So I picked up one of the books that it recommended: Graph Theory and Complex Networks: An Introduction by Maarten van Steen to learn more. In this context a graph is a collection of vertices connected by edges, the edges may be directed or undirected. The road network is an example of a graph; the junctions between roads are vertices, the edges are roads and a one way street is a directed edge – two-way streets are undirected.

Why study graph theory?

Graph theory underpins a bunch of things like route finding, timetabling, map colouring, communications routing, sol-gel transitions, ecologies, parsing mathematical expressions and so forth. It’s been a staple of Computer Science undergraduate courses for a while, and more recently there’s been something of a resurgence in the field with systems on the web provided huge quantities of graph-shaped data both in terms of the underlying hardware networks and the activities of people – the social networks.

Sometimes the links between graph theory and an application are not so obvious. For example, project planning can be understood in terms of graph theory. A task can depend on another task – the tasks being two vertices in a graph. The edge between such vertices is directed, from one to the other, indicating dependency. To give a trivial example: you need a chicken to lay an egg. As a whole a graph of tasks cannot contain loops (or cycles) since this would imply that a task depended on a task that could only be completed after it, itself had been completed. To return to my example: if you need an egg in order to get a chicken to lay an egg then you’re in trouble! Generally, networks of tasks should be directed acyclic graphs (or DAG) i.e. they should not contain cycles.

The book’s target audience is 1st or 2nd year undergraduates with moderate background in mathematics, it was developed for Computer Science undergraduates. The style is quite mathematical but fairly friendly. The author’s intention is to introduce the undergraduate to mathematical formalism. I found this useful, since mathematical symbols are difficult to search for and shorthands such  as operator overloading even more so. This said, it is still an undergraduate text rather than a popular accounts don’t expect an easy read or pretty pictures, or even pretty illustrations.

The book divides into three chunks. The first provides the basic language for describing graphs, both words and equations. The second part covers theorems arising from some of the basic definitions, including the ideas of “walks” – traversals of a graph which take in all vertices and “tours” which take in all edges. This includes long standing problems such as the Dijkstra’s algorithm for route finding, and the travelling salesman problem. Also included in this section are “trees” – networks with no cycles – where is a cycle is a closed walk which visits vertices just once.

The third section covers the analysis of graphs. This starts with metrics for measuring graphs such as vertex degree distributions, distance statistics and clustering measures. I found this section rather brief, and poorly illustrated. However, it is followed by an introduction to various classes of complex networks including the original random graphs(connect), small-world and scale-free networks. What is stuck me about complex graphs is that they are each complex in their own way. Random, small-world and scale-free networks are all methods for constructing a network in order to try to represent a known real world situation. Small-world networks arise from one of Stanley Milgram’s experiments: sending post across the US via social networks. The key feature is that there are clusters of people who know each other but these clusters are linked by the odd “longer range” contact.

The book finishes with some real world examples relating to the world wide web, peer-to-peer sharing algorithms and social networks. What struck me in social networks is that the vertices (people!) you identify as important can depend quite sensitively on the metric you use to measure importance.

I picked up Graph Theory after I’d been working with Gephi, wanting to learn more about the things that Gephi will measure for me. It serves that purpose pretty well. In addition I have a better feel for situations where the answer is “graph theory”. Furthermore, Gephi has a bunch of network generators to create random, small-world and scale-free networks so that you can try out what you’ve learned.

]]>
758222399