My last read, on the Gephi graph visualisation package, was a little disappointing but gave me an enthusiasm for Graph Theory. So I picked up one of the books that it recommended: Graph Theory and Complex Networks: An Introduction by Maarten van Steen to learn more. In this context a graph is a collection of […]
Book review: Network Graph Analysis and visualization with Gephi by Ken Cherven
I generally follow the rule that if I haven’t got anything nice to say about something then I shouldn’t say anything at all. Network Graph Analysis and visualization with Gephi by Ken Cherven challenges this principle. Gephi is a system for producing network visualisations, as such it doesn’t have a great many competitors. Fans of […]
Book review: Big data by Viktor Mayer-Schönberger and Kenneth Cukier
We hear a lot about “Big Data” at ScraperWiki. We’ve always been a bit bemused by the tag since it seems to be used indescriminately. Just what is big data and is there something special I should do with it? Is it even a uniform thing? I’m giving a workshop on data science next week and […]
Book review: Learning SPARQL by Bob DuCharme
The NewsReader project on which we are working at ScraperWiki uses semantic web technology and natural language processing to derive meaning from the news. We are building a simple API to give access to the NewsReader datastore, whose native interface is SPARQL. SPARQL is a SQL-like query language used to access data stored in the […]
Book review: Data Science for Business by Provost and Fawcett
Marginalia are an insight into the mind of another reader. This struck me as a I read Data Science for Business by Foster Provost and Tom Fawcett. The copy of the book had previously been read by two of my colleagues. One of whom had clearly read the introductory and concluding chapters but not the […]
Book review: The Signal and the Noise by Nate Silver
Nate Silver first came to my attention during the 2008 presidential election in the US. He correctly predicted the outcome of the November results in 49 of 50 states, missing only on Indiana where Barack Obama won by just a single percentage point. This is part of a wider career in prediction: aside from a […]
Book review: Hadoop in Action by Chuck Lam
Hadoop in Action by Chuck Lam provides a brief, fairly technical introduction to the Hadoop Big Data ecosystem. Hadoop is an open source implementation of the MapReduce framework originally developed by Google to process huge quantities of web search data. The name MapReduce, refers to dividing up jobs amongst multiple processors (“Mapping”) and then recombining […]
Book review: Python for Data Analysis by Wes McKinney
As well as developing scrapers and a data platform, at ScraperWiki we also do data analysis. Some of this is just because we’re interested, other times it’s because clients don’t have the tools or the time to do the analysis they want themselves. Often the problem is with the size of the data. Excel is […]
Book review: Mining the Social Web by Matthew A. Russell
The twitter search and follower tools are amongst the most popular on the ScraperWiki platform so we are looking to provide more value in this area. To this end I’ve been reading “Mining the Social Web” by Matthew A. Russell. In the first instance the book looks like a run through the APIs for various […]
Book review: Data Mining – Practical Machine Learning Tools and Techniques by Witten, Frank and Hall
I’ve been doing more reading on machine learning, this time in the form of Data Mining: Practical Machine Learning Tools and Techniques by Ian H. Witten, Eibe Frank and Mark A. Hall. This comes by recommendation of my academic colleagues on the Newsreader project, who rely heavily on machine learning techniques to do natural language […]