In the mixed environment of ScraperWiki we make use of a broad variety of tools for data analysis. Data Science at the Command Line by Jeroen Janssens covers tools available at the Linux command line for doing data analysis tasks. The book is divided thematically into chapters on Obtaining, Scrubbing, Modeling, Interpreting Data with “intermezzo” […]
Book review: Remote Pairing by Joe Kutner
Pair programming is an important part of the Agile process but sometimes the programmers are not physically co-located. At ScraperWiki we have staff who do both scheduled and ad hoc remote working therefore methods for working together remotely are important to us. A result of a casual comment on Twitter, I picked up “Remote Pairing” […]
Book review: Graph Databases by Ian Robinson, Jim Webber and Emil Eifrem
Regular readers will know I am on a bit of a graph binge at the moment. In computer science and mathematics graphs are collections of nodes joined by edges, they have all sorts of applications including the study of social networks and route finding. Having covered graph theory and visualisation, I now move on to […]
NewsReader: the developers story
ScraperWiki has been a partner in NewsReader, an EU Framework 7 research project, for the last couple of years. The aim of NewsReader is to give computers the power to “understand” the news; to extract from a myriad of news articles the underlying events which gave rise to those articles; the who, the where, the […]
Book review: Linked by Albert-László Barabási
I am on a bit of a graph theory binge, it started with an attempt to learn about Gephi, the graph visualisation software, which developed into reading a proper grown up book on graph theory. I then learnt a little more about practicalities on reading Seven Databases in Seven Weeks, which included a section on […]
Book review: Seven databases in Seven Weeks by Eric Redmond and Jim R. Wilson
I came to databases a little late in life, as a physical scientist I didn’t have much call for them. Then a few years ago I discovered the wonders of relational databases and the power of SQL. The ScraperWiki platform strongly encourages you to save data to SQLite databases to integrate with its tools. There […]
Exploring the ONS
The Office for National Statistics (ONS) is the United Kingdom statistical body charged by the government with the task of collecting and publishing statistics related to the economy, population and society of England and Wales at national, regional and local levels. The data is typically published in the form of Excel spreadsheets. The ONS is […]
Book review: Graph Theory and Complex Networks by Maarten van Steen
My last read, on the Gephi graph visualisation package, was a little disappointing but gave me an enthusiasm for Graph Theory. So I picked up one of the books that it recommended: Graph Theory and Complex Networks: An Introduction by Maarten van Steen to learn more. In this context a graph is a collection of […]
Book review: Network Graph Analysis and visualization with Gephi by Ken Cherven
I generally follow the rule that if I haven’t got anything nice to say about something then I shouldn’t say anything at all. Network Graph Analysis and visualization with Gephi by Ken Cherven challenges this principle. Gephi is a system for producing network visualisations, as such it doesn’t have a great many competitors. Fans of […]
Inordinately fond of beetles… reloaded!
Some time ago, in the era before I joined ScraperWiki I had a play with the Science Museums object catalogue. You can see my previous blog post here. It was at a time when I was relatively inexperienced with the Python programming language and had no access to Tableau, the visualisation software. It’s a piece […]