My name is Leonisha Barley, and I am the latest addition to the people fortunate enough to have an internship opportunity at ScraperWiki. I just finished my 2nd year at The University of Manchester studying for a BA(Hons) degree in Sociology and Criminology and I am excited about developing my skills further this summer. There […]
Book Review: Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia
Apache Spark is a system for doing data analysis which can be run on a single machine or across a cluster. It is pretty new technology – initial work was in 2009 and Apache adopted it in 2013. There’s a lot of buzz around it, and I have a problem for which it might be […]
Which plane had the most accidents?
Searching by facets: Last year, ScraperWiki helped migrate lots of specialist datasets to GOV.UK. This afternoon, we happened to notice that the Air Accidents Investigation Branch reports, which we scraped from their old site, are live. The user interface is called Finder Frontend, and is used by GOV.UK wherever the user needs to search for […]
PDFTables: All the tables in one page, CSV
Lots of you have asked for it, and we’ve finally changed the Excel download format at PDFTables.com to put all the pages of your PDF into one worksheet. This is particularly useful if you have big tables that span multiple pages. You can still have the old format, just choose “Excel (multiple sheets)” from the […]
Scientists and Engineers… of What?
“All scientists are the same, no matter their field.” OK, that sounds like a good ‘quotable’ quote, and since I didn’t see it said by anyone else, I can claim it as my own saying. The closest quote to this I saw was “No matter what engineering field you’re in, you learn the same basic […]
Technology Radar Report
Creating a sustainable technology company involves keeping up with technology. The thing about technology is that it changes, and we have to look to the future, and invest our time now in things that will be valuable in the future. Or, we could switch to doing SharePoint consultancy for the rest of our lives, but […]
Elasticsearch and elasticity: building a search for government documents
Based in Paris, the OECD is the Organisation for Economic Co-operation and Development. As the name suggests, the OECD’s job is to develop and promote new social and economic policies. One part of their work is researching how open countries trade. Their view is that fewer trade barriers benefit consumers, through lower prices, and companies, […]
Book review: Mastering Gephi Network Visualisation by Ken Cherven
A little while ago I reviewed Ken Cherven’s book Network Graph Analysis and Visualisation with Gephi; it’s fair to say I was not very complimentary about it. It was rather short, and had quite a lot of screenshots. Its strength was in introducing every single element of the Gephi interface. This book, Mastering Gephi Network […]
MOT Data Analysis: Progress Along the Fault-Pattern Finding Path
How do data science and data engineering differ? And where do they overlap? I agree to a large extent with the answer given here. A data scientist must be able to ask the right questions – ‘right’ in this context meaning interesting, providing intelligence that can lead to process improvement or greater profitability (you don’t […]
End User Programming at the Office for National Statistics
The Office for National Statistics (ONS) approached us regarding a task which involves transforming data in a spreadsheet. Basically, unpivoting it. Data transformation is quite a general problem, but one with recurring patterns. Marginal variables are usually, well, somewhere in the margin. Cells generally refer to an observation or the name or value of a […]