I am finishing up my MSc Data Science placement at ScraperWiki and, by extension, my MSc Data Science (Computing Specialism) programme at Lancaster University. My project was to build a website to enable users to investigate the MOT data. This week the result of that work, the ScraperWiki MOT website, went live. The aim of […]
Book review: Data Science at the Command Line by Jeroen Janssens
In the mixed environment of ScraperWiki we make use of a broad variety of tools for data analysis. Data Science at the Command Line by Jeroen Janssens covers tools available at the Linux command line for doing data analysis tasks. The book is divided thematically into chapters on Obtaining, Scrubbing, Modeling, Interpreting Data with “intermezzo” […]
Book review: Data Science for Business by Provost and Fawcett
Marginalia are an insight into the mind of another reader. This struck me as I read Data Science for Business by Foster Provost and Tom Fawcett. The copy of the book had previously been read by two of my colleagues. One of whom had clearly read the introductory and concluding chapters but not the […]
Scraping Spreadsheets with XYPath
Spreadsheets are great. They’re ubiquitously available, beaten only by web pages and word processor documents. Like the word processor, they’re easy to use and give the user a blank page, but they divide the page up into cells to make sure that the columns and rows all line up. And unlike more complicated […]
ScraperWiki – Professional Services
How would you go about collecting, structuring and analysing 100,000 reports on Romanian companies? You could use ScraperWiki to write and host your own computer code that carries out the scraping you need, and then use our other self-service tools to clean and analyse the data. But sometimes writing your own code is not a […]
Hi, I’m Paul
Hi! I’m the latest member of ScraperWiki, joining the Data Science team this week. Data Science is a fascinating new direction for me, being “officially” an Electronic Engineer. I’ve spent the last couple of years in a large company hammering out fast C++ and trying (unsuccessfully) to convert everyone to Python. But what really excites […]
From future import x.scraperwiki.com
Time flies when you’re building a platform. At the start of the year, we announced the beginnings of a new, more powerful, more flexible ScraperWiki. More powerful because it exposes industry standards like SQL, SSH, and a persistent filesystem to developers, so they can scrape and crunch and export data pretty much however they like. […]
The next evolution of ScraperWiki
Quietly, over the last few months, we’ve been rebuilding both the backend and the frontend of ScraperWiki. The new ScraperWiki has been built from the ground up to be more powerful for data scientists, and easier to use for everyone else. At its core, it’s about empowering people to take a hold of their data, […]