Glue Logic and Flowable Data

Guest post by Tony Hirst. As well as being a great tool for scraping and aggregating content from third party sites, ScraperWiki can be used as a transformational “glue logic” tool: joining together applications that utilise otherwise incompatible data formats. Typically, we might think of using a scraper to pull data into one or more […]

Data Business Models

If it sometimes feels like the data business is full of buzzwords and hipster technical jargon, then that’s probably because it is. But don’t panic! I’ve been at loads of hip and non-hip data talks here and there and, buzzwords aside, I’ve come across four actual categories of data business model in this hip data […]

Hip Data Terms

“Big Data” and “Data Science” tend to be terms whose meaning is defined the moment they are used. They are sometimes meaningful, but their meaning is dependent on context. From the agendas of many hip and not-so-hip data talks we could come up with some of the definitions people mean, and will try to describe how […]

How do? I’m Zach.

So, a few years ago, I tended to spend my working time explaining emerging tech ideas (generally around Linked Data, Open Data, and APIs) for a UK-based Semantic Web company called Talis. I helped people tell stories, edited an industry magazine, blogged, podcasted and hosted events. Over time, I found the role evolving naturally into […]

Enterprise data analysis and visualization

The topic for today is a paper[1] by members of the Stanford Visualization Group on interviews with data analysts, entitled “Enterprise Data Analysis and Visualization: An Interview Study”. This is clearly relevant to us here at ScraperWiki, and thankfully their analysis fits in with the things we are trying to achieve. The study is compiled from interviews with 35 […]

So web scraping is easy?

Journalists, academics and budding open data hackers often praise ScraperWiki for making web scraping easy. And while it’s true our platform and powerful APIs let you get more done, more easily, the statement still creates some head-scratching at ScraperWiki HQ. That’s because, as far as we can tell, scraping is hard, no matter what platform […]

My time at the Autocloud

The global CADCAM behemoth known as Autodesk hoovers up another small company every two weeks, a process unlikely to diminish following a $750 million bond issue last month. (Well, what else are they going to do with that money?) It was only a matter of time before this happened to me on account of my […]

A small matter of programming

We’re rebuilding ScraperWiki. For three years, we’ve been helping people get, clean and analyse data on the web. Our key insight was that you need to write code to do that, and we should make writing that code as easy as possible. Earlier this year, we realised that that isn’t enough. ScraperWiki Classic, as we […]

Scraping the Royal Society membership list

To a data scientist any data is fair game. Through my interest in the history of science I came across the membership records of the Royal Society from 1660 to 2007, which are available as a single PDF file. I’ve scraped the membership list before: the first time around I wrote a C# application which […]

The next evolution of ScraperWiki

Quietly, over the last few months, we’ve been rebuilding both the backend and the frontend of ScraperWiki. The new ScraperWiki has been built from the ground up to be more powerful for data scientists, and easier to use for everyone else. At its core, it’s about empowering people to take a hold of their data, […]

ScraperWiki

Extract tables from PDFs and scrape the web
