We’ve added scheduling to the “Code in your browser” tool on beta.scraperwiki.com. For now it is daily, as that covers most people’s uses. Please ask if you need something else! Or have a look at the tool’s source code. Want to know how to use the new ScraperWiki? There’s a quick start guide to coding […]
Free community accounts on the ScraperWiki Beta
We’ve been teasing and tempting you with blog posts about the first few tools on the new ScraperWiki Beta for a while now. It’s time to let you try them out first-hand. As of right now, the new ScraperWiki Beta is open for you, your aunt, anyone, to sign up for a free community account: […]

Book review: Interactive Data Visualization for the web by Scott Murray
Next in my book reading, I turn to Interactive Data Visualization for the web by Scott Murray (@alignedleft on Twitter). This book covers the d3 JavaScript library for data visualisation, written by Mike Bostock, who was also responsible for the Protovis library. If you’d like a taster of the book’s content, a number of the examples […]
Summarise #3: Buckets of time and numbers
In the last two weeks I introduced the “Summarise automatically” tool, which magically shows you interesting facts about any dataset in the new ScraperWiki. It’s an open source tool – geeks can play along on GitHub, or use the SSH button to log into the tool and see the code running in action. After adding […]
Summarising Serendipity
5 years ago, a friend and I sat down in a pub in Shrewsbury, drank some beer, and chatted about the web. Every month since, people have been doing that in Shrewsbury (and a few times in Ludlow). It’s called ShropGeek (we’re very savvy in our naming conventions, you see). It was started and organised […]
Newspapers, advertising, revenue, innovation
A couple of weeks ago, I joined the 110-year-old WAN-IFRA at their annual Digital Media Conference at the swish ETCVenues’ 200 Aldersgate London pad. The organisation has become the voice for the worldwide community of newspaper publishers, and the DMC was a truly international affair with 37 countries from all five continents represented. Senior executives see […]
Internships – coding and data science
The last two summers, we had a really good intern (Aidan Hobson Sayers – thanks for finding him for us, John!). We’d like to do it again this year. We’ve opportunities in three areas, depending on your skills and interests. Platform team – CoffeeScript, Backbone, Unix. We use Extreme Programming. Data science team – Python, R. Scraping, […]
A sea of data
My friend Simon Holgate of Sea Level Research has recently “cursed” me by introducing me to tides and sea-level data. Now I’m hooked. Why are tides interesting? When you’re trying to navigate a super-tanker into San Francisco Bay and you only have a few centimetres of clearance, whether the tide is in or out could be […]
Summarise #2: Pies and facts
In a previous blog post, I showed how by counting the most common values in each column (like a pivot table, or “group by” in SQL), I managed to make a tool that can automatically summarise datasets. I quickly realised that there were better ways of visualising the data than just showing tables. For example, […]
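The counting idea described in that excerpt — tallying the most common values in each column, like a pivot table or SQL’s GROUP BY — can be sketched in a few lines of Python. This is a minimal illustration with made-up data, not the tool’s actual code:

```python
from collections import Counter

def summarise(rows):
    """For each column, count the most common values -- the same
    idea as a pivot table, or GROUP BY ... COUNT(*) in SQL."""
    summary = {}
    for column in rows[0]:
        counts = Counter(row[column] for row in rows)
        summary[column] = counts.most_common(3)
    return summary

# Hypothetical example dataset
data = [
    {"city": "London", "type": "cafe"},
    {"city": "London", "type": "pub"},
    {"city": "Leeds",  "type": "pub"},
]

print(summarise(data))
# e.g. "city" summarises to [("London", 2), ("Leeds", 1)]
```

From a summary like this it is a short step to draw a pie or bar chart per column instead of printing a table.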
data-driven london week
Most mornings this week, I awoke in the mystical land of Hackney, and battled hordes of hipster-cyclists to make my way to the Google Campus – a refuge of data-folk. At least, that’s how I like to remember it. As I blogged last week, several ScraperWikians attended and spoke at a range of events, all […]