New backend now fully rolled out

The new faster, safer sandbox that powers ScraperWiki is now fully rolled out to all users. You should find running and developing scrapers and views faster than before, and that you’re using much more recent versions of Ruby, Python and associated libraries. Thank you to everyone, and there were lots of you, who helped us beta […]

#media2012: Hacking the Olympics

Last weekend, ScraperWiki hosted a ‘hacks and hackers’ event at FACT in Liverpool as part of the Abandon Normal Devices (AND) Festival, focusing on scraping data related to the Olympics for the #media2012 network. There is plenty of information, plenty of activity and plenty of action that is happening and can be done in order to […]

Scraping guides: Parsing HTML using CSS selectors

We’ve added a new copy-and-paste scraping guide, so you can quickly get the lines of code you need to parse an HTML file using CSS selectors. Get to it from the documentation page. The HTML parsing guide is available in Ruby, Python and PHP. Just as with all documentation, you can choose which language at the top right […]

‘Big Data’ in the Big Apple

My colleague @frabcus captured the main theme of Strata New York #strataconf in his most recent blog post. This was our first official speaking engagement in the USA as a Knight News Challenge 2011 winner. Here is my twopence worth! At first we were a little confused at the way in which the week-long […]

Four data trends to rule them all, the data scientist king to bind them

My favourite soundbite from O’Reilly’s Strata data conference was a definition of big data. John Rauser, Amazon’s main data scientist, said to me that “data is big data when you can’t process it on one machine”. And naturally, small data is data that you can process on one machine. What’s nice about this definition is it […]

Start Talking to Your Data – Literally!

Because ScraperWiki has a SQL database and an API with SQL extraction, I can SQL inject (haha!) straight into the API URL and use the JSON output. So what does all that mean? I scraped the CSV files of Special Advisers’ meetings, gifts and hospitality at Number 10. This is being updated as the data […]
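To make "SQL straight into the API URL" concrete, here is a minimal Python sketch of building such a URL. The endpoint path, the parameter names (format, name, query), the "jsondict" format value, the scraper name and the table name are all assumptions for illustration, not taken from the post; check the ScraperWiki API documentation for the real ones.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# ASSUMPTION: endpoint and parameter names are illustrative only.
API_BASE = "https://api.scraperwiki.com/api/1.0/datastore/sqlite"


def build_query_url(scraper_name, sql, fmt="jsondict"):
    """Return an API URL with the SQL query URL-encoded into the query string."""
    params = {"format": fmt, "name": scraper_name, "query": sql}
    return API_BASE + "?" + urlencode(params)


# "example_scraper" and the table "swdata" are hypothetical names.
url = build_query_url("example_scraper", "select * from swdata limit 10")
print(url)

# Decode the query string again to confirm the SQL survived the encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
```

The point of urlencode here is that spaces, asterisks and quotes in the SQL are escaped, so the query can travel safely inside a URL.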

Make RSS with an SQL query

Lots of people have asked for it to be easier to get data out of ScraperWiki as RSS feeds. The Julian has made it so. The Web API now has an option to output RSS feeds as a format (i.e. instead of JSON, CSV or HTML tables). For example, Anna made a scraper that gets alcohol […]
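As a rough local illustration of what turning query rows into an RSS feed involves, the sketch below maps rows to RSS 2.0 items using only the Python standard library. It is not ScraperWiki's implementation: the column names (title, link, description), the function name and the sample rows are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def rows_to_rss(rows, feed_title, feed_link, feed_description):
    """Build a minimal RSS 2.0 document from a list of row dicts.

    ASSUMPTION: each row has 'title', 'link' and 'description' keys,
    as if the SQL query had selected columns with those names.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_description
    for row in rows:
        item = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(item, field).text = row[field]
    return ET.tostring(rss, encoding="unicode")


# Placeholder rows, made up for the example.
sample_rows = [
    {"title": "First row", "link": "http://example.com/1", "description": "Row one"},
    {"title": "Second row", "link": "http://example.com/2", "description": "Row two"},
]
feed_xml = rows_to_rss(sample_rows, "Example feed", "http://example.com/", "Demo feed")
print(feed_xml)
```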

Help Get Olympic Data off the Start Line

As part of Media2012 we’ll be running (no pun intended) a Hacks and Hackers Data Journalism workshop. It’s part of the Abandon Normal Devices Festival. It’ll be on 2nd October from 11:00-17:00 at FACT (Foundation for Art and Creative Technology) Medialab, 88 Wood Street, Liverpool, L1 4DQ. So if you’re interested in sports data and want […]

Driving the Digger Down Under

G’day, Henare here from the OpenAustralia Foundation – Australia’s open data, open government and civic hacking charity. You might have heard that we were planning to have a hackfest here in Sydney last weekend. We decided to focus on writing new scrapers to add councils to our PlanningAlerts project that allows you to find out […]

Scraping guides: Excel spreadsheets

Following on from the CSV scraping guide, we’ve now added one about scraping Excel spreadsheets. You can get to it from the documentation page. The Excel scraping guide is available in Ruby, Python and PHP. Just as with all documentation, you can choose which language at the top right of the page. As with CSV files, at first […]
