ScraperWiki Digger Gets HTTPS Security System

During the last week of rioting across the UK, eight riot vans were called out to quell the unrest just around the corner from where I live. With scenes of chaos and destruction filling the airwaves and clogging up Twitter, I began thinking: are your scrapers safe from looters? We don’t stock trainers or flat-screen TVs, […]

Quickly get an HTML table

As I said before, Julian sometimes adds simple things to ScraperWiki and nobody even notices them. He has obviously learnt, because this time he raised a ticket in Bitbucket saying that we needed to release this one. The External API can output data in several formats: jsondict is the standard JSON format, using a dictionary for each row; jsonlist is an alternative JSON format […]
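As a rough illustration of how a jsondict-style result (one dictionary per row, as described above) maps onto an HTML table, here is a minimal Python sketch. It is not ScraperWiki's own HTML output code. The helper name `rows_to_html_table` and the sample rows are hypothetical, and the column order simply follows the keys of the first row.

```python
from html import escape


def rows_to_html_table(rows):
    """Render a list of row dictionaries (jsondict-style) as an HTML table.

    Hypothetical helper for illustration only. Column order follows the
    keys of the first row, and missing keys become empty cells.
    """
    if not rows:
        return "<table></table>"
    columns = list(rows[0].keys())
    lines = ["<table>"]
    # Header row built from the column names.
    header = "".join(f"<th>{escape(str(col))}</th>" for col in columns)
    lines.append(f"<tr>{header}</tr>")
    # One <tr> per row dictionary.
    for row in rows:
        cells = "".join(
            f"<td>{escape(str(row.get(col, '')))}</td>" for col in columns
        )
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical jsondict-style result: one dictionary per row.
rows = [
    {"name": "Northtown", "population": 1200},
    {"name": "Stepney & Bow", "population": 3400},
]
html_table = rows_to_html_table(rows)
print(html_table)
```

Escaping each value with `html.escape` stops characters such as `&` or `<` in the data from breaking the table markup.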

meine-demokratie.de at the Open Knowledge Conference’s ScraperWiki Workshop

This is a guest blog post from Tobias Escher, a star ScraperWiki user in Germany. This year’s Open Knowledge Conference takes place in Berlin. While the conference proper starts tomorrow, ScraperWiki has spent the past two days running a workshop to scrape and visualise German data. It has drawn a sizeable community of people, from novices […]

Protect your scrapers!

You know how it is. You wrote your scraper on a whim. Because it’s a wiki, other people found it and helped fix bugs and extend it. Time passes. And now your whole business depends on it. For when that happens, we’ve just pushed an update that lets you protect scrapers. This […]

‘Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing’

You may have noticed that the design of the ScraperWiki site has changed substantially. As part of that, we made a few improvements to the documentation. Lots of you told us we had to make our documentation easier to find, more reliable and complete. We’ve reorganised it all under one contents page, called Documentation throughout […]

All recipes 30 minutes to cook

The other week we quietly added two tutorials of a new kind to the site, snuck in behind a radical site redesign. They’re instructive recipes that anyone with a modicum of programming knowledge should be able to follow easily. 1. Introductory tutorial: for programmers new to ScraperWiki, to get an idea of what it […]

It’s SQL. In a URL.

Squirrelled away amongst the other changes in ScraperWiki’s site redesign, we made substantial improvements to the external API explorer. We’re going to concentrate on the SQLite function here, as it is the most important, but as you can see on the right there are other functions for getting out scraper metadata. Zarino and Julian have made […]
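To make the idea of "SQL in a URL" concrete, here is a minimal Python sketch that packs a SQL query into a URL query string using the standard library's `urllib.parse.urlencode`, which percent-encodes spaces, `*` and `>` so the query survives as a single parameter. The base URL, the parameter names `format`, `name` and `query`, and the table name `my_table` are all assumptions for illustration, not a description of the real ScraperWiki endpoint; the API explorer shows the actual ones.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint, for illustration only (not the real ScraperWiki URL).
BASE_URL = "https://api.example.org/datastore/sqlite"


def sql_to_url(scraper_name, sql, fmt="jsondict"):
    """Build a GET URL carrying a SQL query as an encoded parameter.

    The parameter names used here are hypothetical.
    """
    params = {"format": fmt, "name": scraper_name, "query": sql}
    return BASE_URL + "?" + urlencode(params)


sql = "SELECT * FROM my_table WHERE price > 10 LIMIT 5"
url = sql_to_url("my_scraper", sql)
print(url)

# Decoding the query string recovers the original SQL unchanged.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(decoded)
```

Because the whole request is a plain URL, it can be pasted into a browser, bookmarked or fetched from any language without a client library.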

ScraperWiki: A story about two boys, web scraping and a worm

“It’s like a buddy movie,” she said. Not quite the kind of story lead I’m used to. But what do you expect if you employ journalists in a tech startup? “Tell them about that computer game of his that you bought with your pocket money.” She means the one with the risqué name. I think I’d […]

Hacks & Hackers Glasgow: the BBC College of Journalism video

Last month we celebrated the final leg of our UK & Ireland Hacks & Hackers tour in Glasgow, at an event hosted by BBC Scotland and supported by the BBC College of Journalism and the Guardian Open Platform. You can read more about it here. Other coverage includes: Guardian Local (Edinburgh), the Guardian Developer blog, The BBC College […]

Scrape it – Save it – Get it

I imagine I’m talking to a load of developers, which is odd, seeing as I’m not a developer. In fact, I decided to lose my coding virginity by riding the ScraperWiki digger! I’m a journalist interested in data as a beat, so all I need to do is scrape. All my programming will be done […]
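The three steps in the post title can be sketched with nothing but Python’s standard library: parse rows out of an HTML table (scrape), store them in SQLite (save), and query them back with SQL (get). This is a hypothetical illustration, not the ScraperWiki library. The inline HTML stands in for a downloaded page, and the table name `spending`, its columns and its values are made up.

```python
import sqlite3
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)


# Made-up stand-in for a page you would normally download.
PAGE = """
<table>
  <tr><th>council</th><th>spend</th></tr>
  <tr><td>Northtown</td><td>120</td></tr>
  <tr><td>Southtown</td><td>95</td></tr>
</table>
"""

# Scrape: pull the rows out of the HTML.
parser = TableParser()
parser.feed(PAGE)
header, *body = parser.rows

# Save: store the rows in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (council TEXT, spend INTEGER)")
conn.executemany(
    "INSERT INTO spending VALUES (?, ?)",
    [(council, int(spend)) for council, spend in body],
)

# Get: query the saved data back with SQL.
for council, spend in conn.execute(
    "SELECT council, spend FROM spending ORDER BY spend DESC"
):
    print(council, spend)
```

Keeping the three steps separate means the saved data can be queried again later without re-scraping the page.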
