Quickly get an HTML table

As I said before, Julian sometimes adds simple things to ScraperWiki, and nobody even notices them. He has obviously learnt, as he opened a ticket in BitBucket saying that we needed to announce this one. The External API can output data in several formats. jsondict is the standard JSON format, using a dictionary for each row; jsonlist is an alternative JSON format […]

Knight Foundation finance ScraperWiki for journalism

ScraperWiki is the place to work together on data, and it is particularly useful for journalism. We are therefore very pleased to announce that ScraperWiki has won the Knight News Challenge! The Knight Foundation are spending $280,000 over 2 years for us to improve ScraperWiki as a platform for journalists, and to run events to bring together journalists […]

Protect your scrapers!

You know how it is. You wrote your scraper on a whim. Because it’s a wiki, some other people found it, and helped fix bugs in it and extend it. Time passes. And now your whole business depends on it. For when that happens, we’ve just pushed an update that lets you protect scrapers. This […]

Why the Government scraped itself

We wrote last month about Alphagov, the Cabinet Office’s prototype, more usable, central Government website. It made extensive use of ScraperWiki. The question everyone asks – why was the Government scraping its own sites? Let’s take a look. In total 56 scrapers were used. You can find them tagged “alphagov” on the ScraperWiki website. There are a […]

‘Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing’

You may have noticed that the design of the ScraperWiki site has changed substantially. As part of that, we made a few improvements to the documentation. Lots of you told us we had to make our documentation easier to find, more reliable and complete. We’ve reorganised it all under one contents page, called Documentation throughout […]

All recipes 30 minutes to cook

The other week we quietly added two tutorials of a new kind to the site, snuck in behind a radical site redesign. They’re instructive recipes, which anyone with a modicum of programming knowledge should be able to follow easily. 1. Introductory tutorial: for programmers new to ScraperWiki, to get an idea of what it […]

It’s SQL. In a URL.

Squirrelled away amongst the other changes in ScraperWiki’s site redesign, we made substantial improvements to the external API explorer. We’re going to concentrate on the SQLite function here as it is the most important, but as you can see on the right there are other functions for getting out scraper metadata. Zarino and Julian have made […]
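The idea of "SQL in a URL" is that the whole query travels as a URL-encoded GET parameter. A minimal Python sketch of building such a URL follows. The endpoint, the parameter names `format` and `query`, and the table name `t` are hypothetical placeholders, not the actual ScraperWiki API; only the `jsondict` format name comes from the post above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real ScraperWiki API URL is not given in this excerpt.
BASE_URL = "https://api.example.org/sqlite"


def sql_url(query: str, fmt: str = "jsondict") -> str:
    """Return a GET URL carrying an SQL query as a URL-encoded parameter.

    The parameter names "format" and "query" are assumptions for illustration.
    """
    # urlencode escapes reserved characters such as '*' and '=' and turns spaces into '+'
    params = urlencode({"format": fmt, "query": query})
    return f"{BASE_URL}?{params}"


print(sql_url("select * from t limit 10"))
```

Encoding with `urlencode` rather than pasting the query into the string by hand means quotes, asterisks and spaces in the SQL cannot break the URL.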

ScraperWiki: A story about two boys, web scraping and a worm

“It’s like a buddy movie,” she said. Not quite the kind of story lead I’m used to. But what do you expect if you employ journalists in a tech startup? “Tell them about that computer game of his that you bought with your pocket money.” She means the one with the risqué name. I think I’d […]

See a scraper you like? Fork it!

We’ve just pushed a new feature to let you “fork” scrapers and views. That is, you can quickly and easily make a copy so you can make radical changes without interfering with the original, or to make customisations for your own use. We’ve used a red pitchfork for the icon, because amongst computer programmers forking […]

New Ruby scraping tutorials – PDFs and Mechanize

Mark Chapman has made us two new Ruby tutorials. Advanced Scraping: Pages Behind Forms shows you how to get data that is buried behind search boxes and drop down query lists. It uses the Mechanize library, which is a class […]

ScraperWiki

Extract tables from PDFs and scrape the web

Archive by Author