A Bonny Wee Hack Day at #hhhglas

For our first venture to Scotland, where better to be than BBC Scotland! We had 8 teams of hacks and hackers digging around the Scottish data beat. For this very special occasion the ScraperWiki digger has donned tartan! With this special digger, fire incidents, planning applications, public-owned property and gifts councillors received have been mined. […]

OpenCorporates partners with ScraperWiki & offers bounties for open data scrapers

This is a guest post by Chris Taggart, co-founder of OpenCorporates. When we started OpenCorporates it was to solve a real need that we and a number of other people in the open data community had: whether it’s Government spending, subsidy info or court cases, we needed a database of corporate entities to match against, and […]

ScraperWiki-ing Down Under

Streuth! You never know what’s been drilling around on ScraperWiki. If you’ve been too busy hacking away on your own projects you probably haven’t noticed a major undertaking right here on our wiki. Open Australia have made their planning alerts scrapers on the site and we’d like to take this moment to say: “G’day”. PlanningAlerts.org.au […]

Cardiff Hacks and Hackers Hacks Day

What’s occurin’? Loads in fact, at our first Welsh Hacks and Hackers Hack Day! From schools from space to catering colleges with a Food Safety Standard of 2, we had an amazing day. Check out the video by Gavin Owen. We got five teams: Co-Ordnance – This project aimed to be a local business tracker. […]

See a scraper you like? Fork it!

We’ve just pushed a new feature to let you “fork” scrapers and views. That is, you can quickly and easily make a copy so you can make radical changes without interfering with the original, or to make customisations for your own use. We’ve used a red pitchfork for the icon, because amongst computer programmers forking […]

600 Lines of Code, 748 Revisions = A Load of Bubbles

When Channel 4’s Dispatches came across 1,100 pages of PDFs, known as the National Asset Register, they knew they had a problem on their hands. All that data, caged in a pixelated prison. So ScraperWiki let loose ‘The Julian’. What ‘The Stig’ is to Top Gear, ‘The Julian’ is to ScraperWiki. That and our CTO. […]

New Ruby scraping tutorials – PDFs and Mechanize

Got a PDF you want to get data from? Try our easy web interface over at pdftables.com! Mark Chapman has made us two new Ruby tutorials. Advanced Scraping: Pages Behind Forms shows you how to get data that is buried behind search boxes and drop down query lists. It uses the Mechanize library, which is a class […]

Sometimes you only need HTML / Javascript

Sometimes Julian adds simple things to ScraperWiki, and hardly finds it worth telling anybody about them. For a while now, you’ve been able to create views, to get your data out of ScraperWiki just how you want, in Python, Ruby and PHP. Julian has added a new option for HTML. This is useful because you can […]

Read all about it read all about it: “ScraperWiki gets on the Guardian front page…”

A data driven story by investigative journalist James Ball on lobbyist influence in the UK Parliament has made it on to the front page of the Guardian. What is exciting for us is that James Ball’s story is helped and supported by a ScraperWiki script that took data from registers across parliament that is located […]

Libraries that are ready and waiting

One of the nice things about ScraperWiki is that it gives you access to all sorts of useful data manipulation libraries, all from your web browser. Everything from PDF extractors, to statistical analysis. We’ve now documented what is available – in Python, in Ruby and in PHP. Here’s a screenshot of a few for a […]

ScraperWiki

Extract tables from PDFs and scrape the web
