hacks and hackers hack day – ScraperWiki
https://blog.scraperwiki.com
Extract tables from PDFs and scrape the web

A Bonny Wee Hack Day at #hhhglas
https://blog.scraperwiki.com/2011/03/a-bonny-wee-hack-day-at-hhhglas/
Mon, 28 Mar 2011

For our first venture to Scotland, where better to be than BBC Scotland! We had 8 teams of hacks and hackers digging around the Scottish data beat. For this very special occasion the ScraperWiki digger donned tartan! With this special digger, the teams mined fire incidents, planning applications, publicly owned property and the gifts councillors received. Here’s a word from our Francis Irving:


Now check out the projects:

Fire Bugs – This project scrapes the data from the Central Scotland Fire Service’s Recorded Incidents log, creating an alert when new incidents are logged. It also retrieves historic data.

The team consisted of 1 hack (Chris Sleight, from BBC Scotland) and 2 hackers (Ben Lyons and Paul Miller, from IRISS).

Central Scotland Fire Service puts a lot of data on its website but, as usual, not in a very useful form. Only 60 incidents are listed on the site, but if you dig down there are over 15,000 buried records. In one day’s work, Fire Bugs scraped the records and decided to look at malicious false alarms. Luckily for them, the language and structure of the records were consistent. They found that 3.5% of all calls were malicious false alarms. They even made a tree map on ScraperWiki using Protovis. Fire Bugs have clearly opened up a huge amount of potential with this data.
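Because the incident records used consistent wording, counting malicious false alarms can come down to a plain string match. The sketch below illustrates that idea only. It is not the Fire Bugs scraper. It assumes each scraped record is a dict with a hypothetical `description` field, and that the relevant incidents contain the phrase "malicious false alarm". The sample records are invented.

```python
from typing import Iterable, Mapping

# Hypothetical phrase: the real incident log's wording may differ.
MALICIOUS_PHRASE = "malicious false alarm"


def malicious_share(records: Iterable[Mapping[str, str]]) -> float:
    """Return the percentage of records whose description mentions a malicious false alarm."""
    total = 0
    malicious = 0
    for record in records:
        total += 1
        # Consistent wording in the source means a case-insensitive substring test is enough.
        if MALICIOUS_PHRASE in record.get("description", "").lower():
            malicious += 1
    return 100.0 * malicious / total if total else 0.0


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        {"description": "Malicious False Alarm - call from public phone box"},
        {"description": "Chimney fire"},
        {"description": "Road traffic collision"},
        {"description": "Automatic fire alarm - no fire"},
    ]
    print(f"{malicious_share(sample):.1f}% malicious false alarms")
```

If the wording had varied from record to record, a fixed phrase like this would undercount, and a keyword list or manual coding would be needed instead.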

Edinburgh Planning App Map – This is Edinburgh’s first automated map of local planning applications! This is a popular theme for our hack days and on ScraperWiki in general. Open Australia are using ScraperWiki for their planning alerts.

This team consists of 1 hack (Michael MacLeod, beatblogger for Guardian Edinburgh on the right) and one hacker (Robert McWilliam, from Blueflow on the left).

As Michael MacLeod pointed out, people don’t know how to use the local council website. You can’t just type in your postcode to find applications near you. There’s a map online but it’s truly awful! The team scraped the site and made a map that updates every day, rather than just every week as the council site does. Michael used this new tool to take a closer look at his beat and found a planning application for urban paintball. He duly noted that the venue’s Facebook page was trying to be secretive about the location! Using the map, he found it was going to be right behind a block of flats. He will be talking to residents!

Hide by the Clyde – This project creates a map to allow the user to compare exam results in different areas and correlates this against measures of social deprivation.

The team consists of 1 hack (Bruce Munro from BBC Scotland) and 3 hackers (Nicola Osborne from Edina, Sean Carroll from BBC Scotland and Bob Kerr from OpenStreetMap).

They looked at data from Learning and Teaching Scotland and scraped its “search for schools” form. Here is the map. From this freed data they were able to make a heat map of free school meals registration in Scotland and compare education statistics between, for example, Glasgow and Argyll & Bute. A bigger follow-up project would be to put all this information on one site in a user-friendly format.

Public Buildings for Sale – This is a tool to show all publicly owned property that is for sale or rent. It will be a Scottish sister to ScraperWiki’s brownfield sites map. The project aims to answer the question: how much public land is being sold without our knowledge?

This team consisted of 1 hack (Peter Mackay from BBC Scotland on the right) and 1 hacker (Martyn Inglis from The Guardian on the left).

The data they wanted is on this horrible website listing property sales and lettings from Scotland’s public sector. Its nested HTML tables are very difficult to scrape. They managed to scrape this far and plan to remodel the data to make it searchable by postcode. From this, they want to glean more information about councils’ buying and selling strategies.
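Nested tables are a common scraping headache. One general approach is to keep only the innermost tables, which usually hold the actual records, and ignore the layout tables wrapped around them. The sketch below uses Python's standard-library `html.parser` and is an assumption, not the team's scraper. The class name and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class InnermostTableParser(HTMLParser):
    """Collect cell text from tables that contain no nested tables.

    Outer tables that only wrap other tables are treated as page layout and
    skipped; rows from the innermost ("leaf") tables are kept in self.rows.
    """

    def __init__(self) -> None:
        super().__init__()
        self.stack = []  # one state dict per currently open <table>
        self.rows = []   # rows collected from leaf tables, in document order

    @staticmethod
    def _close_cell(table: dict) -> None:
        # Normalise whitespace and add the finished cell to the open row.
        if table["cell"] is not None and table["row"] is not None:
            table["row"].append(" ".join("".join(table["cell"]).split()))
        table["cell"] = None

    def _close_row(self, table: dict) -> None:
        self._close_cell(table)
        if table["row"] is not None:
            table["rows"].append(table["row"])
        table["row"] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.stack:
                # The enclosing table holds another table, so treat it as layout.
                self.stack[-1]["has_child"] = True
            self.stack.append({"rows": [], "row": None, "cell": None, "has_child": False})
            return
        if not self.stack:
            return
        table = self.stack[-1]
        if tag == "tr":
            self._close_row(table)  # tolerate a missing </tr>
            table["row"] = []
        elif tag in ("td", "th"):
            self._close_cell(table)  # tolerate a missing </td>
            if table["row"] is None:
                table["row"] = []  # cell without an explicit <tr>
            table["cell"] = []

    def handle_data(self, data):
        # Text goes only to the innermost open table's current cell.
        if self.stack and self.stack[-1]["cell"] is not None:
            self.stack[-1]["cell"].append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        table = self.stack[-1]
        if tag in ("td", "th"):
            self._close_cell(table)
        elif tag == "tr":
            self._close_row(table)
        elif tag == "table":
            self._close_row(table)
            self.stack.pop()
            if not table["has_child"]:
                self.rows.extend(table["rows"])


if __name__ == "__main__":
    # Invented markup: a layout table wrapping the table that holds the data.
    html = (
        "<table><tr><td><table>"
        "<tr><th>Property</th><th>Status</th></tr>"
        "<tr><td>Old depot, Main St</td><td>For sale</td></tr>"
        "</table></td></tr></table>"
    )
    parser = InnermostTableParser()
    parser.feed(html)
    parser.close()
    print(parser.rows)
```

The "leaf tables only" rule is a heuristic. It works when data tables never contain other tables. A page that nests data inside data would need per-site rules instead.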

Crash Test Dummies – This project takes three separate approaches to looking at Scottish road accident data.

  1. What accidents get reported?
  2. Are you safe on the roads?
  3. What effect do road safety measures have on your journey?

The team consisted of 2 hacks (David Eyre and Brendon Crowther from BBC Scotland) and 2 hackers (Ali Craigmile and Mo McRoberts from BBC Scotland).

In just one day they managed to build a prototype for a “Mind how you go!” BBC Scotland site. They used road traffic accident reports from 2005–2009 to create a form that showed how likely you were to survive your journey, depending on your age, sex and destination! They built a spreadsheet from even more data so that the site had the potential to go beyond the records. They also scraped Google searches for reported road traffic accidents and mapped BBC Scotland’s reports from 2010.

BME Scotland – This project aims to find out the effects of the recession on education! Is education a route to the ghetto? It aims to compare BME educational achievement with unemployment statistics to find out which areas of Scotland are economic no-gos.

The team consisted of 1 hack (Fin Wycherley) and 1 hacker (Paul McNally).

The lesson learnt here was that sometimes there’s not enough data to go around. What they did find is that the African population in Scotland is doing exceedingly well in education. However, it also has a relatively high level of unemployment. The result was a call for better data collection, as none of the information fitted together in a way that would help answer the question: why?

Edinburgh Council – This project searches Edinburgh councillors’ gifts and expenses!

The team consists of 2 hacks (Paola Di Maoi and Anand Ramkissoon) and 1 hacker (James Baster).

As it turned out, the Council website is easy to scrape. The structure of the site is consistent and clean. ScraperWiki likes this! And so here is the scraper. As James pointed out, the data needs to be double-checked for misspelled entries, etc. But the preliminary data shows that Lothian Buses gave the most gifts and Phil Wheeler received the most gifts.

Magners Cider – This project aims to scrape the Magners League Rugby scores. The team consisted of a hack/hacker pair of Paul McNally (again, we love eager hackers!) and Tony Sinclair of BBC Scotland (who had to keep up the day job and so was not around for a picture). Apparently, a graphics operator had to input the information from the site by hand into the graphics system to produce the league tables you see on screen. Seeing as the graphics software can access spreadsheets, Tony thought “Why not automate the process by scraping?”. And this is what they did. So the scores have gone from ScraperWiki to TV!

And the winners are… (drum roll please)

  • 1st Prize: Edinburgh Planning Map App
  • 2nd Prize: Fire Bugs
  • 3rd Prize: Magners Cider
  • Best Scraper: Fire Bugs

A big shout out

Our judges, Jon Jacob from BBC College of Journalism, Allan Donald from STV and Huw Owen, Editor of Good Morning Scotland.

Our sponsors BBC Scotland, BBC College of Journalism and The Guardian Open Platform.

Edinburgh planning applications, fire incidents and Rugby scores – you’ve been ScraperWikied!

The winners and the judges

Cardiff Hacks and Hackers Hacks Day
https://blog.scraperwiki.com/2011/03/cardiff-hacks-and-hackers-hacks-day/
Tue, 15 Mar 2011

What’s occurin’? Loads, in fact, at our first Welsh Hacks and Hackers Hack Day! From schools from space to catering colleges with a Food Safety Standard of 2, we had an amazing day. Check out the video by Gavin Owen:

We had five teams:

Co-Ordnance – This project aimed to be a local business tracker. They wanted to turn London Stock Exchange codes into meaningful data but, alas, the stock exchange prevents scraping. So they decided to use company data from registers like the LSE and Companies House to extract business information and structure it for local business activists and for small businesses that need to know the best place to set up.

The team consisted of 3 hacks (Steve Fossey, Eva Tallaksen from Intrafish and Gareth Morlais from BBC Cymru) and 3 hackers (Carey Hiles, Craig Marvelley and Warren Seymour, all from Box UK).

It’s a good thing they had some serious hackers as they had a serious hack on their hands. Here’s a scraper they did for the London Stock Exchange ticker. And here’s what they were able to get done in just one day!

This was just a locally hosted site, but the map did allow users to search for types of businesses by region and see whether they had been dissolved and, if so, when.

Open Senedd – This project aimed to be a Welsh version of TheyWorkForYou: a way for people in Wales to find out how assembly members voted in plenary meetings. It tackles the worthy task of making assembly members’ voting records accessible and transparent.

The team consisted of 2 hacks (Daniel Grosvenor from CLIConline and Hannah Waldram from Guardian Cardiff) and 2 hackers (Nathan Collins and Matt Dove).

They spent the day hacking away and drew up an outline for www.opensenedd.org.uk. We look forward to the birth of their project, which may or may not look something like this (left). Hopefully minus the Coke can and laptop!

They took on a lot for a one day project but devolution will not stop the ScraperWiki digger!

There’s no such thing as a free school meal – This project aimed to extract information on Welsh schools from inspection reports. This involved getting unstructured Estyn reports on all 2,698 Welsh schools into ScraperWiki.

The team consisted of 1 hack (Izzy Kaminski) and 2 astronomer hackers (Edward Gomez and Stuart Lowe from LCOGT).

This small team managed to scrape Welsh schools data (which the next team stole!) and had time to make a heat map of schools in Wales. This was done using some sort of astronomical tool. Their longer term aim is to overlay the map with information on child poverty and school meals. A worthy venture and we wish them well.

Ysgoloscope – This project aimed to be a Welsh version of Schooloscope. Its aim was to make information about schools accessible and interactive for parents to explore. It used Edward’s scraper of horrible PDF Estyn inspection reports. These use a different rating methodology from Ofsted’s (devolution is not good for data journalism!).

The team consisted of 6 hacks (Joni Ayn Alexander, Chris Bolton, Bethan James from the Stroke Association, Paul Byers, Geraldine Nichols and Rachel Howells), 1 hacker (Ben Campbell from Media Standards Trust) and 1 troublemaker (Esko Reinikainen).

Maybe it was a case of too many hacks or just trying to narrow down what area of local government to tackle, but the result was a plan. Here is their presentation and I’m sure parents all over Wales are hoping to see Ysgoloscope up and running.

Blasus – This project aimed to map food hygiene ratings across Wales. They wanted to correlate this information with deprivation indices. They noticed that the Food Standards Agency site does not work for this purpose, which is arguably the most useful one.

The team consisted of 4 hacks (Joe Goodden from the BBC, Alyson Fielding, Charlie Duff from HRZone and Sophie Paterson from the ATRiuM) and 1 hacker (Dafydd Vaughan from CF Labs).

As you can see below they created something which they presented on the day. They used this scraper and made an interactive map with food hygiene ratings, symbols and local information. Amazing for just a day’s work!

And the winners are… (drum roll please)

  • 1st Prize: Blasus
  • 2nd Prize: Open Senedd
  • 3rd Prize: Co-Ordnance
  • Best Scoop: Blasus for finding a catering college in Merthyr with a Food Hygiene Standard rating of just 2
  • Best Scraper: Co-Ordnance

A big shout out

To our judges Glyn Mottershead from Cardiff School of Journalism, Media and Cultural Studies, Gwawr Hughes from Skillset and Sean Clarke from The Guardian.

And our sponsors Skillset, Guardian Platform, Guardian Local and Cardiff School of Journalism, Media and Cultural Studies.

Schools, businesses and eating places of Wales – you’ve been ScraperWikied!

Blasus winning first prize and Best Scoop award (prizes will be delivered, sealed with a handshake from our sponsor).

Video: Hacks and Hackers Hack Day Manchester
https://blog.scraperwiki.com/2010/10/758213944/
Sun, 17 Oct 2010

Hacks and Hackers Hack Day Manchester at Vision+Media in Salford, on 15th October 2010. Filmed (on a Flip) and edited by Joseph Stashko, who has kindly allowed us to re-publish the video here. A write-up of the day can be found at this link.

Hacks and Hackers Hack Day Liverpool: Policemen, judges and libraries
https://blog.scraperwiki.com/2010/07/hacks-and-hackers-hack-day-liverpool-policemen-judges-and-libraries/
Fri, 23 Jul 2010

Last Friday we hosted the second of ScraperWiki’s Hacks and Hackers Hack Days – in Liverpool, sponsored by Liverpool John Moores University Open Labs and the Liverpool Post & Echo. It marked the start of the ScraperWiki UK tour, with plans for events in Leeds, Manchester, Glasgow, Dublin, Belfast, London and Cardiff.*

We had a fantastic turnout, with a mix of programmers and journalists from a variety of backgrounds. We stole a good number from the Liverpool Post & Echo newsroom, who came armed with brilliant ideas for local data mashing.

Teams – both large and small – formed quickly, according to specialism and interests. Then, it was down to the hacking…

We had crime…

Alison Gow, Frank Swain, Sam Sutton, Luke Traynor and Maria Breslin worked on the Life and Alleged Crimes of Pancake Taylor. This visualisation project took the story of one local man’s brush with the law. Using maps and timelines, the eventual result was a web page dedicated to this notorious Liverpool gangster’s (alleged) activities.

Crime prevention…

Julian Todd, Jo Kelly and Joni Alexander took data from the Merseyside Police website in order to show when a police officer is added to or removed from the list of officers covering an area. This project could be rolled out in any local area that publishes similar data. Read more on Ed’s blog here.
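At its core, an added/removed alert like this compares two scraped snapshots of the same list. The minimal sketch below is not the team's code. It assumes the scraper keeps each day's officer names for an area as a plain list and that names are unique within an area. The function names, alert format and officer names are hypothetical.

```python
def diff_snapshots(previous, current):
    """Return (added, removed) names between two scraped lists, each sorted."""
    before, after = set(previous), set(current)
    return sorted(after - before), sorted(before - after)


def alerts(area, previous, current):
    """Build human-readable alert lines for one area (hypothetical format)."""
    added, removed = diff_snapshots(previous, current)
    lines = [f"{area}: {name} added" for name in added]
    lines += [f"{area}: {name} removed" for name in removed]
    return lines


if __name__ == "__main__":
    # Invented names standing in for two days' scrapes of one area's officer list.
    yesterday = ["PC Smith", "PC Jones", "PCSO Brown"]
    today = ["PC Smith", "PCSO Brown", "PC Patel"]
    for line in alerts("Example neighbourhood", yesterday, today):
        print(line)
```

Set differences ignore ordering and duplicates. That suits a roster but would hide a name that appears twice on the page.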

Court case alerts…

Adrian McEwen, Donovan Hide, John O’Shea and Andy Freeney worked on ‘The Gavel’ featuring Judge Duino (Do-eee-no), with the aim of making legal process data tangible.

It took as its starting point the messy information put out by Her Majesty’s Courts Service and attempted to scrape it, making clean, clear information available in real time. They ended up with something that “pretty much” worked, and since then Donovan has developed it further at http://causelist.org/.

They started to think about new and interesting ways that this data might be interpreted publicly and built an electronically controlled ‘gavel’ which could be triggered in response to different aspects of the data.

John O’Shea said: “I think that this project might be thought of as a very early prototype for a truly public and transparent interface with ‘law’.”

Video of the judge in action at this link [Photos: John O’Shea on Flickr]

Local data mapping…

David Bartlett, Mike Nolan, Neil Morrin, Ben Turner, Dan Kay, Martin Dunschen, Tori Hywel-Davies, Paul Freeman, Dan Owen and Kevin Matthews scraped local data sets to do with health, education and transport for a series of Merseyside maps. The project was to create a map packed with local information, e.g. schools, GP surgeries and train stations. They managed to scrape information from Liverpool PCT for GPs, the National Rail website for stations, and the Department for Education’s site for schools.

They found that using Google Earth was the only way to get it all on one map. For the project to really work and become useful with more information added, a new map interface would be needed to allow users to select what information they wanted displayed, says team member David Bartlett. The team’s presentation can be viewed here.


The ‘Business Light’ by Mark Thomas, Francis Irving, Aidan McGuire, Ben Schofield, Alistair Houghton, Laurence Rowe and Tom Mortimer-Jones was a dashboard for watching business activity in Merseyside. It lets users make informed business decisions through a traffic-light ranking system. They prototyped it, checked what data they could get (employment levels, insolvencies, contracts, etc.) and worked out what the website would do. It also involved visualisations and screen scraping.


In ‘Library Data: What’s the Story?’ (originally: ‘why aren’t libraries more like Amazon?’) Ben Webb, Anna Powell-Smith and Mandy Phillips followed up a story on closed data in libraries. UK libraries generally have proprietary catalogue systems without public APIs. As a result, libraries have to pay for access to their own data, and users can’t share records easily. They found some sample open RDF data from one library provider, and built a prototype for an open UK-wide catalogue search. Find the presentation at this link.


Jamie Bowman, Francine Higham, John McKerrell, Neil Macdonald and Francis Fish tackled the Other World Cup.

This was the World Cup’s alternative story. A visualisation showed stats that the media weren’t focusing on, such as the number of people displaced and the chance of England winning.

Meanwhile, Adrian McEwen’s lovely Bubblino machine blew bubbles every time the hashtag #hhhliv was used on Twitter. [Photo: “#hhhliv the @bubblino machine releases bubbles as we tweet” on Twitpic]

The winners of the day, as judged by Jane Clare, executive editor of Trinity Mirror’s Merseyside Weeklies, lawyer Steve Kuncewicz, and Lindsay Sharples, director of LJMU Open Labs:

  • First: The Business Light
  • Second: Why aren’t libraries more like Amazon?

We’d like to say a big thank you to our sponsors for hosting, feeding and rewarding our hard working participants; and congratulations to all involved in the day. Thank you to all the hacks and hackers who supplied information for this blog post.

What they said…

“I’ve just had one of the best working days you could wish for…” Alison Gow, executive editor, digital, Liverpool Post & Echo.

“I’m still fascinated by #scraperwiki and #hhhliv. I should investigate more,” @defnetmedia on Twitter.

“Great day at #hhhliv trying to visually represent costs of #Worldcup. Trying to to take this further as lots more info emerges in future months,” @fransa on Twitter.

“Good day #hhhliv. Learned a lot from some very smart people,”  @ed_walker86 on Twitter.

“What impressed me most about the event was the total commitment of all of those present to be involved in the process and deliver a fresh idea,” John O’Shea, artist.

Blog coverage

*Locations may be added or removed, depending on interest. If you would like to talk to us about getting involved in these events, as a partner or sponsor, please contact judith [at] scraperwiki.com.
