Cardiff Hacks and Hackers Hacks Day
https://blog.scraperwiki.com/2011/03/cardiff-hacks-and-hackers-hacks-day/
Tue, 15 Mar 2011

What’s occurin’? Loads, in fact, at our first Welsh Hacks and Hackers Hack Day! From schools from space to a catering college with a Food Hygiene Rating of 2, we had an amazing day. Check out the video by Gavin Owen:

We had five teams:

Co-Ordnance – This project aimed to be a local business tracker. The team wanted to turn London Stock Exchange ticker codes into meaningful data, but alas, the stock exchange prevents scraping. So they decided to use company data from registers such as the LSE and Companies House to extract business information and structure it for small businesses that need to know the best place to set up, and for local business activists.

The team consisted of 3 hacks (Steve Fossey, Eva Tallaksen from Intrafish and Gareth Morlais from BBC Cymru) and 3 hackers (Carey Hiles, Craig Marvelley and Warren Seymour, all from Box UK).

It’s a good thing they had some serious hackers as they had a serious hack on their hands. Here’s a scraper they did for the London Stock Exchange ticker. And here’s what they were able to get done in just one day!

This was just a locally hosted site, but the map did allow users to search for types of businesses by region and see whether they’d been dissolved and, if so, by what date.
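For readers curious what a register scraper like this involves, here is a minimal sketch in Python. It is not the team’s code (which ran on ScraperWiki): the URL, the HTML selectors and the field names below are hypothetical stand-ins for the LSE and Companies House register pages they actually worked from.

```python
# A minimal, illustrative company-register scraper.
# Assumptions: the URL, the HTML structure (a table of companies) and the
# column layout are hypothetical; the real Co-Ordnance scraper ran on
# ScraperWiki against LSE / Companies House register pages.
import sqlite3

import requests
from bs4 import BeautifulSoup

REGISTER_URL = "https://example.org/company-register"  # hypothetical


def scrape_companies(url):
    """Yield one dict per company row found in the register table."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.select("table.register tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 4:
            yield {
                "company_number": cells[0],
                "name": cells[1],
                "region": cells[2],
                "dissolved_date": cells[3] or None,
            }


def save(records, db_path="companies.sqlite"):
    """Store records in a local SQLite table so they can be queried or mapped."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS companies "
        "(company_number TEXT PRIMARY KEY, name TEXT, region TEXT, dissolved_date TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO companies "
        "VALUES (:company_number, :name, :region, :dissolved_date)",
        records,
    )
    conn.commit()
    conn.close()


if __name__ == "__main__":
    save(list(scrape_companies(REGISTER_URL)))
```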

Open Senedd – This project aimed to be a Welsh version of TheyWorkForYou: a way for people in Wales to find out how Assembly Members voted in plenary meetings. It tackles the worthy task of making Assembly Members’ voting records accessible and transparent.

The team consisted of 2 hacks (Daniel Grosvenor from CLIConline and Hannah Waldram from Guardian Cardiff) and 2 hackers (Nathan Collins and Matt Dove).

They spent the day hacking away and drew up an outline for www.opensenedd.org.uk. We look forward to the birth of their project, which may or may not look something like this (left); minus the Coke can and laptop, hopefully!

They took on a lot for a one-day project, but devolution will not stop the ScraperWiki digger!

There’s no such thing as a free school meal – This project aimed to extract information on Welsh schools from inspection reports. This involved getting unstructured Estyn reports on all 2,698 Welsh schools into ScraperWiki.

The team consisted of 1 hack (Izzy Kaminski) and 2 astronomer hackers (Edward Gomez and Stuart Lowe from LCOGT).

This small team managed to scrape Welsh schools data (which the next team stole!) and still had time to make a heat map of schools in Wales, built with some sort of astronomical tool. Their longer-term aim is to overlay the map with information on child poverty and school meals. A worthy venture, and we wish them well.
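We don’t know exactly which astronomical tool Edward and Stuart used, but a density map of school locations can be sketched with an ordinary Python plotting library. The input file schools.csv and its latitude/longitude columns are assumptions standing in for the scraped Estyn data.

```python
# A rough density ("heat") map of schools from scraped coordinates.
# Assumptions: schools.csv with 'latitude' and 'longitude' columns stands in
# for the Estyn data the team actually scraped; the real team used an
# astronomy-derived visualisation tool, not matplotlib.
import csv

import matplotlib.pyplot as plt

lats, lons = [], []
with open("schools.csv", newline="") as f:
    for row in csv.DictReader(f):
        lats.append(float(row["latitude"]))
        lons.append(float(row["longitude"]))

plt.figure(figsize=(6, 7))
# hexbin aggregates points into hexagonal bins, giving a simple heat map
plt.hexbin(lons, lats, gridsize=40, cmap="YlOrRd", mincnt=1)
plt.colorbar(label="Number of schools")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Density of Welsh schools (illustrative)")
plt.savefig("schools_heatmap.png", dpi=150)
```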

Ysgoloscope – This project aimed to be a Welsh version of Schooloscope: accessible, interactive information about schools for parents to explore. It used Edward’s scraper of the horrible PDF Estyn inspection reports, which follow a different rating methodology from Ofsted’s (devolution is not good for data journalism!).

The team consisted of 6 hacks (Joni Ayn Alexander, Chris Bolton, Bethan James from the Stroke Association, Paul Byers, Geraldine Nichols and Rachel Howells), 1 hacker (Ben Campbell from Media Standards Trust) and 1 troublemaker (Esko Reinikainen).

Maybe it was a case of too many hacks, or just the difficulty of narrowing down which area of local government to tackle, but the result was a plan. Here is their presentation, and I’m sure parents all over Wales are hoping to see Ysgoloscope up and running.

Blasus – This project aimed to map food hygiene ratings across Wales and correlate them with deprivation indices. The team noticed that the Food Standards Agency site does not work, at least not for this purpose, which is the most useful one.

The team consisted of 4 hacks (Joe Goodden from the BBC, Alyson Fielding, Charlie Duff from HRZone and Sophie Paterson from the ATRiuM) and 1 hacker (Dafydd Vaughan from CF Labs).

As you can see below, they created something they could present on the day. Using this scraper, they made an interactive map with food hygiene ratings, symbols and local information. Amazing for just a day’s work!
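For anyone wanting to try something similar, here is a small sketch of an interactive hygiene-ratings map using the folium library. This is not what Blasus built on the day; the input file hygiene_ratings.csv, its columns and the colour scheme are all illustrative assumptions.

```python
# Illustrative interactive map of food hygiene ratings.
# Assumptions: hygiene_ratings.csv (name, latitude, longitude, rating) stands
# in for the team's scraped Food Standards Agency data, and folium is used
# here purely for the sketch; it is not the tool Blasus built with in 2011.
import csv

import folium

m = folium.Map(location=[52.35, -3.85], zoom_start=8)  # roughly centred on Wales

with open("hygiene_ratings.csv", newline="") as f:
    for row in csv.DictReader(f):
        rating = int(row["rating"])
        # colour-code low ratings so problem premises stand out
        colour = "red" if rating <= 2 else "orange" if rating == 3 else "green"
        folium.CircleMarker(
            location=[float(row["latitude"]), float(row["longitude"])],
            radius=6,
            color=colour,
            fill=True,
            popup=f"{row['name']}: rating {rating}",
        ).add_to(m)

m.save("hygiene_map.html")  # open in a browser for the interactive map
```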

And the winners are… (drum roll please)

  • 1st Prize: Blasus
  • 2nd Prize: Open Senedd
  • 3rd Prize: Co-Ordnance
  • Best Scoop: Blasus, for finding a catering college in Merthyr with a Food Hygiene Rating of just 2
  • Best Scraper: Co-Ordnance

A big shout out

To our judges Glyn Mottershead from Cardiff School of Journalism, Media and Cultural Studies, Gwawr Hughes from Skillset and Sean Clarke from The Guardian.

And our sponsors Skillset, Guardian Platform, Guardian Local and Cardiff School of Journalism, Media and Cultural Studies.

Schools, businesses and eating places of Wales – you’ve been ScraperWikied!

Blasus winning first prize and the Best Scoop award (prizes will be delivered, sealed with a handshake from our sponsor).


Read all about it read all about it: “ScraperWiki gets on the Guardian front page…”
https://blog.scraperwiki.com/2011/02/read-all-about-it-read-all-about-it-scraperwiki-gets-on-the-guardian-front-page/
Fri, 25 Feb 2011

A data-driven story by investigative journalist James Ball on lobbyist influence in the UK Parliament has made it onto the front page of the Guardian. What is exciting for us is that James Ball’s story is helped and supported by a ScraperWiki script that takes data from registers across Parliament, hosted on different servers, and aggregates them into one source table that can be viewed as a spreadsheet or document. This is now a living source of data that can be automatically updated: http://scraperwiki.com/scrapers/all_party_groups/
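To give a flavour of what such an aggregating script does, here is a minimal sketch: fetch several register pages hosted in different places and write every entry into one table a journalist can open as a spreadsheet. The URLs and HTML structure below are placeholders, not the real parliamentary registers; the actual scraper is the one linked above.

```python
# Minimal sketch of aggregating several register pages into one table.
# Assumptions: the URLs and HTML structure below are placeholders, not the
# actual parliamentary register pages the ScraperWiki script read; the real
# script lives at http://scraperwiki.com/scrapers/all_party_groups/.
import csv

import requests
from bs4 import BeautifulSoup

REGISTER_PAGES = [  # hypothetical register locations on different servers
    "https://example.org/register/commons",
    "https://example.org/register/lords",
]


def parse_register(url):
    """Yield one dict per entry in a register page's main table."""
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    for row in soup.select("table tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 3:
            yield {"group": cells[0], "benefit": cells[1],
                   "source": cells[2], "register": url}


def aggregate(pages, out_path="all_party_groups.csv"):
    """Write every register's rows into a single spreadsheet-friendly CSV."""
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["group", "benefit", "source", "register"])
        writer.writeheader()
        for url in pages:
            writer.writerows(parse_register(url))


if __name__ == "__main__":
    aggregate(REGISTER_PAGES)  # re-run on a schedule to keep the table live
```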

For the past year the team at ScraperWiki has been running media events around the country. Our next one is in Cardiff and fully subscribed; we also have an event at BBC Scotland in Glasgow on 25 March. Throughout the programme we have had the opportunity to meet great journalists and bloggers from the national and local press, so we always thought we would make it to the front page; we just didn’t know when, or by whom.

The story demonstrates the potential power of ScraperWiki to help journalists and researchers join the dots efficiently by working collaboratively with data specialists and software systems. Journalists can put down markers that run and update automatically, and they can monitor the data over time with the objective of holding ‘power and money’ to account. The added value of this technique is that in one step the data is represented in a uniform structure and linked to its source, thus ensuring its provenance. The software code that collects the data can be inspected by others in a peer-review process to ensure the fidelity of the data.

In addition, because of the collaborative and social nature of the platform, there is also the potential to involve others in the wider technical and data community in continuing to improve the data. Since the data is delivered by a scheduled script that runs daily, journalists and interested parties can now subscribe to the data set for future changes and amendments. So, for example, a journalist interested in any influence by a company such as Virgin can now have a specific email alert for donations or other actions by the conglomerate.
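As a rough illustration of how such an alert could work, here is a sketch of a daily script that scans the aggregated table for new rows mentioning a company and emails a journalist. The file names, keyword, addresses and local SMTP server are all assumptions for the example; ScraperWiki’s own alerting feature is not reproduced here.

```python
# Illustrative daily alert: email a journalist when new register rows mention
# a company of interest. Assumptions: all_party_groups.csv is the aggregated
# table from the earlier sketch, 'localhost' runs an SMTP server, and the
# keyword and addresses are placeholders.
import csv
import smtplib
from email.message import EmailMessage

KEYWORD = "Virgin"            # company the journalist cares about
SEEN_FILE = "seen_rows.txt"   # simple record of rows already alerted on


def load_seen():
    try:
        with open(SEEN_FILE) as f:
            return set(line.rstrip("\n") for line in f)
    except FileNotFoundError:
        return set()


def main():
    seen = load_seen()
    hits = []
    with open("all_party_groups.csv", newline="") as f:
        for row in csv.DictReader(f):
            line = " | ".join(row.values())
            if KEYWORD.lower() in line.lower() and line not in seen:
                hits.append(line)
                seen.add(line)
    if hits:
        msg = EmailMessage()
        msg["Subject"] = f"{len(hits)} new register entries mentioning {KEYWORD}"
        msg["From"] = "alerts@example.org"
        msg["To"] = "journalist@example.org"
        msg.set_content("\n".join(hits))
        with smtplib.SMTP("localhost") as server:
            server.send_message(msg)
    with open(SEEN_FILE, "w") as f:
        f.write("\n".join(sorted(seen)))


if __name__ == "__main__":
    main()  # run from cron once a day
```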

We know and understand that data in the media sector needs to be kept embargoed until the story breaks.  Next month we will be launching an opportunity for data consumers to request and subscribe to specific data feeds.

There is a tsunami of data being published, and it’s increasingly hard for investigative journalists to find the time to sift through the masses of information and make sense of it. We believe that ScraperWiki helps to solve some of the ‘hard’ data issues that people in the media face on a daily basis.

Congratulations to James on his front-page story and to the fantastic team at the Guardian who do fabulous work on open data and data-driven journalism – long may it continue!
