open labs – ScraperWiki
https://blog.scraperwiki.com
Extract tables from PDFs and scrape the web

Student scraping in Liverpool: football figures and flying police
https://blog.scraperwiki.com/2010/12/student-scraping-in-liverpool-football-figures-and-flying-police/
Thu, 23 Dec 2010 12:50:27 +0000

A final Hacks & Hackers report to end 2010! Happy Christmas from everyone at ScraperWiki!

Earlier this month ScraperWiki put on its first ever student event, at Liverpool John Moores University in partnership with Open Labs, for students from both LJMU’s School of Journalism and the School of Computing & Mathematical Sciences, as well as external participants. This fabulous video comes courtesy of the Hatch. Alison Gow, digital executive editor at the Liverpool Daily Post and the Liverpool Echo, has kindly supplied us with the words (below the video).

Report: Hacks and Hackers Hack Day – student edition

By Alison Gow

At the annual conference of the Society of Editors, held in Glasgow in November, there was some debate about journalist training and whether journalism students currently learning their craft on college courses were a) of sufficient quality and b) likely to find work.

Plenty of opinions were presented as facts and there seemed to be no recognition that today’s students might not actually want to work for mainstream media once they graduated – with their varied (and relevant) skill sets they may have very different (and far more entrepreneurial) career plans in mind.

Anyway, that was last month. Scroll forward to December 8 and a rather more optimistic picture of the future emerges. I got to spend the day with a group of Liverpool John Moores University student journalists, programmers and lecturers, local innovators and programming experts, and it seemed to me that the students were going to do just fine in whatever field they eventually chose.

This was Hacks Meet Hackers (Students) – the first event that ScraperWiki (Liverpool’s own scraping and data-mining phenomenon that has done so much to facilitate collaborative learning projects between journalists and coders) had held for students. I was one of four Trinity Mirror journalists lucky enough to be asked along too.

Brought into being through assistance from the excellent LJMU Open Labs team, backed by LJMU journalism lecturer Steve Harrison, #hhhlivS, as it was hashtagged, was a real eye-opener. It wasn’t the largest group to attend a ScraperWiki hackday, I suspect, but I’m willing to bet it was one of the most productive; relevant, viable projects were crafted over the course of the day and I’d be surprised if they didn’t find their way onto the LJMU Journalism news website in the near future.

The projects brought to the presentation room at the end of the day were:

  • The Class Divide: Investigating the educational background of Britain’s MPs
  • Are Police Helicopters Effective in Merseyside?
  • Football League Attendances 1980-2010
  • Sick of School: The link between ill health and unpopular schools

The prize for Idea With The Most Potential went to the Police Helicopters project. This group had used a sample page from a Merseyside Police helicopter movements report, which showed time of flight, geography, outcome and duration. They also determined that of the 33% of crimes that were solved, 0.03% involved the helicopter. Using the data scraped for helicopter flights, and comparing it to crimes and policing costs data, the group extrapolated that it cost £1,675 per hour to fly the helicopter (amounting to more than £100,000 a month) and, by comparing this to average officer salaries, projected that the money could fund the recruitment of 30 extra police officers. The team also suggested potential spin-off ideas around the data.
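
The helicopter costing is simple arithmetic and can be sketched in a few lines. In the sketch below only the £1,675 hourly figure comes from the team's presentation; the monthly flight hours and the officer salary are hypothetical placeholders, since this report does not give the values the team actually used.

```python
HOURLY_COST_GBP = 1675  # per-hour flying cost quoted by the team


def monthly_cost(flight_hours):
    """Cost in pounds of flying the helicopter for the given number of hours."""
    return HOURLY_COST_GBP * flight_hours


def officers_funded(annual_budget, annual_salary):
    """Whole number of officer salaries that an annual budget would cover."""
    return annual_budget // annual_salary


# Hypothetical inputs: 60 flight hours a month, £40,000 per officer per year.
cost = monthly_cost(60)
print(cost)                                # 100500
print(officers_funded(cost * 12, 40_000))  # 30
```

With those assumed inputs the numbers land close to the ones the team presented, but different flight hours or salaries would shift the result.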

The Best Use of Data went to the Football League Figures team, an all-male bunch of journos and student journos aided by hacker Paul Freeman, who scraped data on every Football League club and brought it together into a database that could be used to show attendance trends. These included the dramatic drop in Liverpool FC attendances during the Thatcher years and the rises that coincided with exciting new signings, plunging attendances for Manchester City and subsequent spikes during takeovers, and the effects of promotion and relegation on Premier League teams. The team suggested such data could be used for any number of stories, and would prove compelling information for statistics-hungry fans.

The Most Topical project went to the Class Divide group – LJMU students who worked with ScraperWiki’s Julian Todd to scrape data from the Telegraph’s politics web section and investigate the educational backgrounds of MPs. The group set out to investigate whether parliament consisted mainly of privately educated members. The group said the data led them to discover that most Lib Dem MPs were state educated, and that there was no slant in the figures between state and privately educated MPs, contrary to what might have been expected. They added that the data they had uncovered would prove particularly interesting once the MPs’ vote was held on university tuition fees.

The Best Presentation and the Overall Winner of the hackday went to Sick of Schools by Scraping The Barrel – a team of TM journos and students, hacker Brett and student nurse Claire Sutton – who used Office for National Statistics, Census, council information, and scraped data from school prospectuses and wards to investigate illness data and low demand for school places in Sefton borough. By overlaying health data with school places demand they were able to highlight various outcomes which they believed would be valuable for a range of readers, from parents seeking school places to potential house buyers.

Paul Freeman, described in one tweet as “the Johan Cruyff of football data scraping”, was presented with a ScraperWiki mug as the Hacker of the Day, for his sterling work on the Football League data.

Judges Andy Goodwin, of Open Labs, and Chris Frost, head of the Journalism department, praised everyone for their efforts, and Aine McGuire, of ScraperWiki, highlighted the great quality of the ideas and the subsequent projects. It was a long day but it passed incredibly quickly – I was really impressed not only by the ideas that came out but by the collaborative efforts between the students on their projects.

From my experience of the first Hacks Meet Hackers Day (held, again with support from Open Labs, in Liverpool last summer) there was quite a competitive atmosphere not just between the teams but even within teams as members – usually the journalists – pitched their ideas as the ones to run with. Yesterday was markedly less so, with each group working first to determine whether the data supported their ideas, and adapting those projects depending on what the information produced, rather than having a complete end in sight before they started. Maybe that’s why the projects that emerged were so good.

The Liverpool digital community is full of extraordinary people doing important, innovative work (and who don’t always get the credit they deserve). I first bumped into Julian and Aidan as they prepared to give a talk at a Liver and Mash libraries event earlier this year – I’d never heard of ScraperWiki and I was bowled over by the possibilities they talked about (once I got my brain around how it worked). Since then the team has done so much to promote the cause of open data, data journalism, the opportunities it can create, and the worth and value it can have for audiences; ScraperWiki hackdays are attended by journalists from all media across the UK, eager to learn more about data-scraping and collaborative projects with hackers.

With the Hacks Meet Hackers Students day, these ideas are being brought into the classroom, and the outcome can only benefit the colleges, students and journalism in the future. It was a great day, and the prospects for the future are exciting.

Watch this space for more ScraperWiki events in 2011!

Video: Liverpool Hacks and Hackers Hack Day
https://blog.scraperwiki.com/2010/08/video-liverpool-hacks-and-hackers-hack-day/
Wed, 11 Aug 2010 13:07:27 +0000

Liverpool John Moores University Open Labs has just released a video of our Hacks Meet Hackers event that took place in Liverpool last month.

The video gives a flavour of what happened when journalists, bloggers, software developers and artists came together to work on interesting and novel ways of exploring and using public data. You can read a roundup of the day at this link.

Video produced by The Hatch on behalf of Open Labs:

Hacks and Hackers Hack Day Liverpool: Policemen, judges and libraries
https://blog.scraperwiki.com/2010/07/hacks-and-hackers-hack-day-liverpool-policemen-judges-and-libraries/
Fri, 23 Jul 2010 14:15:10 +0000

Last Friday we hosted the second of ScraperWiki’s Hacks and Hackers Hack Days – in Liverpool, sponsored by Liverpool John Moores University Open Labs and the Liverpool Post & Echo. It marked the start of the ScraperWiki UK tour, with plans for events in Leeds, Manchester, Glasgow, Dublin, Belfast, London and Cardiff.*

We had a fantastic turnout, with a mix of programmers and journalists from a variety of backgrounds. We stole a good number from the Liverpool Post & Echo newsroom, who came armed with brilliant ideas for local data mashing.

Teams – both large and small – formed quickly, according to specialism and interests. Then, it was down to the hacking…

We had crime…

Alison Gow, Frank Swain, Sam Sutton, Luke Traynor and Maria Breslin worked on the Life and Alleged Crimes of Pancake Taylor. This visualisation project took the story of one local man’s brush with the law. Using maps and timelines, the eventual result was a web page dedicated to this notorious Liverpool gangster’s (alleged) activities.

Crime prevention…

Julian Todd, Jo Kelly and Joni Alexander took data from the Merseyside Police website in order to show when a policeman or woman is added to, or removed from, the listing of officers covering an area. This project could be rolled out in any local area, using similar data. Read more on Ed’s blog here.
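
The core of such a project is a comparison between two scraped snapshots of the same listing. The sketch below is not the team's code; it assumes each snapshot has already been scraped into a plain list of names, and the names shown are invented placeholders.

```python
def diff_listing(previous, current):
    """Return (added, removed) officer names between two snapshots of one area."""
    before, after = set(previous), set(current)
    return sorted(after - before), sorted(before - after)


# Hypothetical snapshots of one neighbourhood's officer listing.
last_week = ["PC A. Example", "PC B. Sample", "PCSO C. Placeholder"]
this_week = ["PC A. Example", "PCSO C. Placeholder", "PC D. Newcomer"]

added, removed = diff_listing(last_week, this_week)
print("added:", added)      # added: ['PC D. Newcomer']
print("removed:", removed)  # removed: ['PC B. Sample']
```

Using sets keeps the comparison independent of the order in which names appear on the page, which matters if the site reorders its listing between scrapes.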

Court case alerts…

Adrian McEwen, Donovan Hide, John O’Shea and Andy Freeney worked on ‘The Gavel’ featuring Judge Duino (Do-eee-no), with the aim of making legal process data tangible.

It took as a starting point the messy information put out by Her Majesty’s Courts Service and attempted to scrape it, making clean, clear information available in real time. They ended up with something which “pretty much” worked, and since then Donovan has developed it at http://causelist.org/.

They started to think about new and interesting ways that this data might be interpreted publicly and built an electronically controlled ‘gavel’ which could be triggered in response to different aspects of the data.

John O’Shea said: “I think that this project might be thought of as a very early prototype for a truly public and transparent interface with ‘law’.”

Video of the judge in action at this link [Photos: John O’Shea on Flickr]

Local data mapping…

David Bartlett, Mike Nolan, Neil Morrin, Ben Turner, Dan Kay, Martin Dunschen, Tori Hywel-Davies, Paul Freeman, Dan Owen and Kevin Matthews scraped local data sets to do with health, education and transport for a series of Merseyside maps. The project was to create a map packed with local information, e.g. schools, GP surgeries and train stations. They managed to scrape information from Liverpool PCT for GPs, the National Rail website for stations, and the department of education’s site for schools.

They found that using Google Earth was the only way to get it all on one map. For the project to really work and become useful with more information added, a new map interface would be needed to allow users to select what information they wanted displayed, says team member David Bartlett. The team’s presentation can be viewed here.
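
One way to let users choose which information is displayed, at least within Google Earth, is to group the places into folders that can be switched on and off. The sketch below assumes the output format is KML, which Google Earth reads; this post does not say what format the team used, and the place name and coordinates are illustrative placeholders.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(layers):
    """Build a KML document with one Folder per category of place."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for category, places in layers.items():
        folder = ET.SubElement(doc, f"{{{KML_NS}}}Folder")
        ET.SubElement(folder, f"{{{KML_NS}}}name").text = category
        for name, lat, lon in places:
            mark = ET.SubElement(folder, f"{{{KML_NS}}}Placemark")
            ET.SubElement(mark, f"{{{KML_NS}}}name").text = name
            point = ET.SubElement(mark, f"{{{KML_NS}}}Point")
            # KML expects longitude first, then latitude.
            ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical layer with one placeholder entry (name, latitude, longitude).
print(build_kml({"Stations": [("Example Station", 53.41, -2.98)]}))
```

Each folder shows up as a tickable entry in the Google Earth sidebar, so schools, GP surgeries and stations could be shown or hidden independently.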

Business…

The ‘Business Light’ by Mark Thomas, Francis Irving, Aidan McGuire, Ben Schofield, Alistair Houghton, Laurence Rowe and Tom Mortimer-Jones was a dashboard for watching business activity in Merseyside – allowing users to make informed business decisions through a traffic light ranking system. They prototyped it, checked what data they could get (employment levels, insolvencies, contracts etc.), and worked out what the website would do. It also involved visualisations and screen scraping.
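
A traffic light ranking of this kind boils down to mapping each indicator onto three bands. The sketch below is an illustration only, not the team's design: the function name, the use of percentage change and the 2% tolerance are all assumptions made for the example.

```python
def traffic_light(change_pct, good_when_rising=True, tolerance=2.0):
    """Map a percentage change in an indicator to 'green', 'amber' or 'red'."""
    # Flip the sign for indicators where a rise is bad news (e.g. insolvencies).
    signed = change_pct if good_when_rising else -change_pct
    if signed > tolerance:
        return "green"
    if signed < -tolerance:
        return "red"
    return "amber"


# Hypothetical month-on-month changes, in percent.
print(traffic_light(3.5))                          # green (employment up)
print(traffic_light(6.0, good_when_rising=False))  # red (insolvencies up)
print(traffic_light(-1.0))                         # amber (within tolerance)
```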

Libraries…

In ‘Library Data: What’s the Story?’ (originally: ‘why aren’t libraries more like Amazon?’) Ben Webb, Anna Powell-Smith and Mandy Phillips followed up a story on closed data in libraries. UK libraries generally have proprietary catalogue systems without public APIs. As a result, libraries have to pay for access to their own data, and users can’t share records easily. They found some sample open RDF data from one library provider, and built a prototype for an open UK-wide catalogue search. Find the presentation at this link.

Sport…

Jamie Bowman, Francine Higham, John McKerrell, Neil Macdonald and Francis Fish tackled the Other World Cup.

This was the World Cup’s alternative story. A visualisation showed stats that the media weren’t focusing on, for example the number of people displaced and the chance of England winning.

Meanwhile, Adrian McEwen’s lovely Bubblino machine released bubbles every time the hashtag #hhhliv was used on Twitter.

The winners of the day, as judged by Jane Clare, executive editor of Trinity Mirror’s Merseyside Weeklies, lawyer Steve Kuncewicz, and Lindsay Sharples, director of LJMU Open Labs:

  • First: The Business Light
  • Second: Why aren’t libraries more like Amazon?

We’d like to say a big thank you to our sponsors for hosting, feeding and rewarding our hard working participants; and congratulations to all involved in the day. Thank you to all the hacks and hackers who supplied information for this blog post.

What they said…

“I’ve just had one of the best working days you could wish for…” Alison Gow, executive editor, digital, Liverpool Post & Echo.

“I’m still fascinated by #scraperwiki and #hhhliv. I should investigate more,” @defnetmedia on Twitter.

“Great day at #hhhliv trying to visually represent costs of #Worldcup. Trying to to take this further as lots more info emerges in future months,” @fransa on Twitter.

“Good day #hhhliv. Learned a lot from some very smart people,”  @ed_walker86 on Twitter.

“What impressed me most about the event was the total commitment of all of those present to be involved in the process and deliver a fresh idea,” John O’Shea, artist.

*Locations may be added or removed, depending on interest. If you would like to talk to us about getting involved in these events, as a partner or sponsor, please contact judith [at] scraperwiki.com.
