hackathon – ScraperWiki https://blog.scraperwiki.com Extract tables from PDFs and scrape the web Tue, 09 Aug 2016 06:10:13 +0000 en-US hourly 1 https://wordpress.org/?v=4.6 58264007

World Cup Hack Day, London 10th June – a teaser! https://blog.scraperwiki.com/2014/06/world-cup-hack-day-london-10th-june-a-teaser/ https://blog.scraperwiki.com/2014/06/world-cup-hack-day-london-10th-june-a-teaser/#comments Wed, 04 Jun 2014 07:30:24 +0000 https://blog.scraperwiki.com/?p=758221830

With the England team just arrived in Miami for their final preparations for the World Cup, Mohammed Bin Hammam is back in the news for further accusations of corruption.

This is interesting because we saw Hammam’s name on Friday as we were testing out the NewsReader technology in preparation for our Hack Day in London on Tuesday 10th June. NewsReader is an EU project which aims to improve our tools to analyse the news.

And that’s just what it does.

We must admit, somewhat shame-faced, that we are rather ignorant of the comings and goings of football. However, this ignorance illustrates the power of NewsReader nicely. We used our simplified API to the NewsReader technology to search thousands of documents relating to the World Cup. In particular, we looked for Sepp Blatter and David Beckham in the news, and for who else was likely to appear in events with them. The result of this search can be seen in the chart below, which shows that Mohammed Bin Hammam appears very frequently in events with Sepp Blatter.

Actors
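To give a feel for the kind of search described above, here is a minimal sketch in Python of counting who shares events with a given person. It does not use the NewsReader API: the `events` list is hypothetical sample data, and the `co_actors` helper is a name made up for this illustration.

```python
from collections import Counter

# Hypothetical sample data: each event is the set of people mentioned in it.
# These are invented for illustration and are not real NewsReader results.
events = [
    {"Sepp Blatter", "Mohammed Bin Hammam"},
    {"Sepp Blatter", "Mohammed Bin Hammam", "Person A"},
    {"David Beckham", "Person B"},
    {"Sepp Blatter", "David Beckham"},
]


def co_actors(events, target):
    """Count how often each other person shares an event with `target`."""
    counts = Counter()
    for actors in events:
        if target in actors:
            # Everyone else in the event co-occurs with the target once.
            counts.update(actors - {target})
    return counts


blatter_counts = co_actors(events, "Sepp Blatter")
print(blatter_counts.most_common(3))
```

Ranking the counts with `most_common` is what surfaces the names that appear most often alongside the person you searched for.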

For us soccer ignoramuses, the simple API also provides a brief biography of bin Hammam from Wikipedia. Part of the NewsReader technology is to link mentions of names to known individuals and thus wider data about them. We can make a timeline of events involving bin Hammam, which we show below.

Timeline

It’s easy to look at the underlying articles behind these events, and discover that bin Hammam’s previous appearances in the news have related to bribery.

Finally, we used Gephi to generate a pretty, but somewhat cryptic, visualisation.

beckham_and_blatter

The circles represent people we found in the news articles who appeared in events with either Sepp Blatter or David Beckham, who are themselves the purple dots from which many lines emanate. The purple circles represent people who have had interactions with one or other of Blatter or Beckham; the green circles represent people who have had interactions with both. The size of each circle represents the number of interactions. Bin Hammam appears as the biggest green circle.
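The colouring and sizing rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the code we used to prepare the Gephi graph: the `events` data is hypothetical, and `classify` is a name invented for this example. For simplicity the sketch leaves Blatter and Beckham themselves out of the result.

```python
from collections import defaultdict

TARGETS = ("Sepp Blatter", "David Beckham")

# Hypothetical sample data: each event is the set of people mentioned in it.
events = [
    {"Sepp Blatter", "Mohammed Bin Hammam"},
    {"Sepp Blatter", "Mohammed Bin Hammam", "Person A"},
    {"David Beckham", "Mohammed Bin Hammam"},
    {"David Beckham", "Person B"},
    {"Person C", "Person D"},  # no target involved, so it is ignored
]


def classify(events, targets=TARGETS):
    """Return a colour and size for everyone who shares an event with a target."""
    linked = defaultdict(set)        # person -> targets they appeared with
    interactions = defaultdict(int)  # person -> number of interactions
    for actors in events:
        present = [t for t in targets if t in actors]
        for person in actors:
            if person in targets:
                continue
            for target in present:
                linked[person].add(target)
                interactions[person] += 1
    return {
        person: {
            # Green means linked to both targets, purple to only one.
            "colour": "green" if len(linked[person]) == len(targets) else "purple",
            "size": interactions[person],
        }
        for person in linked
    }


nodes = classify(events)
for name, attrs in sorted(nodes.items()):
    print(name, attrs)
```

A table like `nodes` could then be exported as node attributes for a graph tool to use when colouring and scaling the circles.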

You can see an interactive version of the first two visualisations here, and the third one is here.

That’s a little demonstration of what can be done with the NewsReader technology. Just imagine what someone with a bit more footballing knowledge could do!

If you want to join the fun, we are running a Hack Day at the Westminster Hub in central London on Tuesday 10th June, where you will be able to try out the NewsReader technology for yourself.

It’s free and you can sign up here, on EventBrite: EventBrite Sign Up

Code for America getting all ScraperWikied over Government Data https://blog.scraperwiki.com/2011/05/code-for-america-getting-all-scraperwikied-over-government-data/ https://blog.scraperwiki.com/2011/05/code-for-america-getting-all-scraperwikied-over-government-data/#comments Mon, 23 May 2011 16:37:07 +0000 http://blog.scraperwiki.com/?p=758214881

Hackathons have been sprouting up all over the world. In fact, there was an Open Data Hackathon in Guatemala City just last week! But the loudest buzz coming from the data hack hive can be heard in the US, where Code for America has been producing the sweetest honey.

So here at ScraperWiki headquarters we are flattered to be their tool of choice. They’ve been using ScraperWiki as a project or session track for just about every hackathon, datacamp, or labs Friday. On the Stanford Hackathon blog, they write:

“‘ScraperWiki-ing’ is a great activity for beginner coders. It is social and forgiving. Even better, your work goes towards making our governments more efficient, open, and transparent by making information more linkable and extendable.”

The threatened closure of Data.gov has not dampened their spirits and, with ScraperWiki, we think they’ll do an even better job of making data work for the government rather than the government working for the data.

Advantages of using ScraperWiki, according to Tyler Stalder, a fellow at Code for America:

  • No technology setup or install-fest required to start working
  • Small project with a clear deliverable product that can be completed in a couple of hours
  • Directly focused on civic data
  • It’s easy to pair on scrapers
  • Python, Ruby, PHP support so it’s easy for us to organize pairs from a diverse pool of devs
