mozfest – ScraperWiki
https://blog.scraperwiki.com
Extract tables from PDFs and scrape the web
Tue, 09 Aug 2016 06:10:13 +0000

Some ScraperWikiLovin’ at MozFest
https://blog.scraperwiki.com/2011/11/some-scraperwikilovin-at-mozfest/
Mon, 07 Nov 2011 14:50:56 +0000

This weekend saw ideas made reality, collaborations fostered and the future web bloom. The Mozilla Festival was all about making the web and making it happen in two days! Here at ScraperWiki we like doing that with data, so as well as contributing to the Data Driven Journalism Handbook, we held a quick-fire ScraperWiki round.

And when I say quick I mean ~1hr! With a couple of geeks in hand, some eager journalist types, laptops and our ever-articulate CEO, Francis Irving, we set to work, well, talking about data. The fact is there are many pre-scraping steps to consider:

  1. What is the general area you are interested in?
  2. Can you find other people, especially geeks, with that interest?
  3. When you have done so, you need to find where the data relating to your field of interest lives
  4. Once you’ve got a list of interesting data, you need to look at its structure (non-programmatically) in order to decide on a hypothesis to test
  5. Then you need to recruit your geek (who should be involved in all of the above steps) to start deconstructing the data, i.e. seeing what can be scraped
  6. At this point you all need to work together to decide the schema of the scraper datastore, i.e. the headings and their attributes
  7. Iterate until your data can answer your hypothesis, or alter your hypothesis (it could be that you can mash the scraper with another dataset)
  8. Get working on answering your hypothesis. The outcome could be a query, a visualization or an application
  9. Go back to your data and iterate again so that the structure fits your outcome
  10. Pat yourselves on the back, have a beer and keep in touch for your next project
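
To make step 6 a little more concrete, here is a rough sketch of what "deciding the schema" might look like once the journalists and the geek have agreed on it. It uses Python's built-in sqlite3 module rather than any particular scraper platform's API, and the table name, the headings (source_url, name, amount, recorded) and the helper functions are all made up for illustration. Your own headings will depend on the hypothesis you want to test.

```python
import sqlite3

# Hypothetical schema: the headings (columns) and their attributes (types, keys).
# These names are illustrative only, not taken from any real scraper.
SCHEMA = """
CREATE TABLE IF NOT EXISTS records (
    source_url TEXT NOT NULL,  -- page the row was scraped from
    name       TEXT NOT NULL,  -- the thing being described
    amount     REAL,           -- a number you want to compare or sum
    recorded   TEXT,           -- ISO date string, e.g. 2011-11-05
    PRIMARY KEY (source_url, name)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the datastore and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save(conn, row):
    """Insert a row, replacing any earlier row with the same key.

    Replacing on the key means re-running the scraper does not pile up
    duplicates, which helps when you iterate on it (steps 7 and 9).
    """
    conn.execute(
        "INSERT OR REPLACE INTO records (source_url, name, amount, recorded) "
        "VALUES (:source_url, :name, :amount, :recorded)",
        row,
    )
    conn.commit()


def total_by_name(conn):
    """One possible query outcome (step 8): the summed amount per name."""
    cur = conn.execute(
        "SELECT name, SUM(amount) FROM records GROUP BY name ORDER BY name"
    )
    return cur.fetchall()
```

The unique key is the design choice worth arguing about as a group: it decides what counts as "the same record" when the scraper runs again.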

This may seem a bit much but this is how you make, iterate, and mediate for the web. The Mozilla Festival proved that this is achievable and enjoyable. In that vein, we got a scraper in 1hr! So a big cheer to Alex Poderoso for winning the coveted ScraperWiki mug.

To catch up on the MozFest fun, here is the first draft of the Data Journalism Handbook. The festival premiered an amazing HTML5 documentary called The One Millionth Tower. You can catch up with all the rest including teaching kids to code with Hackasaurus and hacking video with popcorn.js (and an octocopter!) and loads more at the Mozilla Festival website.

Diggers and Dinosaurs – Scraping at the Mozilla Festival
https://blog.scraperwiki.com/2011/10/diggers-and-dinosaurs-scraping-at-the-mozilla-festival/
Mon, 17 Oct 2011 15:40:33 +0000

In a complete paradigm shift of the epic battle between Godzilla and Mothra we are turning our backs on the old claymation medium and embracing the digital age where dinosaurs and diggers (yes, I am aware we are a machine and not a moth) can roam free across the lawless plains of web 2.0.

Both can be found at the Mozilla Festival park in London on 4-6 November. If you’re lucky you might even spot a wily firefox. There will be an enclosure on the Friday from 18:00 where our tamed digger driver, Francis Irving, can give you some driving lessons.

As part of the Data Journalism Workshop on the Saturday, 10:00-17:00, we’ll be hosting a ‘Scraping 101’ session. There will be a host of data trackers to guide you through the web wilderness including Open Knowledge Foundation’s Jonathan Gray and the European Journalism Centre’s Liliana Bounegru. There will be herds of other data/web beasts roaming the plains so we suggest you stay inside or close to your digger.

If you’re interested in a close encounter of the data kind, sign up for the event here.

So watch out Mozilla Festival – you’re being ScraperWikied!
