Scraping the protests with Goldsmiths

Zarino here, writing from carriage A of the 10:07 London-to-Liverpool (the wonders of the Internet!). While our new First Engineer, drj, has been getting to grips with lots of the under-the-hood changes which’ll make ScraperWiki a lot faster and more stable in the very near future, I’ve been deploying ScraperWiki out on the frontline, with some brilliant Masters students at CAST, Goldsmiths.

I say brilliant because these guys and girls, with pretty much no scraping experience but bags of enthusiasm, managed (in just three hours) to pull together a seriously impressive map of Occupy protests around the world. Using data from no less than three individual wikipedia articles, they parsed, cleaned, collated and geolocated almost 600 protests worldwide, and then visualised them over time using a ScraperWiki view. Click here to take a look.

Okay, I helped a bit. But still, it really pushed home how perfect ScraperWiki is for diving into a sea of data, quickly pulling out what you need, and then using it to formulate bigger hypotheses, flag up possible stories, or gather constantly-fresh intelligence about an untapped field. There was this great moment when the penny suddenly dropped and these journalists, activists and sociologists realised what they’d been missing all this time.

Students like these, with tools like ScraperWiki behind them, are going to take the world by storm.

But the penny also dropped for me, when I saw how suited ScraperWiki is to a role in the classroom. The path to becoming a data science ninja is a long and steep one, and despite the amazing possibilites fresh, clean and accountable data holds for everybody from anthropologists to zoologists, getting that first foot on the ladder is a tricky task. ScraperWiki was never really built as a learning environment, but with so little else out there to guide learners, it fulfils the task surprisingly well. Students can watch their tutor editing and running a scraper in realtime, right alongside their own work, right inside their web browser. They can take their own copy, hack it, and then merge the data back into a classroom pool. They can use it for assignments, and when the built-in documentation doesn’t answer their questions, there’s a whole community of other developers on there, and a whole library of living, working examples of everything from cabinet office tweeting to global shark activity. They can try out a new language (maybe even their first language) without worrying about local installations, plugins or permissions. And then they can share what they make with their classmates, tutors, and the rest of the world.

Guys like these, with tools like ScraperWiki behind them, are going to take the world by storm. I can’t wait to see what they cook up.

Tags: API, google maps, occupy, protests, views, wikipedia

ScraperWiki

Extract tables from PDFs and scrape the web

Blog

Scraping the protests with Goldsmiths

Trackbacks/Pingbacks