data sources – ScraperWiki https://blog.scraperwiki.com

Exploring Stack Exchange Open Data
https://blog.scraperwiki.com/2013/08/exploring-stack-exchange-open-data/
Wed, 14 Aug 2013 16:01:48 +0000

Inspired by my long commute and the pretty dreadful EDM blasting out in my gym, I’ve found myself on a bit of a podcast kick lately. Besides my usual NPR fare (if you’ve not yet listened to an episode of This American Life with Ira Glass, you’ve missed out), I’ve been checking out the Stack Exchange podcast, a fairly irreverent take on the popular Q&A website, hosted by its founders. In the 51st episode, they announced the opening of their latest site, which focuses on the exciting world of open data.

Perhaps the most common complaint I’ve heard since I started surrounding myself with data scientists is that getting hold of specific datasets can be frustratingly hard. Often, what you can get by scraping a website is more than sufficient. That said, if you’re looking for something oddly specific, like the nutritional information of all food products on the shelves of UK supermarkets, you can quickly find yourself hitting some serious brick walls.

That’s where Stack Exchange Open Data comes in. It follows the typical formula that Stack Overflow has adhered to since its inception. Good questions rise to the top whilst bad ones fade into irrelevance.

The aim of this site is to provide a handy venue for finding useful datasets to analyze or use in projects. Despite having opened only recently, it has garnered a large userbase, and people are asking interesting questions and getting helpful answers. These range from information about German public transportation to global terrain data.

Will you be using Stack Exchange Open Data in one of your future projects? Has Stack Exchange Open Data helped you find a particularly elusive dataset? Let me know in the comments below.
