Meet the User – Claire Miller
ScraperWiki blog, Fri, 29 Jul 2011 · https://blog.scraperwiki.com/2011/07/meet-the-user-claire-miller/

Here at ScraperWiki we like programmers and journalists. We’re also interested in helping to bridge the gap between them, so without further ado let me present to you: Claire Miller. She’s a reporter and data journalist for Media Wales.

She’s come to ScraperWiki from the non-coding side but with a journalistic eye for the story. She’s new to the ScraperWiki way of doing things but says: “I’m just starting to appreciate how handy a tool ScraperWiki can be for journalists. At the moment when I come across a dataset on a government website, like the Food Standards Agency ratings or jobcentre vacancies, I’ve taken to checking ScraperWiki to see if someone is trying to scrape them – this is how I ended up with a collection of scrapers gathering up ratings of dodgy takeaways in Wales.”

She’s hoping there will be some stories for the paper, but she has also gone over to the ScraperWiki side and wants to make the data useful for people living in Wales. Like me and many others in the data journalism fold, she’s decided to just jump in by forking useful-looking scrapers and pointing them at Welsh datasets.

“I’m interested in how I can use ScraperWiki to find data I can use to find stories. I’ve already used some from a series of scrapers gathering data from the jobcentre vacancies search to analyse the sort of vacancies that are on offer for people in Wales. I’m also working on gathering up lots of data on Welsh schools and found the Welsh School Finder, which saved me so much time in linking census and financial data to addresses and locations.”

Her ultimate goal is to start scraping PDFs, since responses to FOI requests constantly arrive in that most evil of formats. We’re working on our documentation and tutorials, and PDFs are most definitely on our list. For the budding data journalists out there, I’d say walk before you run. PDFs are hard, so start with HTML web scraping and CSVs. But remember, where there’s a ScraperWiki digger, there’s a way!
