Pall Hilmarsson – ScraperWiki https://blog.scraperwiki.com Extract tables from PDFs and scrape the web Tue, 09 Aug 2016 06:10:13 +0000

Meet the User – Pall Hilmarsson https://blog.scraperwiki.com/2011/07/meet-the-user-pall-hilmarsson/ Fri, 08 Jul 2011 14:10:43 +0000

Our digger has been driven around colder climes by one of our star users, Icelander Pall Hilmarsson. Driving such a heavy vehicle on icy surfaces and through volcanic ash may seem daunting to most people, but Pall has not only ventured forth undeterred, he has also given passers-by a lift. One such hitch-hiker is Chris Taggart with his OpenCorporates project. I caught up (electronically) with Pall.

What’s your background and what are your particular interests when it comes to collecting data?

I have work-related experience in design. I started working as a designer 12 years ago, almost by accident. At one point I thought I’d study it and I did try, for the whole of ten days! Fortunately I quit and went for a B.A. degree in anthropology. Somehow I’ve ended up doing design again. Currently I work for the Reykjavík Grapevine magazine.

I’m particularly interested in freeing data that has some social relevance, something that gives us a new way of seeing and understanding society. That comes from the anthropology: data that has social meaning.

How have you found using ScraperWiki and what do you find it useful for?

ScraperWiki has been a fantastic tool for me. I had written scrapers before, mostly small scripts to make RSS feeds and only in Perl. ScraperWiki has led me to teach myself Python and write more complex scrapers. It has opened up a whole new set of possibilities. I really like being able to study other people’s scrapers and to help others with theirs. I’ve learned so much from ScraperWiki.

Are there any data projects you’re working on at the moment?

Right now I’m involved in scraping some national company registers for the brilliant OpenCorporates site. I’m also compiling a rather large dataset on foreclosures in Iceland over the last 10 years, trying to get an image of where the financial meltdown is hitting the hardest. I’m hoping to make it into an interactive map application. So far the data shows some interesting things. Going into the project, I had some notion that the Reykjavík suburbs, with their new apartment buildings, would account for the bulk of foreclosures. It seems, though, that the old downtown area is actually where most apartments are going up for auction.

How is the data landscape in the area you’re interested in? Is it accessible, formatted, consistent?

Government data over here is not easily accessible, but that might change. A new bill introduced in Parliament aims to free a lot of data and to make citizens’ right to access information a lot stronger. But of course it will never be enough. Data begets more data.

So watch out Iceland – you’re being ScraperWikied!

There’s More Than One Way to Scrape a Site https://blog.scraperwiki.com/2011/05/theres-more-than-one-way-to-scrape-a-site/ Fri, 13 May 2011 15:26:44 +0000

A request came in to ScraperWiki to scrape information on the Members of the European Parliament. I put it out on Twitter and Facebook, hoping that some kind member of the ScraperWiki community spends so much time at the computer that he or she has no life at all. I had to turn people away!

Within minutes, two tweeters wanted to give it a go and I got a reply on Facebook. In fact, Tim Green had already scraped the names and URLs of the MEPs by the time I got back to him to say the request had already been claimed on Twitter by Pall Hilmarsson.

Although both scrapers are looking at the same site, Tim’s is less than 20 lines of code and, with only 8 revisions, a very quick scrape. Pall’s, by contrast, went for the full shebang, scraping opinions and speeches and generally drilling down into the data a whole lot more. Hence the nearly 200 lines of code!
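
For anyone who has not written a scraper before, here is a minimal sketch of the quick kind of scrape: pulling names and URLs out of a page of links. It is not Tim’s or Pall’s code. The sample HTML, the `mep-list` class and the names `LinkCollector` and `scrape_links` are all made up for illustration, and it uses only Python’s standard-library `html.parser` rather than anything specific to ScraperWiki.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real site's HTML will differ.
SAMPLE_HTML = """
<ul class="mep-list">
  <li><a href="/meps/1">Anna Example</a></li>
  <li><a href="/meps/2">Bob Sample</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (name, url) pairs from every <a href="..."> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            self.links.append((name, self._href))
            self._href = None


def scrape_links(html):
    """Return a list of (name, url) tuples found in the given HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for name, url in scrape_links(SAMPLE_HTML):
        print(name, url)
```

A deeper scrape would presumably go on to fetch each collected URL and parse the detail page as well, which is roughly where the extra lines of code come from.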

So if you’re a code junkie, take a look at what it takes to scrape, and then to scrape further, by comparing scrapers/meps with scrapers/meps_2. Also, Tim kindly scraped the next request: the National Historic Ships Register. To Tim and Pall I say: if the ScraperWiki digger were capable of emotion, you would both be receiving a diesel-greasy kiss!

European Parliament Members and National Historic Ships – you’ve been ScraperWikied! (with help from your friendly neighbourhood programmers)
