Far too often I get so stuck into the work week that I forget to monitor the weather for the weekend when I should be going off to play on my dive kayaks — an activity which is somewhat weather dependent. Luckily, help is at hand in the form of the ScraperWiki email alert system. […]
The long dark tea time of the computer programmer
The way in which Information Technology is taught in England is so dull and harmful it should be scrapped – that’s the view of the Education Secretary Michael Gove. ‘A nation of digital illiterates’ (BBC) Many years ago there was a total corporate take-over of the computer software sector in the UK. Big money was […]
How to scrape and parse Wikipedia
Today’s exercise is to create a list of the longest and deepest caves in the UK from Wikipedia. Wikipedia pages for geographical structures often contain Infoboxes (that panel on the right-hand side of the page). The first job was for me to design a Template:Infobox_ukcave which was fit for purpose. Why ukcave? Well, if […]
How to get along with an ASP webpage
Fingal County Council of Ireland recently published a number of sets of Open Data, in nice clean CSV, XML and KML formats. Unfortunately, the one set of Open Data that was difficult to obtain was the list of sets of open data. That’s because the list was split across four separate pages. The important thing […]
Tweeting the drilling
A very long time ago I discovered the easiest webscraping target: the locations of all the North Sea oil wells. Once you had webcrawled through the index pages, the entries were pretty straightforward. There were dates, water depths (in feet or metres), GPS locations and so on. The code, if you want to look at it, […]