Hiding invisible text in Table Xtract

As part of my London Underground visualisation project I wanted to get data out of a table on Wikipedia; you can see it below. It contains data on every London Underground station, including things like the name of the station, the opening date, which zone it is in, how many passengers travel through it […]

Scraping the protests with Goldsmiths

Zarino here, writing from carriage A of the 10:07 London-to-Liverpool (the wonders of the Internet!). While our new First Engineer, drj, has been getting to grips with lots of the under-the-hood changes which’ll make ScraperWiki a lot faster and more stable in the very near future, I’ve been deploying ScraperWiki out on the frontline, with […]

How to scrape and parse Wikipedia

Today’s exercise is to create a list of the longest and deepest caves in the UK from Wikipedia. Wikipedia pages for geographical structures often contain Infoboxes (the panel on the right-hand side of the page). My first job was to design a Template:Infobox_ukcave that was fit for purpose. Why ukcave? Well, if […]

ScraperWiki

Extract tables from PDFs and scrape the web

Tag Archives | wikipedia
