Microfinance Data Scraping

I went to DataKind’s New York DataDive last November and met the Microfinance Information Exchange (MIX), a group that ‘delivers data services, analysis, research and business information on the institutions that provide financial services to the world’s poor’. They wanted to see whether web scraping could save them from gathering data manually. So fellow divers and I showed MIX the utility […]

Handling exceptions in scrapers

When requesting and parsing data from a source with unknown properties and random behavior (in other words, scraping), I expect all kinds of bizarrities to occur. Managing exceptions is particularly helpful in such cases. Here are some ways that an exception might be raised. [][0] # The list has no zeroth element, so this raises an […]
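To make the idea concrete, here is a minimal sketch of catching the sort of exceptions a scraper tends to hit. The helper names `safe_first` and `parse_amount` and the sample rows are hypothetical, made up for illustration rather than taken from the original post.

```python
def safe_first(items, default=None):
    """Return the first element of items, or default if there is none."""
    try:
        return items[0]
    except IndexError:
        # An empty list has no zeroth element, so [][0] raises IndexError.
        return default


def parse_amount(text):
    """Convert a scraped string such as '1,234' to an int, or None if it is not numeric."""
    try:
        return int(text.replace(",", ""))
    except (AttributeError, ValueError):
        # AttributeError: text was None (the cell was missing).
        # ValueError: text was not a number (for example 'n/a').
        return None


# Hypothetical scraped rows: a good value, an empty row, junk text, a missing cell.
rows = [["1,234"], [], ["n/a"], [None]]
amounts = [parse_amount(safe_first(row)) for row in rows]
print(amounts)
```

Catching only the specific exceptions you expect keeps genuine bugs visible, which is probably safer than a bare `except` that swallows everything.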

More Python libraries!

I installed some new Python libraries and restructured the Python libraries documentation page. Some highlights: Gensim is “Topic Modelling for Humans”. Read the introduction to the documentation. I’m looking for an excuse to play with it. unidecode transliterates Unicode into ASCII. It’s helpful for things like making column names. Beautiful Soup 4 beta (It’s a […]

What happened in New York

At our New York datacamp, we set out to liberate data, teach people to liberate data, and find stories in data. About 100 people showed up for the event, and about 40 of them attended the Learn to Scrape sessions. The hacking was punctuated by talks by Tom Lee of the Sunlight Foundation and Jake […]

ScraperWiki

Extract tables from PDFs and scrape the web

Archive by Author