

Spot and Normalize Inconsistent Measures

Here’s an example of why you have to be very careful when scraping,
and why run-of-the-mill tools that assume the data is laid out
consistently won’t cut it:

One of our super-users, Julian Todd, decided to scrape the Vehicle Certification Agency (VCA) website for its new-car fuel consumption and exhaust emissions figures. In one set of search results he spotted this:

And another search resulted in this:

Yes, the emissions figures switch from milligrams per km to grams per km,
and the only indication of the change is the column header.
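Because the only clue is in the header, one way to spot this automatically is to compare headers across every page of results. Below is a minimal sketch. It assumes the scraper keeps each header as a key of the form "<measure> <unit> km" (for example "CO mg km") and that milligrams and grams are the only units in play. `find_unit_conflicts` and the sample header names are hypothetical, for illustration only.

```python
import re
from collections import defaultdict

# Assumed header shape: "<measure> <unit> km", e.g. "CO mg km" or "CO g km".
# Only milligram and gram suffixes are recognised in this sketch.
UNIT_SUFFIX = re.compile(r"^(?P<measure>.+) (?P<unit>mg|g) km$")


def find_unit_conflicts(header_sets):
    """Return {measure: set_of_units} for measures reported in more than one unit.

    header_sets is an iterable of header lists, one list per scraped results page.
    """
    units_seen = defaultdict(set)
    for headers in header_sets:
        for header in headers:
            match = UNIT_SUFFIX.match(header)
            if match:
                units_seen[match.group("measure")].add(match.group("unit"))
    # A measure seen with both "mg" and "g" must be normalized before storing.
    return {m: units for m, units in units_seen.items() if len(units) > 1}


# Hypothetical headers from two searches.
page_a = ["Model", "CO mg km", "NOx mg km"]
page_b = ["Model", "CO g km", "NOx mg km"]
print(find_unit_conflicts([page_a, page_b]))
```

Here only "CO" is flagged, because "NOx" appears in milligrams on both pages.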

In ScraperWiki we can normalize this with a few lines of standard Python:

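# Rename every "... mg km" key to "... g km" and convert its value to grams
for key in list(data.keys()):  # copy the keys: data is modified inside the loop
    if key.endswith(" mg km"):
        nkey = key[:-len(" mg km")] + " g km"
        v = data.pop(key)
        if v is None:
            data[nkey] = None  # keep missing values missing
        else:
            data[nkey] = float(v) / 1000  # 1 g = 1000 mg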

This is from the scraper:
http://scraperwiki.com/scrapers/vca-car-fuel-data/
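The loop above rewrites a single record in place. If more columns or units turn up, one option is to drive the conversion from a small table of divisors and return a fresh dict. Below is a minimal sketch. It assumes each record is a plain Python dict and that " mg km" to " g km" is the only conversion needed. `CONVERSIONS` and `normalize_record` are hypothetical names and are not part of ScraperWiki or the VCA scraper.

```python
# Hypothetical conversion table: old header suffix -> (new suffix, divisor).
CONVERSIONS = {
    " mg km": (" g km", 1000),  # 1 g = 1000 mg
}


def normalize_record(data, conversions=CONVERSIONS):
    """Return a new dict with unit-suffixed keys converted; data is left unchanged."""
    result = {}
    for key, value in data.items():
        for old_suffix, (new_suffix, divisor) in conversions.items():
            if key.endswith(old_suffix):
                key = key[: -len(old_suffix)] + new_suffix
                # Keep missing values missing; otherwise rescale to the new unit.
                value = None if value is None else float(value) / divisor
                break
        result[key] = value
    return result


# Hypothetical scraped row with values as strings, as they come off the page.
row = {"Model": "Example 1.2", "CO mg km": "450", "NOx mg km": None}
print(normalize_record(row))
```

Building a new dict avoids modifying a dictionary while looping over it, which is unsafe in Python 3. It also leaves the raw scraped row available for checking.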


