Back to contents
PHP
Python
Ruby
Choose a language:
Microsoft Excel spreadsheet files can contain lots of data in cells, formulae, notes and so on.
For more details, read the xlrd documentation.
Opening the file
Download the Excel file, and load it into the xlrd ("Excel reader"). This example
is taken from an FOI request
for tram accidents.
import scraperwiki
import xlrd
xlbin = scraperwiki.scrape("http://www.whatdotheyknow.com/request/82804/response/208592/attach/2/ACCIDENTS%20TRAMS%20Laurderdale.xls")
book = xlrd.open_workbook(file_contents=xlbin)
Directly select the worksheet you want by number or name.
sheet = book.sheet_by_index(0)
sheet = book.sheet_by_name('LAUDERDALE AVE')
If you're not sure what sheets there are, this prints them out.
for n, s in enumerate(book.sheets()):
print "Sheet %d is called %s and has %d columns and %d rows" % (n, s.name, s.ncols, s.nrows)
Reading values
Read the values of a row in one go.
print sheet.row_values(4)
Dates will come out as floating point numbers by default (e.g. 36649.0), unless you specially convert them. This cellval function does the hard work for you.
import datetime
def cellval(cell, datemode):
if cell.ctype == xlrd.XL_CELL_DATE:
datetuple = xlrd.xldate_as_tuple(cell.value, datemode)
if datetuple[3:] == (0, 0, 0):
return datetime.date(datetuple[0], datetuple[1], datetuple[2])
return datetime.date(datetuple[0], datetuple[1], datetuple[2], datetuple[3], datetuple[4], datetuple[5])
if cell.ctype == xlrd.XL_CELL_EMPTY: return None
if cell.ctype == xlrd.XL_CELL_BOOLEAN: return cell.value == 1
return cell.value
Read a whole row, with dates properly converted.
print [ cellval(c, book.datemode) for c in sheet.row(4) ]
Read individual cells like this.
print cellval(sheet.cell(0,0))
Saving to the datastore
Work out sensible heading names. In this case we remove the "." from "FLEET NO.".
keys = sheet.row_values(2)
keys[1] = keys[1].replace('.', '')
print keys
Make dictionaries from the rows and save them.
for rownumber in range(4, sheet.nrows):
# create dictionary of the row values
values = [ cellval(c, book.datemode) for c in sheet.row(rownumber) ]
data = dict(zip(keys, values))
data['rownumber'] = rownumber
# remove the empty column (which has a blank heading)
del data['']
# only save if it is a full row (rather than a blank line or a note)
if data['DATE'] != None and data['FLEET NO'] != None:
scraperwiki.sqlite.save(unique_keys=['rownumber'], data=data)
Bad dates
In some cases there are bad dates. Improve the cellval function to catch exceptions.
try:
datetuple = xlrd.xldate_as_tuple(cell.value, datemode)
except Exception, e:
print "BAD", cell, e
return str(cell)
try:
if datetuple[3:] == (0, 0, 0):
return datetime.date(datetuple[0], datetuple[1], datetuple[2])
return datetime.date(datetuple[0], datetuple[1], datetuple[2], datetuple[3], datetuple[4], datetuple[5])
except ValueError, e:
print "BAD value", datetuple, cell, e
return str(cell)