Back to contents
PHP
Python
Ruby
Choose a language:
Comma-separated value files are a quite commonly used text form of spreadsheet. To process them often requires special cases, such as parsing dates or removing bad rows.
For more details, read the documentation for Ruby's builtin CSV module (which grew out of FasterCSV). This question on StackOverflow gives some idea of the kinds of thing in can do.
Reading rows
Download the CSV file first. (If there are quirks in the input file, you might at this point want to preprocess the data using, for example, the .gsub method)
data = ScraperWiki::scrape("http://s3-eu-west-1.amazonaws.com/ukhmgdata-cabinetoffice/Spend-data-2010-11-01/Spend-Transactions-with-descriptions-HMT-09-Sep-2010.csv")
Load it into the standard Ruby CSV reader. It needs to be an array of lines.
require 'fastercsv'
csv = CSV.new(data)
You can then loop through the rows as if they were an array.
for row in csv:
puts format("%s spent on %s", row[7], row[3])
end
The above example, is available
as a complete scraper.
Saving to the datastore
Conventionally the first line gives the names for the columns. You can get the standard reader to load in each row as a hash, where the keys are those names.
csv = CSV.new(data, :headers => true)
This makes it easy to save the data. By default everything comes out as strings. We convert the 'Amount' row to a number type, so that it can then be added and sorted.
for row in csv
if row['Transaction Number']
row['Amount'] = row['Amount'].to_f
ScraperWiki::save_sqlite(unique_keys=['Transaction Number', 'Expense Type', 'Expense Area'], row.to_hash)
end
end