Spreadsheets are code: EuSpRIG conference.

I’m back from presenting a talk on PDFTables. So whilst I’ve been thinking of spreadsheets as sources of raw data, it’s easy to forget how everyone else uses spreadsheets. The conference reminded me particularly of one simple fact about spreadsheets that often gets ignored:

Spreadsheets are code.

And spreadsheets are a way of writing code which hasn’t substantially changed since the days of Visicalc in 1978 (the same year as the book which defined the C programming language came out).

Programming languages have changed enormously in this time, promoting higher-level concepts like object orientation, whilst the core of the spreadsheet has remained the same. Certainly, there’s a surprising number of new features in Excel, but few of these help with the core tasks of programming within the spreadsheet.

Structure and style are important: it’s easy to write code which is a nightmare to read. Paul Mireault spoke of his methodology for reducing the complexity of spreadsheets by adhering to a strict set of rules involving copious use of array formulae and named ranges. It also involves working out your model before you start work in Excel, which culminates in a table of parameters, intermediate equations, and outputs.

And at this point I’m silently screaming: STOP! You’re done! You’ve got code!

Sure, there’s the small task of identifying which of these formulae are global, and which are regional and adding appropriate markup; but at this stage the hard work is done; converting that into your language of choice (including Excel) should be straightforward. Excel makes this process overly complicated, but at least Paul’s approach gives clear instructions on how best to handle this conversion (although his use of named ranges is as contentious as your choice of football team or, for programmers, editor.)

Tom Grossman’s talk on reusable spreadsheet code was a cry for help: is there a way of being able to reuse components in a reliable way? But Excel hampers us at every turn.

We can copy and paste cells, but there is so much magic involved. We’re implicitly writing formulae of the form “the cell three to the left” — but we never explicitly say that: instead we read a reference to G3 in cell J3. And we can’t easily replace these implicit references if we’re copy-pasting formulae snippets; we need to be pasting into exactly the right cell in the spreadsheet.

In most programming languages, we know exactly what we’ll get when we copy-and-paste within our source code: a character-by-character identical copy. But copy-and-paste programming is considered a bad ‘smell’: we should be writing reusable functions: but without stepping into the realm of macros each individual invocation of what would be a function needs to be a separate set of cells. There are possibilities of making this work with custom macro functions or plugins – but so many people can’t use spreadsheets containing macros or won’t have installed those plugins. It’s a feature missing from the very core of Excel which makes it so much more difficult and longwinded to work in it.

Not having these abstractions leads to errors. Ray Panko spoke of the errors we never see; how base error rates of a few percent are endemic across all fields of human endeavour. These error rates are at the time of writing the code the first time, and per instruction. We can hope to reduce these error rates through testing, peer review and pairing. Excel hinders testing and promotes copy-paste repetition, increasing the number of operations and the potential for errors. Improving code reuse would also help enormously: the easiest code to test is the code that isn’t there.

A big chunk of the problem is that people think about Excel the same wrong way they think about Word. In Word, it’s not a major problem, so long as you don’t need to edit the document: so long as it looks correct, that might be good enough, even if the slightest change breaks the formatting. That’s simply not true of spreadsheets where a number can look right but be entirely wrong.

Maria Csernoch’s presentation of Sprego – Spreadsheet Lego – described an approach for teaching programming through spreadsheets which is designed to get people thinking about solving the problems they face methodically, from the inside out, rather than repeatedly trying ‘Trial-and-Error Wizard-based’ approach with minimal understanding.

It’s interesting to note the widespread use of array formulae across a number of the talks – if you’re making spreadsheets and you don’t know about them, it might be worth spending a while learning about them!

In short, Excel is broken. And I strongly suspect it can’t be fixed. Yet it’s ubiquitous and business critical. We need to reinvent the wheel and change all four whilst the car is driving down the motorway — and I don’t know how to do that…

Tags: spreadsheets

3 Responses to “Spreadsheets are code: EuSpRIG conference.”

Sam Thursfield July 16, 2015 at 2:48 pm #

You’re absolutely right about spreadsheets, but I’m not so convinced that there’s actually anything to “fix” here. I fully understand how to code and yet, I still can’t actually get computers to do what I want very easily. And I still find myself going to conferences where people talk about how things are difficult with current tools and some solution is needed to make things better … 🙂
- David McKee July 17, 2015 at 11:15 am #
  
  My main argument (although perhaps “rant” would be more accurate!) is that spreadsheets are barely running with ‘current’ tools. Things we just take for granted — copy-pasting code from StackOverflow, being able to see your code trivially, local variables which have names that don’t magically change — these don’t exist in the Excel world! They need dragging kicking and screaming into the early 90’s. 😛
  
  And we should *never* be happy with the state of current tools. Out of irritation and discontent grow the pearls of “well of *course* you do it this way”!
JohnOS August 11, 2015 at 12:47 pm #

Totally agree that spreadsheets are code, especially the pricing and risk spreadsheets I’ve often worked with on trading floors. My own view is that a big part of the problem is that spreadsheet “code” is trapped in a desktop user interface that does little data validation, prevents significant automation, and exposes the “code” to all users. Imagine if every business user of a Java application was also a Java developer with access to the source code! That’s how it is with Excel. I believe we can take a big step forward if we take spreadsheets off the desktop and run them on a server. That’s what spreadserve.com does. In trader speak I’m “talking my own book”, of course!

ScraperWiki

Extract tables from PDFs and scrape the web

Blog

Spreadsheets are code: EuSpRIG conference.

3 Responses to “Spreadsheets are code: EuSpRIG conference.”