The Office for National Statistics (ONS) approached us about a task that involves transforming data in a spreadsheet. Basically, unpivoting it.
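For readers who have not met the term, here is a minimal sketch of unpivoting in plain Python. It is not DataBaker and not the ONS data: the grid, the labels, and the `unpivot` function are all made up for illustration. It assumes the simplest possible layout, with one header row and one column of row labels.

```python
# Hypothetical cross-tab: the first row holds years, the first column holds regions.
grid = [
    ["",      "2013", "2014"],
    ["North", 10,     12],
    ["South", 7,      9],
]

def unpivot(grid):
    """Turn a cross-tab into one (row_label, column_label, value) record per data cell."""
    header = grid[0][1:]                      # column labels, skipping the corner cell
    records = []
    for row in grid[1:]:
        row_label, values = row[0], row[1:]
        for column_label, value in zip(header, values):
            records.append((row_label, column_label, value))
    return records

records = unpivot(grid)
for record in records:
    print(record)
```

Each data cell becomes its own record, carrying the labels that used to sit in the header row and the label column. Real spreadsheets are rarely this tidy, which is where the rest of this post comes in.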
Data transformation is quite a general problem, but one with recurring patterns. Marginal variables are usually, well, somewhere in the margin. Cells generally refer to an observation or the name or value of a marginal variable. But there is enough variation that we cannot hope to capture all the possibilities in a GUI tool. Enter the formal language.
DataBaker, introduced in Dragon’s earlier article, is a formal language for describing particular ways of transforming data. It is essentially a dialect of Python, in that it is Python, but specialised for describing spatial relationships within spreadsheets (SQLAlchemy and NumPy are more famous examples that can also be considered dialects of Python).
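To give a flavour of what "spatial relationships" means here, the sketch below uses plain Python rather than DataBaker's own vocabulary. The sheet and the helper names `directly_above` and `closest_left` are hypothetical, chosen only to show the idea: each observation finds its marginal variables by looking in a direction.

```python
# Hypothetical sheet: years span two columns each (as merged cells often do),
# quarters sit directly above the data, and regions sit down the left margin.
sheet = [
    ["",      "2013", "",   "2014", ""],
    ["",      "Q1",   "Q2", "Q1",   "Q2"],
    ["North", 10,     12,   11,     13],
]

def directly_above(sheet, header_row, col):
    """The header cell in the same column as the observation."""
    return sheet[header_row][col]

def closest_left(sheet, header_row, col):
    """The nearest non-empty header cell at or to the left of the observation's column."""
    for c in range(col, -1, -1):
        if sheet[header_row][c] != "":
            return sheet[header_row][c]
    return None

records = []
for row in sheet[2:]:
    for col in range(1, len(row)):
        records.append((
            row[0],                          # region, from the left margin
            closest_left(sheet, 0, col),     # year, from the spanning header
            directly_above(sheet, 1, col),   # quarter, from the header directly above
            row[col],                        # the observation itself
        ))

for record in records:
    print(record)
```

The point of a language over a GUI is that rules like "closest to the left" and "directly above" can be combined freely to fit whatever layout a particular spreadsheet happens to use.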
It might seem unusual to invent a formal language for this task, but we have read Nardi’s “A Small Matter of Programming” and are encouraged by quotes such as “ordinary people unproblematically learn and use formal languages and notations in everyday life”.
Earlier this week I interviewed Darren, lead for the ONS team that approached ScraperWiki. This team had essentially no previous programming experience, and are now successfully using DataBaker in their work. They are not professional programmers using a general-purpose programming language; they are domain specialists using an end-user programming language.
We chose Python because of its clarity and its proven ability to be learned quickly by relative newcomers (for example, Python is a cornerstone in Software Carpentry’s bootcamp to help scientists learn to code). Darren’s team have no interest in learning Python per se, only in using DataBaker to do their job. It’s testimony to our success that they never have to think “I’m programming in Python”.
We are sneaking programming in by the back door. This works because staff at ONS are already familiar with the domain of spreadsheets, which makes it easier for them to understand the core concepts behind DataBaker. As Nardi says, “people are likely to be better at learning and using computer languages that closely match their interests and their domain knowledge”.
Another part of the success of this project was that the ONS team had what Nardi refers to as a local developer. These are “domain experts who happen to have an intrinsic interest in computers and have more advanced knowledge of a particular program” (Nardi, again). Their local developer is the team’s go-to person for programming problems, and writes scripts, helps curate knowledge, and trains the team peer-to-peer.
A programming language provides the ultimate flexibility, but should only be used as a solution with care and whilst being attentive to the situation and expertise of the end users. The task that Darren’s team use DataBaker for has no alternative solution: without DataBaker, the task wouldn’t be done. End User Programming for the win!
The Nardi quotes are from Bonnie Nardi’s most excellent and sadly little known book: “A Small Matter of Programming”.