Why the World Got Stuck on Spreadsheets and the Future of Data Manipulation
Guest post by Dan Thompson
In 1979, a Harvard MBA student and former DEC programmer invented something that fundamentally changed the world of IT and which still affects everyone with a desk job today. What Dan Bricklin had created was the spreadsheet – in its modern form at least. There had been number-crunching programming languages before, and there had been systems for working on rows and columns of numbers before too. But what Bricklin did was make something interactive: the numbers updated as you were using it. It changed everything.
Thus the first modern spreadsheet application, VisiCalc, was born, and it was massively popular. This was the original “Killer App”; there are those who put the success of the Apple II down to VisiCalc alone. In the years that followed, Lotus 1-2-3 and then Microsoft Excel would take over, but they never fundamentally deviated from the way that VisiCalc worked. The grid concept, the cell references, the simple formula language: they’ve all been there since the beginning. Which is shocking when you think about it, because since that time we’ve seen the introduction of the mouse, the graphical user interface, the Mac, the Windows PC, the web, smartphones, tablet computers and touch interfaces.
So why did spreadsheets not move on? The answer is that they have, sort of. Excel now has named ranges, pivot tables, PowerPivot, change tracking, multi-user collaboration and tools to help with formula debugging, but you’re probably not using them, either because you didn’t know they were there, because you find them too complicated, or because you just haven’t been forced to learn them yet. The reason for this comes down to Microsoft’s dominance of the business IT market, which is built largely on backward compatibility. Businesses will only buy the new version of Excel if they’re confident all their stuff will still work. So any new features that Excel gets must be optional and out of the way. Oh, and there’s zero chance of losing market share, so making it slick and highly usable is not a major concern. Compatibility: crucial. Ease of use: optional.
For a long time, businesses have tried to replace spreadsheets with easier-to-use but less flexible centralised databases. But no sooner would they replace one spreadsheet the sales team were using with a “proper” system than the sales team would go and invent three new spreadsheets to help with something else. Which points to spreadsheets’ biggest strength: the flexibility they give to those on the front line. People don’t want “One big database to rule them all”; they want flexible tools for working with data.
Change is coming though: the Windows monoculture is now near an end. The prevalence of Macs, iPads and Chromebooks is forcing software onto the web as the one technology which works everywhere. Fortunately, the web is now maturing and is powerful enough to support rich user interfaces. Meanwhile, the ecosystem of single-purpose apps that Apple pioneered with its App Store is being emulated on the Mac, Chrome and Windows 8. There is now a way for people like me, who’ve had an idea for an application or tool, to get that onto people’s computers. Innovation and proper competition can once again resume.
I have no doubt that spreadsheets will always be with us, but they will be joined by many more streamlined tools, each serving different and more specific use cases. OpenRefine is a great tool for cleaning up data sets and fixing mismatching values. Tableau is a great way to create interactive visualisations that go beyond the graphs and charts that people are used to. Meanwhile, QueryTree, the product I founded, makes the process of sorting, joining, grouping and generally exploring data easier than with a formula-driven spreadsheet.
But all will not be plain sailing. Ecosystems depend on a shared set of formats or standards in order to work. Train companies need to agree on the width of the track, television makers and broadcasters need to agree on what “HD” actually means, and those of us who make apps that work with data need that data to be in a format we can understand. For a while, XML was the answer to everything. These days, it’s JSON. Yet each new Web API that launches structures its data in a slightly different way. I predict a long, drawn-out and gruesome battle to own the platform that apps will share structured data in. Maybe SPARQL will win, maybe everything will end up built on top of Google Spreadsheets – I have no idea. But if I were going to place a bet on any of these technologies, I’d pick the one that is simple, that doesn’t rely on any one vendor and which already works in every tool out there. Yes, you’ve guessed it, my prediction for the data format of the future is: the CSV file.
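To give a rough sense of why CSV is so easy for tools to share, here is a minimal sketch in Python using only the standard library. The sample data, the column names and the total_by helper are all made up for illustration; this is not how QueryTree or any other product mentioned here works, just one way a small tool might read a CSV file and group it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the column names are invented for this sketch.
CSV_TEXT = """region,rep,amount
North,Alice,120
South,Bob,80
North,Carol,200
"""


def total_by(rows, key, value):
    """Sum the `value` column for each distinct entry in the `key` column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# csv.DictReader uses the first line as the header and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
totals = total_by(rows, "region", "amount")
print(totals)
```

The same few lines would work on a file exported from Excel, a database or a web service, with no vendor library needed, which is roughly the property the bet above relies on.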
Dan is the Founder and Managing Director of D4 Software, the company behind the data analysis tool QueryTree. Dan started his career as a C, C# then Python programmer turned Development Manager. He lives in Worcestershire with his wife and two children.
I think that people will be more open to different ways of working with data in time. There seems to be a culture shift from “the spreadsheet is the foundation of all data” to “what is that, and why do I need to learn it?” That culture shift will strengthen as people learn different ways of working with data and find different kinds of data to work with.
That said, I have no visual handle whatsoever on what you’re getting at, so I’ll poke around your links a little. In my mind, data storage and data visualization are two very different things. Spreadsheets are more for storage than visualization. But then, that’s my inner statistical programmer talking.