It’s good to share…

Posted on the ScraperWiki blog (open source), Wed, 04 Sep 2013
https://blog.scraperwiki.com/2013/09/its-good-to-share/

Image by Jason Empey

As you may have gathered, I’m on a journey: I’ve worked as a physicist and a data scientist for 20 years, and now I’ve fallen amongst software engineers. There are obvious similarities in what we do: we write code to do stuff. I write code to analyse things, and the software engineers write code to do things for other people.

But the practice of these two disciplines can be quite different. I’ve written about my introduction to practical testing, and I’ve alluded to pair programming. Pair programming is where two programmers work together, side by side on the same piece of code. Nominally the “Driver” sits at the keyboard and the “Navigator” thinks, researches and directs. In practice you talk about what you’re doing, and as a consequence hopefully you produce better code and at the very least you produce code with which two people are familiar.

It’s a strangely sociable activity, which I’ve found very educational because sometimes the small practice of how you do things is as important as the big theoretical picture. I, for example, can now passably use the Vim text editor. And, as discussed before, I’m now a fan of testing.

Pair programming is a facet of sharing code, and I’ve shared my code in the past. My PhD thesis, published 20 years ago, has the FORTRAN programs I used to analyse my data printed in the back. I’ve happily shared code with my colleagues in Unilever, but this was a sham: I was pretty confident no one would read my code or build on it; the most they would do was run it.

Now things are different. I have written open source code which my colleagues are using and which complete strangers could potentially use. You can see it here. I’ve sat down with people who are actually going to use and extend my code; in fact, it is no longer mine.

This has several effects. Firstly, there can be changes I don’t necessarily understand. Secondly, style and format are more important. To use an analogy, you could publish a newspaper as one long strip of paper in a single typeface and weight, with articles presented in alphabetical order of the first word. But it would be difficult to read and navigate, because we are used to headlines, bylines, and the convention of major news at the front and sports at the back.

And so it is with code: programming languages have coding conventions which aren’t part of the language but which are important when you share code with others. In Python the coding standard is called PEP 8; it tells you how to name your functions and lay out your code. Different languages have different conventions, and writing code in the wrong convention is like speaking with a foreign accent.
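
To make the point about conventions concrete, here is a minimal sketch of the same calculation written twice. The function names and data are invented for illustration and are not taken from the ScraperWiki code; both versions run and give the same answer, but only the second follows PEP 8.

```python
# Hypothetical example: both functions compute the same thing.

# Works, but ignores PEP 8: CamelCase function name, no spaces after
# commas or around operators, and single-letter argument names.
def MeanOfColumn(T,c):
    return sum(r[c] for r in T)/len(T)


# PEP 8 style: lowercase_with_underscores name, spaces around operators,
# descriptive argument names and a docstring for a public function.
def mean_of_column(rows, column):
    """Return the arithmetic mean of one column across a list of rows."""
    return sum(row[column] for row in rows) / len(rows)


rows = [{"height": 2.0}, {"height": 4.0}]
print(MeanOfColumn(rows, "height"), mean_of_column(rows, "height"))
```

Python itself does not care which version you write; the difference only matters to the next person who has to read it.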

Thirdly, I feel more responsible; I was distressed to discover my colleague relying on a function that I considered to be obsolete, one I had written in the early stages of development but since left to languish. But I had left no evidence that this was the case. My fragmentary, orphaned comments are similarly unhelpful to another programmer.
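
One common way to leave that evidence in Python is to have the old function emit a `DeprecationWarning` that points at its replacement. The sketch below is only an illustration under assumed names: `extract_table` and `get_table_old` are hypothetical and are not the functions from the original project.

```python
import warnings


def extract_table(page):
    """Current function: split each non-empty line of a page into cells."""
    return [line.split() for line in page.splitlines() if line.strip()]


def get_table_old(page):
    """Deprecated: use extract_table() instead."""
    # Tell the caller, at the point of use, that this function is obsolete.
    # stacklevel=2 makes the warning point at the caller's line, not this one.
    warnings.warn(
        "get_table_old() is obsolete; use extract_table() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return extract_table(page)
```

Existing callers keep working, but the docstring and the warning both record that the function has been left to languish. Depending on how Python's warning filters are configured, the message may only be displayed when running tests or when Python is started with `-W default`.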

But it’s good to share: “my” code has benefitted from other people’s insights, and I hope I’m writing better code in the knowledge that other people are using it.
