Comments on: Book review: Data Science at the Command Line by Jeroen Janssens

By: Jeroen Janssens, Wed, 11 Feb 2015 15:07:22 +0000
Thanks Ian, that’s very kind of you. I’ll look into that section and see whether I can improve the wording.



By: Ian Hopkinson, Wed, 11 Feb 2015 13:24:30 +0000
By the way – people should definitely buy your book 😉

By: Ian Hopkinson, Wed, 11 Feb 2015 13:24:02 +0000
Hi Jeroen,
thanks for taking the time to comment!

I inferred that one should use the command line in preference to a more conventional programming language from p161, in the “Be Creative” section.

I agree entirely with using a range of tools according to the task at hand. I intend to make more use of the command line than I currently do. I think there will always be a discussion as to what the best tool for a particular task is, and even whether a “best” tool exists at all, since “best” is a qualitative judgement. For example, I might prefer to do something in Python rather than shell because I can write it in a way that might be longer but is more descriptive of what it’s doing. Someone with a better memory for command-line options would come to a different judgement.

Best regards,


By: Jeroen Janssens, Wed, 11 Feb 2015 12:57:45 +0000
Thank you very much for writing this review, Ian. It’s good to hear that the command line is part of your day-to-day activities at ScraperWiki.

I was a bit surprised to read “It finishes by proposing the command line as a replacement for a conventional programming language with which I can’t agree.” If you can explain where you think I’m proposing this, then I shall correct it immediately! In the meantime, please allow me to use this space to provide some context.

If there’s one thing I want readers to take away from the book, it’s that a data scientist should use whatever approach gets the job (or part of the job) done. That could mean R to do some regression, D3 to create an interactive visualization, Go to scrape a wiki, and yes, sometimes the command line. It’s a valuable skill to be able to chop your problem into subproblems, identify when you can best use which approach, and stitch everything together. On the one hand, it would be silly to think that the command line is the best approach for everything. On the other hand, while a programming language can sometimes give you much more speed, power, and flexibility, that doesn’t mean you should use that programming language for everything. For example, it’s perfectly fine to start with the command line to obtain and scrub some data and then continue with IPython Notebook in combination with pandas and seaborn to explore it. Mix and match approaches, be creative, and be practical!

Let me end this babble by saying that I still believe that being able to leverage the power of the command line, and integrate it with your data science workflow, will make you a more efficient and productive data scientist. My suggestion: start with cowsay and take it from there.

Thanks again.