Apache Spark is a system for data analysis that can run on a single machine or across a cluster. It is a fairly new technology: initial work began in 2009, and Apache adopted it in 2013. There’s a lot of buzz around it, and I have a problem for which it might be […]