A short tutorial on StarCluster and IPython parallel

Pierre-Luc Bacon - PhD Student, McGill SOCS

April 15, 2014, 12:10 p.m. - April 15, 2014, 12:40 p.m.


I recently faced the challenge of computing some memory-hungry statistics on our modest lab machines. With a conference deadline approaching, I had to find a solution quickly that would not involve buying new hardware. Because of administrative delays, applying to CLUMEQ was not an option either. I first experimented with a homemade, duct-tape cluster that pooled the computing power of all our lab machines at once. It worked, but it was limited by my lack of administrative privileges on the SOCS machines. Then came StarCluster. I will tell you the story of how I used Amazon EC2 to fit 750 GB of data in RAM while sipping a cappuccino at a fancy coffee place. Not only did I get my results back in under 30 minutes, but the whole experiment cost only $5.

I will give a brief tutorial on how to set up StarCluster (http://star.mit.edu/cluster/) with IPython parallel. I will also show you how to reduce the cost of your experiment even further by bidding on cheap "spot instances". Roboticists, biologists, machine learnists, and all other "ists" dealing with abundant data are welcome.