A June 2016 roundup of distributed Deep Learning projects on Apache Spark



Here’s a quick roundup of distributed deep learning efforts running on Apache Spark. It lists only active(-ish) projects rather than academic experiments (of which there are too many to list). There are roughly two approaches:

Linking Spark with an existing framework

Implementing a full-fledged framework

  • DeepDist (repo) is a framework for deep belief networks (DBNs) that implements downpour gradient descent. The approach is reminiscent of Splash.
  • DeepLearning4J is reimplementing a wide range of neural networks on top of a fast Java array library. It runs distributed on Spark, with GPU acceleration.
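For readers unfamiliar with downpour gradient descent, the toy sketch below illustrates the general idea (it is plain Python with threads, not DeepDist's or Spark's API). Several model replicas each train on their own data shard. Each replica works from a possibly stale copy of the parameters fetched from a central parameter server and pushes gradients back without waiting for the others. The `ParameterServer` class, the `n_fetch` refresh interval, and the linear-regression model are hypothetical choices made only for illustration.

```python
import random
import threading


class ParameterServer:
    """Hypothetical toy parameter store: replicas fetch a copy and push gradients asynchronously."""

    def __init__(self, n_params, lr):
        self.params = [0.0] * n_params
        self.lr = lr
        self._lock = threading.Lock()  # makes each fetch/push atomic; there is no global barrier

    def fetch(self):
        with self._lock:
            return list(self.params)

    def push(self, grads):
        with self._lock:
            for i, g in enumerate(grads):
                self.params[i] -= self.lr * g


def worker(server, shard, steps, n_fetch):
    """One model replica training on its own shard with a possibly stale parameter copy."""
    local = server.fetch()
    for step in range(steps):
        if step % n_fetch == 0:
            local = server.fetch()  # refresh the local copy only every n_fetch steps
        x, y = shard[step % len(shard)]
        # Linear model y_hat = w * x + b with squared loss 0.5 * (y_hat - y) ** 2.
        err = local[0] * x + local[1] - y
        server.push([err * x, err])  # sent without waiting for other replicas


def downpour_sgd(data, n_workers=4, steps=2000, n_fetch=5, lr=0.01):
    server = ParameterServer(n_params=2, lr=lr)
    shards = [data[i::n_workers] for i in range(n_workers)]  # one data shard per replica
    threads = [
        threading.Thread(target=worker, args=(server, shard, steps, n_fetch))
        for shard in shards
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.fetch()


# Usage: recover y = 3x + 2 from noise-free synthetic data.
rng = random.Random(0)
data = [(x, 3.0 * x + 2.0) for x in (rng.uniform(-1, 1) for _ in range(200))]
w, b = downpour_sgd(data)
print(round(w, 2), round(b, 2))
```

Because replicas push without synchronizing, some gradients are computed against outdated parameters. With a small learning rate this toy problem still converges. That is the trade-off Downpour-style training accepts in exchange for never waiting on the slowest replica.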

This is just a quick preview, and the criteria for notability are somewhat arbitrary: e.g., I chose not to include OpenDL, because it’s a seemingly unmaintained experiment based on Jeff Dean’s “Large Scale Distributed Deep Networks” paper. Feel free to mention anything I may have forgotten in the comments!