🕸 Edge#185: Centralized vs. Decentralized Distributed Training Architectures
In this issue:
we overview Centralized vs. Decentralized Distributed Training Architectures;
we explain GPipe, an Architecture for Training Large Scale Neural Networks;
we explore TorchElastic, a Distributed Training Framework for PyTorch.
Enjoy the learning!
💡 ML Concept of the Day: Centralized vs. Decentralized Distributed Training Architectures
In Edge#183, we discussed data and model parallelism as a fundamental taxonomy for classifying distributed training techniques. Both data and model parallelism partition work across different nodes, which then need to coordinate their updates with each other. That coordination is the basis of another criterion for classifying distributed training architectures: centralized vs. decentralized training.
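To make the idea of nodes coordinating updates concrete, here is a minimal sketch of a single data-parallel training step in plain Python. It is an illustration under stated assumptions, not the method of any particular framework: the one-parameter model `y_hat = w * x`, the toy dataset, and the helper names `gradient`, `split_into_shards`, and `data_parallel_step` are all hypothetical. The nodes are simulated sequentially in one process, and the shards are assumed to be of equal size.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair; toy data format assumed for this sketch


def gradient(w: float, batch: List[Sample]) -> float:
    """Gradient of the mean squared error for the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def split_into_shards(data: List[Sample], num_nodes: int) -> List[List[Sample]]:
    """Partition the data into equal shards, one per simulated node.

    Assumes len(data) is divisible by num_nodes.
    """
    shard_size = len(data) // num_nodes
    return [data[i * shard_size:(i + 1) * shard_size] for i in range(num_nodes)]


def data_parallel_step(w: float, data: List[Sample], num_nodes: int, lr: float) -> float:
    """One training step: local gradients per node, then a coordinated update."""
    shards = split_into_shards(data, num_nodes)
    # Each simulated node computes a gradient on its own shard only.
    local_grads = [gradient(w, shard) for shard in shards]
    # Coordination: the local gradients are averaged before the shared
    # parameter is updated, so every node ends up with the same value of w.
    avg_grad = sum(local_grads) / len(local_grads)
    return w - lr * avg_grad


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_parallel = data_parallel_step(0.0, data, num_nodes=2, lr=0.01)
w_single = 0.0 - 0.01 * gradient(0.0, data)  # same step on a single node
print(w_parallel, w_single)
```

With equal-sized shards, averaging the local gradients gives the same update as computing the gradient over the full batch on one node. How and where that averaging happens is the design question the rest of this section is concerned with.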
Centralization in distributed training refers to how the parameters of a deep neural network are stored and updated. In a centralized training architecture,