🥢 Edge#187: The Different Types of Data Parallelism
In this issue:
we overview the different types of data parallelism;
we explain TF-Replicator, DeepMind’s framework for distributed ML training;
we explore FairScale, a PyTorch-based library for scaling the training of neural networks.
Enjoy the learning!
💡 ML Concept of the Day: The Different Types of Data Parallelism
Continuing our series on distributed, parallel training, today we explore the different types of data parallelism relevant to deep learning architectures. In Edge#183, we discussed the two fundamental paradigms of parallel training: model and data parallelism. Data parallelism refers to techniques that partition a dataset into subsets, distribute those subsets across different nodes, each holding a full replica of the model, and synchronize the resulting updates across nodes. How that communication between nodes takes place is what distinguishes the different data parallelism methods. There are two fundamental architectures used for data parallelism in deep learning models, and a minimal sketch of the underlying idea is shown below.
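To make the idea concrete, here is a minimal, illustrative sketch of synchronous data parallelism in PyTorch. It is not the API of TF-Replicator or FairScale; the model name `SimpleNet` and the synthetic dataset are hypothetical, and the script assumes it is launched with `torchrun` so that each process receives its own data shard, computes gradients locally, and averages them with the other replicas before every optimizer step.

```python
# Illustrative sketch of synchronous data parallelism (assumptions: launched with
# `torchrun --nproc_per_node=2 script.py`; SimpleNet and the data are hypothetical).
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

def main():
    dist.init_process_group(backend="gloo")   # one process per node/worker
    world_size = dist.get_world_size()

    # Synthetic dataset; DistributedSampler assigns each rank a disjoint subset.
    data = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    loader = DataLoader(data, batch_size=32, sampler=DistributedSampler(data))

    model = SimpleNet()                        # identical replica on every rank
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        # Synchronization step: average gradients across all replicas so that
        # every rank applies the same update and the replicas stay identical.
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The explicit all-reduce loop is shown only to expose the synchronization that data parallelism requires; in practice, wrappers such as PyTorch's DistributedDataParallel perform this gradient averaging automatically, and the different data parallelism architectures differ precisely in how this communication is organized.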