01. Distributed training parallelism methods. Data and Model parallelism
The content is also available as text: https://github.com/adensur/blog/blob/main/torch_distributed/01_parallelism_methods/Readme.md
This video is an introduction to distributed training with data and model parallelism. I attempt to derive the parallelism methods from first principles, using some basic math.
I introduce a toy model for an NLP task, walk through the matrix multiplication math in its layers, and explain how parallelisation works in that setting.
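The core idea behind both parallelism methods can be sketched with a single linear layer, `y = x @ W`. This is a minimal NumPy illustration (not the video's actual toy model): data parallelism splits the input batch across devices while each keeps a full copy of the weights, whereas model parallelism splits the weight matrix itself while each device sees the full batch. The shapes and the two-device split here are assumptions for illustration.

```python
import numpy as np

# Toy linear layer: y = x @ W, with a batch of 4 inputs,
# input size 8 and output size 3 (arbitrary illustrative shapes).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # batch of inputs
W = rng.normal(size=(8, 3))   # layer weights

# Single-device forward pass, for reference.
y_full = x @ W

# Data parallelism: split the batch across two "devices";
# each device holds a full copy of W and processes half the batch.
y_dp = np.concatenate([x[:2] @ W, x[2:] @ W], axis=0)

# Model parallelism: split W column-wise across two "devices";
# each device sees the full batch but computes part of the output.
y_mp = np.concatenate([x @ W[:, :2], x @ W[:, 2:]], axis=1)

print(np.allclose(y_full, y_dp))  # True
print(np.allclose(y_full, y_mp))  # True
```

Both splits reproduce the single-device result exactly; the real trade-offs (which the video covers) are in how much communication and weight duplication each approach requires.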