01. Distributed training parallelism methods. Data and Model parallelism
The content is also available as text: https://github.com/adensur/blog/blob/main/torch_distributed/01_parallelism_methods/Readme.md

This video is an introduction to distributed training with data and model parallelism. I attempt to derive the parallelism methods from first principles using basic math: I introduce a toy model for an NLP task, walk through the matrix multiplications in its layers, and explain how parallelisation works for each of them.
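As a rough sketch of the two ideas (not code from the video; all shapes and names here are illustrative), a single linear layer y = x @ W can be parallelised either by splitting the input batch across workers (data parallelism) or by splitting the weight matrix across workers (model parallelism). Both partitionings reproduce the single-worker result:

```python
import numpy as np

# Toy "layer": a single matrix multiplication y = x @ W.
# x: (batch, d_in), W: (d_in, d_out). All shapes are illustrative.
rng = np.random.default_rng(0)
batch, d_in, d_out = 8, 4, 3
x = rng.standard_normal((batch, d_in))
W = rng.standard_normal((d_in, d_out))

# Single-worker result.
y_full = x @ W

# Data parallelism: split the batch across two "workers"; each worker
# holds a full copy of W and processes only its own shard of the input.
x_shards = np.split(x, 2, axis=0)
y_data_parallel = np.concatenate([shard @ W for shard in x_shards], axis=0)

# Model parallelism: split W by output columns across "workers"; each
# worker sees the full input but computes only part of the output.
W_parts = np.split(W, 3, axis=1)
y_model_parallel = np.concatenate([x @ Wp for Wp in W_parts], axis=1)

# Both partitionings agree with the single-worker computation.
assert np.allclose(y_full, y_data_parallel)
assert np.allclose(y_full, y_model_parallel)
```

In a real setup the shards live on different devices and the concatenation is replaced by communication collectives, but the underlying matrix algebra is the same.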