Video diffusion is the next frontier for generative AI. In this video we discuss the problem, the challenges, the solutions, and the seminal papers in the field, like Google's Imagen Video, Meta's Make-A-Video, Nvidia's Video Latent Diffusion Model (Video LDM), and OpenAI's SORA. Along the way, we cover the core concepts of image diffusion models, like forward and reverse diffusion, UNet, convolution, and diffusion transformers. This video is meant to be a quick overview of all the major concepts in the field. Hope you guys and gals find it useful for deeper dives.
Buy me a coffee at https://ko-fi.com/neuralavb !
Support us on Patreon to access slides and video material!
patreon.com/NeuralBreakdownwithAVB
Related videos:
What are Conditional Image Diffusion Models?
https://youtu.be/w8YQcEd77_o
What is Latent Space?
https://youtu.be/FslFZx08beM
How do LLMs generate images? (The answer is not diffusion)
https://youtu.be/EzDsrEvdgNQ
Transformers and Attention Playlist
https://www.youtube.com/playlist?list=PLGXWtN1HUjPfK_n9j5tPZ_a6Rx3yceZ_B
Visit our Patreon for full access to code and other documents/animations:
https://www.patreon.com/NeuralBreakdownwithAVB
#generativeai #deeplearning #ai
Useful papers:
Video Diffusion Models: https://arxiv.org/abs/2204.03458
Imagen Video: https://imagen.research.google/video/
Make A Video: https://makeavideo.studio/
Video LDM: https://research.nvidia.com/labs/toronto-ai/VideoLDM/index.html
CogVideoX: https://arxiv.org/abs/2408.06072
OpenAI SORA article: https://openai.com/index/sora/
Useful article: https://lilianweng.github.io/posts/2024-04-12-diffusion-video/
Survey Papers: https://arxiv.org/abs/2310.10647 and https://arxiv.org/abs/2405.03150
Timestamps:
0:00 - Intro
0:39 - Text to Image Conditional Diffusion Models
2:16 - Challenges with Video Diffusion Models
3:43 - VDM (2022)
4:50 - Factorized 3D UNet models
5:46 - Meta Make A Video
7:28 - Google Imagen Video
8:07 - Nvidia Video LDM
9:36 - OpenAI SORA