So you think you know Text to Video Diffusion models?

4,203 plays
Video diffusion is the next frontier for generative AI. In this video we discuss the problem, the challenges, the solutions, and the seminal papers in the field: Google's Imagen Video, Meta's Make-A-Video, Nvidia's Video Latent Diffusion Model (Video LDM), and OpenAI's SORA. Along the way, we cover the core concepts of image diffusion models, including forward and reverse diffusion, the UNet, convolution, and diffusion transformers. This video is meant to be a quick overview of all the major concepts in the field. Hope you guys and gals find it useful as a starting point for deeper dives.

Buy me a coffee at https://ko-fi.com/neuralavb !
Support us on Patreon for full access to slides, code, and other documents/animations: https://www.patreon.com/NeuralBreakdownwithAVB

Related videos:
What are Conditional Image Diffusion Models? https://youtu.be/w8YQcEd77_o
What is Latent Space? https://youtu.be/FslFZx08beM
How do LLMs generate images? (The answer is not diffusion) https://youtu.be/EzDsrEvdgNQ
Transformers and Attention Playlist https://www.youtube.com/playlist?list=PLGXWtN1HUjPfK_n9j5tPZ_a6Rx3yceZ_B

#generativeai #deeplearning #ai

Useful papers:
Video Diffusion Models: https://arxiv.org/abs/2204.03458
Imagen Video: https://imagen.research.google/video/
Make A Video: https://makeavideo.studio/
Video LDM: https://research.nvidia.com/labs/toronto-ai/VideoLDM/index.html
CogVideoX: https://arxiv.org/abs/2408.06072
OpenAI SORA article: https://openai.com/index/sora/

Useful article: https://lilianweng.github.io/posts/2024-04-12-diffusion-video/
Survey Papers: https://arxiv.org/abs/2310.10647 and https://arxiv.org/abs/2405.03150

Timestamps:
0:00 - Intro
0:39 - Text to Image Conditional Diffusion Models
2:16 - Challenges with Video Diffusion Models
3:43 - VDM (2022)
4:50 - Factorized 3D UNet models
5:46 - Meta Make A Video
7:28 - Google Imagen Video
8:07 - Nvidia Video LDM
9:36 - OpenAI SORA
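
For readers who want something concrete before watching, here is a minimal sketch of the forward diffusion step named in the description, applied to a toy video. It is not taken from the video or its Patreon code. It assumes the standard DDPM closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with a linear noise schedule from 1e-4 to 0.02 over 1000 steps. The function names and the nested-list video layout (frames, rows, pixels) are hypothetical and chosen only for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed DDPM-style defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(video, t, alpha_bars, rng):
    """Sample a noised video at step t; video is frames -> rows -> pixels.

    Returns (noised_video, noise) so a model could be trained to predict noise.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noised, noise = [], []
    for frame in video:
        noised_frame, noise_frame = [], []
        for row in frame:
            eps_row = [rng.gauss(0.0, 1.0) for _ in row]
            noised_frame.append(
                [signal_scale * p + noise_scale * e for p, e in zip(row, eps_row)]
            )
            noise_frame.append(eps_row)
        noised.append(noised_frame)
        noise.append(noise_frame)
    return noised, noise


rng = random.Random(0)
# Toy "video": 4 frames of 2x2 pixels, every pixel set to 1.0.
video = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(4)]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
early, early_noise = forward_diffuse(video, 10, alpha_bars, rng)   # mostly signal
late, late_noise = forward_diffuse(video, 999, alpha_bars, rng)    # almost pure noise
```

Every frame is noised with the same step t, which is why the extra time axis makes video diffusion so much more expensive than image diffusion: the denoiser has to process all frames jointly to keep them temporally consistent.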