So you think you know Text to Video Diffusion models?

Video diffusion is the next frontier for generative AI. In this video we discuss the problem, the challenges, the solutions, and the seminal papers in the field: Google's Imagen Video, Meta's Make-A-Video, Nvidia's Video LDM (Latent Diffusion Model), and OpenAI's Sora. Along the way, we cover the core concepts of image diffusion models, such as forward and reverse diffusion, UNet, convolution, and diffusion transformers. This video is meant to be a quick overview of all the major concepts in the field - hope you guys and gals find it a useful starting point for deeper dives.

Buy me a coffee at https://ko-fi.com/neuralavb !
Support us on Patreon to access slides and video material!
patreon.com/NeuralBreakdownwithAVB

Related videos:
What are Conditional Image Diffusion Models? 
https://youtu.be/w8YQcEd77_o

What is Latent Space?
https://youtu.be/FslFZx08beM

How do LLMs generate images? (The answer is not diffusion)
https://youtu.be/EzDsrEvdgNQ

Transformers and Attention Playlist
https://www.youtube.com/playlist?list=PLGXWtN1HUjPfK_n9j5tPZ_a6Rx3yceZ_B

Visit our Patreon for full access to code and other documents/animations:
https://www.patreon.com/NeuralBreakdownwithAVB

#generativeai #deeplearning #ai 

Useful papers:
Video Diffusion Models: https://arxiv.org/abs/2204.03458
Imagen Video: https://imagen.research.google/video/
Make-A-Video: https://makeavideo.studio/
Video LDM: https://research.nvidia.com/labs/toronto-ai/VideoLDM/index.html
CogVideoX: https://arxiv.org/abs/2408.06072
OpenAI Sora article: https://openai.com/index/sora/
Useful article: https://lilianweng.github.io/posts/2024-04-12-diffusion-video/
Survey Papers: https://arxiv.org/abs/2310.10647 and https://arxiv.org/abs/2405.03150


Timestamps:
0:00 - Intro
0:39 - Text to Image Conditional Diffusion Models
2:16 - Challenges with Video Diffusion Models
3:43 - VDM (2022)
4:50 - Factorized 3D UNet models
5:46 - Meta Make-A-Video
7:28 - Google Imagen Video
8:07 - Nvidia Video LDM
9:36 - OpenAI Sora
