LLaDA - Large Language Diffusion Models (paper explained)
Diffusion models are catching up fast on language tasks. In particular, I came across an interesting paper called "Large Language Diffusion Models", or LLaDA for short.
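LLMs have traditionally been trained and run autoregressively, generating one token at a time from left to right. Diffusion models turn this on its head: they start from a fully masked sequence and predict all the masked tokens in parallel, refining the whole sequence over several steps.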
So, given their computational speed, are diffusion models the future of LLMs?
⌚️ ⌚️ ⌚️ TIMESTAMPS ⌚️ ⌚️ ⌚️
0:00 - Intro
1:23 - Motivation
1:51 - Autoregressive VS Diffusion
4:17 - Pre-training
4:52 - Supervised Fine-tuning
5:24 - Inference
6:51 - Experiments and Results
AI BITES KEY LINKS
Website: https://www.ai-bites.net
YouTube: https://www.youtube.com/@AIBites
Twitter: https://twitter.com/ai_bites
Patreon: https://www.patreon.com/ai_bites
Github: https://github.com/ai-bites