In this video, we trace the evolution of the classic neural attention mechanism, from early forms such as Bahdanau Attention to the Self Attention and Causal Masked Attention introduced in the seminal "Attention Is All You Need" paper. We then cover more advanced variants of Multi Headed Attention, such as Multi Query Attention and Grouped Query Attention. Along the way, we also discuss important innovations in Transformer and Large Language Model (LLM) architectures, such as KV Caching. The video uses visualizations and graphics to further explain these concepts.
Correction to the slide at 22:03 - MHA has high latency (runs slower); MQA has low latency (runs faster).
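A rough back-of-the-envelope illustration of why (the numbers here are hypothetical, not taken from the video): with 32 attention heads of dimension 128, MHA stores keys and values for every head, so the KV cache grows by 2 x 32 x 128 = 8,192 values per token per layer. MQA shares a single key/value head across all 32 query heads, so it stores only 2 x 128 = 256 values per token per layer, roughly a 32x smaller cache. Since autoregressive decoding is largely memory-bandwidth bound, the smaller cache is what makes MQA generate faster.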
All the slides, animations, and write-ups in this video will soon be shared on our Patreon. Go have fun! :)
Join the channel on Patreon to receive updates and get access to bonus content used in all my videos. Here is the link:
https://www.patreon.com/NeuralBreakdownwithAVB
Videos you might like:
Attention to Transformers playlist: https://www.youtube.com/playlist?list=PLGXWtN1HUjPfq0MSqD5dX8V7Gx5ow4QYW
50 concepts to know in NLP:
https://youtu.be/uocYQH0cWTs
Guide to fine-tuning open source LLMs:
https://youtu.be/bZcKYiwtw1I
Generative Language Modeling from scratch:
https://youtu.be/s3OUzmUDdg8
#deeplearning #machinelearning
Timestamps:
0:00 - Intro
1:15 - Language Modeling and Next Word Prediction
5:22 - Self Attention
10:40 - Causal Masked Attention
14:45 - Multi Headed Attention
16:03 - KV Cache
19:49 - Multi Query Attention
21:43 - Grouped Query Attention