RoPE (Rotary positional embeddings) explained: The positional workhorse of modern LLMs

Unlike sinusoidal embeddings, RoPE is well behaved and more resilient to predictions exceeding the training sequence length. Modern LLMs have already steered away from sinusoidal embeddings toward better alternatives like RoPE. Stay with me in the video and learn what's wrong with sinusoidal embeddings, the intuition behind RoPE, and how RoPE works.

Original Transformer paper: https://arxiv.org/pdf/1706.03762.pdf
RoPE paper: https://arxiv.org/pdf/2104.09864.pdf
Using interpolation for RoPE: https://arxiv.org/pdf/2306.15595.pdf

0:00 - Introduction
1:06 - Attention computation
1:51 - Token and positional similarity
2:52 - Vector view of query and key
4:52 - Sinusoidal embeddings
5:53 - Problem with sinusoidal embeddings
6:34 - Conversational view
8:50 - RoPE embeddings
10:20 - RoPE beyond 2D
12:36 - Changes to the equations
13:00 - Conclusion
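
To make the core idea concrete, here is a minimal sketch of RoPE applied to a single query/key vector. It assumes an even head dimension and the base of 10000 from the RoPE paper; the function name and shapes are illustrative, not the video's actual code. Each consecutive pair of features is rotated by an angle proportional to the token's position, so the query-key dot product ends up depending only on the relative offset between positions.

```python
# Minimal RoPE sketch (assumptions: even head dimension d, base 10000 as in the RoPE paper).
import numpy as np

def rope(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive feature pairs of x by position-dependent angles."""
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE pairs up dimensions, so d must be even"
    # One frequency per 2D pair, decreasing geometrically (same schedule as sinusoidal embeddings).
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = position * freqs                      # shape (d/2,)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                      # split features into 2D pairs
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin                # standard 2D rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

# The attention score between rotated vectors depends only on the relative offset:
q, k = np.random.randn(64), np.random.randn(64)
s1 = rope(q, 5) @ rope(k, 2)        # positions 5 and 2   (offset 3)
s2 = rope(q, 105) @ rope(k, 102)    # positions 105 and 102 (offset 3)
print(np.allclose(s1, s2))          # True: same offset gives the same score
```

The final check is the property the video highlights: because rotations compose, rotating the query by position m and the key by position n leaves a dot product that depends only on m - n, which is what makes RoPE degrade more gracefully beyond the training sequence length.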