RoPE (Rotary positional embeddings) explained: The positional workhorse of modern LLMs

Unlike sinusoidal embeddings, RoPE is well behaved and more resilient to predictions exceeding the training sequence length. Modern LLMs have already steered away from sinusoidal embeddings toward better alternatives like RoPE. Stay with me in the video and learn what's wrong with sinusoidal embeddings, the intuition behind RoPE, and how RoPE works.

Original Transformer paper: https://arxiv.org/pdf/1706.03762.pdf
RoPE paper: https://arxiv.org/pdf/2104.09864.pdf
Using interpolation for RoPE: https://arxiv.org/pdf/2306.15595.pdf

0:00 - Introduction
1:06 - Attention computation
1:51 - Token and positional similarity
2:52 - Vector view of query and key
4:52 - Sinusoidal embeddings
5:53 - Problem with sinusoidal embeddings
6:34 - Conversational view
8:50 - RoPE embeddings
10:20 - RoPE beyond 2D
12:36 - Changes to the equations
13:00 - Conclusion
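
To make the core idea concrete, here is a minimal sketch of RoPE applied to a single query/key vector. It assumes an even head dimension and the base of 10000 from the RoPE paper; the function name and shapes are illustrative, not the video's actual code. Each consecutive pair of features is rotated by an angle proportional to the token's position, so the query-key dot product ends up depending only on the relative offset between positions.

```python
# Minimal RoPE sketch (assumptions: even head dimension d, base 10000 as in the RoPE paper).
import numpy as np

def rope(x: np.ndarray, position: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive feature pairs of x by position-dependent angles."""
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE pairs up dimensions, so d must be even"
    # One frequency per 2D pair, decreasing geometrically (same schedule as sinusoidal embeddings).
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = position * freqs                      # shape (d/2,)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                      # split features into 2D pairs
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin                # standard 2D rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

# The attention score between rotated vectors depends only on the relative offset:
q, k = np.random.randn(64), np.random.randn(64)
s1 = rope(q, 5) @ rope(k, 2)        # positions 5 and 2   (offset 3)
s2 = rope(q, 105) @ rope(k, 102)    # positions 105 and 102 (offset 3)
print(np.allclose(s1, s2))          # True: same offset gives the same score
```

The final check is the property the video highlights: because rotations compose, rotating the query by position m and the key by position n leaves a dot product that depends only on m - n, which is what makes RoPE degrade more gracefully beyond the training sequence length.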