What is Attention in Language Models?
This video is part of LLM University
https://docs.cohere.com/docs/the-attention-mechanism
A huge roadblock for language models is ambiguity: the same word can carry different meanings in different contexts. When a model encounters such a word, it needs to use the context of the sentence to decipher which meaning applies. This is precisely what self-attention does.
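The idea can be sketched in a few lines of code. This is a toy illustration only, not the method as taught in the video: it uses made-up 3-dimensional embeddings and scaled dot-product self-attention without learned projections, just to show that the same word vector comes out differently depending on its context.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention (no learned projections,
    for simplicity): each row of X attends to every row of X."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # token-to-token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ X                              # context-mixed embeddings

# Hypothetical embeddings for an ambiguous word and two contexts.
bank  = np.array([1.0, 0.0, 1.0])
river = np.array([1.0, 0.0, 0.0])   # pulls "bank" toward the river sense
money = np.array([0.0, 1.0, 1.0])   # pulls "bank" toward the finance sense

out_river = self_attention(np.stack([river, bank]))
out_money = self_attention(np.stack([money, bank]))

# The same input vector for "bank" yields different outputs in each context.
print(out_river[1])
print(out_money[1])
```

The input vector for "bank" is identical in both sentences; only its neighbors differ, and attention mixes in the neighbors to resolve the ambiguity.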
Bio:
Luis Serrano is the lead of developer relations at Co:here. Previously, he was a research scientist and an educator in machine learning and quantum computing. Luis did his PhD in mathematics at the University of Michigan before moving to Silicon Valley to work at companies including Google and Apple. Luis is the author of the Amazon best-seller "Grokking Machine Learning", in which he explains machine learning in a clear and concise way, and he is the creator of the educational YouTube channel "Serrano.Academy", with over 100K subscribers and 5M views.
===
Resources:
Blog post: https://txt.cohere.ai/what-is-attention-in-language-models/
Learn more: https://www.youtube.com/c/LuisSerrano