Dynamic Token Merging: 2× Faster Byte-Level LLMs [Julie Kallini] - 724

Today, we're joined by Julie Kallini, PhD student at Stanford University, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of tokenization in large language models, including inefficient compression rates for under-resourced languages, and dig into byte-level modeling as an alternative. We discuss the architecture of MrT5, its ability to learn language-specific compression rates, its performance on multilingual benchmarks and character-level manipulation tasks, and its inference efficiency. For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, the definition and creation of impossible languages, the creation of impossible language training datasets, and explore the bias of language model architectures towards natural language.

🎧 / 🎥 Listen or watch the full episode on our page: https://twimlai.com/go/724

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/

📖 CHAPTERS
===============================
00:00 - Introduction
04:28 - Issues of tokenization for LLMs
11:26 - Sub-word tokenization versus byte-level tokenization
16:28 - Inefficiencies of ByT5
17:08 - MrT5 architecture
22:05 - Language-specific compression rates
24:10 - Benchmarks
27:15 - Inference efficiency
28:50 - Applying MrT5 to other decoder models
31:15 - Future directions for MrT5
33:51 - Mission: Impossible Language Models paper
39:59 - Languages tested
45:13 - Language architectures biased toward natural languages vs. impossible languages
48:19 - Future directions for Mission: Impossible

🔗 LINKS & RESOURCES
===============================
Mission: Impossible Language Models - https://arxiv.org/abs/2401.06416
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models - https://openreview.net/forum?id=VYWBMq1L7H

📸 Camera: https://amzn.to/3TQ3zsg
🎙️ Microphone: https://amzn.to/3t5zXeV
🚦 Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5