Today, we're joined by Julie Kallini, PhD student at Stanford University, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of tokenization in large language models—including inefficient compression rates for under-resourced languages—and dig into byte-level modeling as an alternative. We discuss the MrT5 architecture, its ability to learn language-specific compression rates, and its performance and inference efficiency on multilingual benchmarks and character-level manipulation tasks. For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, how impossible languages are defined and constructed, the creation of impossible language training datasets, and the bias of language model architectures toward natural language.
🎧 / 🎥 Listen or watch the full episode on our page: https://twimlai.com/go/724.
🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1
🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/
📖 CHAPTERS
===============================
00:00 - Introduction
04:28 - Issues of tokenization for LLMs
11:26 - Subword tokenization versus byte-level tokenization
16:28 - Inefficiencies of ByT5
17:08 - MrT5 architecture
22:05 - Language-specific compression rates
24:10 - Benchmarks
27:15 - Inference efficiency
28:50 - Applying MrT5 to other decoder models
31:15 - Future directions for MrT5
33:51 - Mission: Impossible Language Models paper
39:59 - Languages tested
45:13 - Architectural bias toward natural vs. impossible languages
48:19 - Future directions for Mission: Impossible
🔗 LINKS & RESOURCES
===============================
Mission: Impossible Language Models - https://arxiv.org/abs/2401.06416
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models - https://openreview.net/forum?id=VYWBMq1L7H
🛠️ GEAR
===============================
📸 Camera: https://amzn.to/3TQ3zsg
🎙️Microphone: https://amzn.to/3t5zXeV
🚦Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5