DeepSeek Internals - Floating-Point 8

This week we continue covering DeepSeek and dive into its technology. Previously we looked at Multi-head Latent Attention, a technique that can substantially increase the throughput of the model. This week we looked at DeepSeek's use of FP8, 8-bit floating-point numbers. We saw that using FP8 is one of the keys to the system's high performance, and that the novel ways the DeepSeek team uses FP8 provide advantages for them now and into the future.

*Links*
Our Meetup: https://www.meetup.com/East-Bay-Tri-Valley-Machine-Learning-Meetup/
DeepSeek: https://www.deepseek.com/

*Content*
00:00 Initial discussion
00:49 What is FP8
11:42 Why is FP8 used
16:28 How is FP8 used
23:02 FP8 issues
34:42 Trade-offs
35:27 Implementation
55:32 Summary

============================
😊 About Us
West Coast Machine Learning is a channel dedicated to exploring the exciting world of machine learning! Our group of techies is passionate about deep learning, neural networks, computer vision, tiny ML, and other cool geeky machine learning topics. We love to dive deep into the technical details and stay up to date with the latest research developments.

Our Meetup group and YouTube channel are the perfect place to connect with other like-minded people who share your love of machine learning. We offer a mix of research paper discussions, coding reviews, and other data science topics. So, if you're looking to stay up to date with the latest developments in machine learning, connect with other techies, and learn something new, be sure to subscribe to our channel and join our Meetup community today!

Meetup: https://www.meetup.com/east-bay-tri-valley-machine-learning-meetup/
=============================

#DeepSeek #8bit #DeepSeek-performance
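
For a concrete feel for what an 8-bit floating-point value looks like, here is a minimal Python sketch that decodes one common FP8 variant, E4M3 (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7). This is only an illustrative decoder under the usual "finite" E4M3 convention; it is an assumption for illustration, not DeepSeek's implementation.

```python
def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 value: 1 sign, 4 exponent, 3 mantissa bits, bias 7.

    Uses the common 'finite' convention (assumed here): no infinities, and
    the all-ones exponent with all-ones mantissa encodes NaN.
    """
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")
    if exp == 0:  # subnormal: no implicit leading 1, fixed exponent -6
        return sign * (man / 8) * 2 ** (-6)
    return sign * (1 + man / 8) * 2 ** (exp - 7)  # normal numbers


for bits in (0b0_0111_000, 0b0_1111_110, 0b0_0000_001):
    print(f"{bits:08b} -> {decode_e4m3(bits)}")

# Expected output:
#   00111000 -> 1.0          (exponent 7 - bias 7 = 0, mantissa 0)
#   01111110 -> 448.0        (largest finite E4M3 value)
#   00000001 -> 0.001953125  (smallest subnormal, 2**-9)
```

The narrow range visible here (roughly 0.002 to 448) is why FP8 training relies on the scaling and accumulation tricks discussed in the video.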