GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Large language models (LLMs) demand substantial GPU memory, making training impractical on a single consumer GPU: a 7-billion-parameter model requires about 58GB of memory to train. The GaLore paper addresses this by projecting gradients into a low-rank subspace, shrinking the optimizer state enough for the model to fit on a single GPU. Remarkably, this approach not only solves the memory problem but also outperforms parameter-efficient tuning methods such as LoRA.

Paper link: https://arxiv.org/abs/2403.03507

Table of Contents:
00:00 Intro
02:17 LoRA
03:18 Limitations of LoRA
05:58 GaLore
18:18 Adam with GaLore
21:01 8-Bit Optimizers
22:50 LOMO
24:48 GaLore vs LoRA
26:20 Rank vs Perplexity
27:07 Results

Icon made by Freepik from flaticon.com
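
As a rough illustration of the gradient low-rank projection idea discussed in the video, here is a minimal PyTorch sketch, not the official GaLore implementation: it periodically recomputes a projection matrix P from the SVD of the gradient, keeps Adam statistics in the r-dimensional projected space, and projects the update back to full size before applying it to the weight. The function and parameter names (get_projection, galore_adam_step, rank, update_proj_gap) are illustrative assumptions, not names from the paper's code.

```python
import torch


def get_projection(grad: torch.Tensor, rank: int) -> torch.Tensor:
    """Top-`rank` left singular vectors of the gradient, used as projector P (m x r)."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]


def galore_adam_step(W, grad, state, rank=8, update_proj_gap=200,
                     lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One hand-rolled Adam step taken in the low-rank gradient subspace (sketch only)."""
    step = state.get("step", 0)

    # Refresh the projection matrix from the current gradient every `update_proj_gap` steps.
    # (The real implementation handles the subspace switch more carefully; this is a sketch.)
    if "P" not in state or step % update_proj_gap == 0:
        state["P"] = get_projection(grad, rank)
    P = state["P"]                              # (m, r)

    R = P.T @ grad                              # projected gradient, shape (r, n)

    # Adam moments are stored at the projected shape, which is where the memory saving comes from.
    m = state.get("m", torch.zeros_like(R))
    v = state.get("v", torch.zeros_like(R))
    m = betas[0] * m + (1 - betas[0]) * R
    v = betas[1] * v + (1 - betas[1]) * R * R
    step += 1
    m_hat = m / (1 - betas[0] ** step)
    v_hat = v / (1 - betas[1] ** step)
    N = m_hat / (v_hat.sqrt() + eps)            # bias-corrected, normalized low-rank update

    with torch.no_grad():
        W -= lr * (P @ N)                       # project back to (m, n) and update the weight

    state.update(step=step, m=m, v=v)
    return state


# Illustrative usage on a single weight matrix:
if __name__ == "__main__":
    W = torch.randn(512, 512, requires_grad=True)
    loss = (W @ torch.randn(512, 4)).pow(2).mean()
    loss.backward()
    state = galore_adam_step(W, W.grad, state={}, rank=8)
```

Because the Adam moments live at the projected shape (r x n) rather than the full (m x n), optimizer-state memory scales with the chosen rank, which is the source of the savings over standard full-rank Adam.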