GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Large language models (LLMs) demand substantial GPU memory, making training impractical on a single consumer GPU: a 7-billion-parameter model requires about 58GB of memory to train. The GaLore paper addresses this by projecting gradients into a low-rank subspace, shrinking the optimizer state enough for the model to fit on a single GPU. Remarkably, this approach not only solves the memory problem but also outperforms parameter-efficient tuning methods such as LoRA.

Paper link: https://arxiv.org/abs/2403.03507

Table of Contents:
00:00 Intro
02:17 LoRA
03:18 Limitations of LoRA
05:58 GaLore
18:18 Adam with GaLore
21:01 8-Bit Optimizers
22:50 LOMO
24:48 GaLore vs LoRA
26:20 Rank vs Perplexity
27:07 Results

Icon made by Freepik from flaticon.com
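
As a rough illustration of the gradient low-rank projection idea discussed in the video, here is a minimal PyTorch sketch, not the official GaLore implementation: it periodically recomputes a projection matrix P from the SVD of the gradient, keeps Adam statistics in the r-dimensional projected space, and projects the update back to full size before applying it to the weight. The function and parameter names (get_projection, galore_adam_step, rank, update_proj_gap) are illustrative assumptions, not names from the paper's code.

```python
import torch


def get_projection(grad: torch.Tensor, rank: int) -> torch.Tensor:
    """Top-`rank` left singular vectors of the gradient, used as projector P (m x r)."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]


def galore_adam_step(W, grad, state, rank=8, update_proj_gap=200,
                     lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One hand-rolled Adam step taken in the low-rank gradient subspace (sketch only)."""
    step = state.get("step", 0)

    # Refresh the projection matrix from the current gradient every `update_proj_gap` steps.
    # (The real implementation handles the subspace switch more carefully; this is a sketch.)
    if "P" not in state or step % update_proj_gap == 0:
        state["P"] = get_projection(grad, rank)
    P = state["P"]                              # (m, r)

    R = P.T @ grad                              # projected gradient, shape (r, n)

    # Adam moments are stored at the projected shape, which is where the memory saving comes from.
    m = state.get("m", torch.zeros_like(R))
    v = state.get("v", torch.zeros_like(R))
    m = betas[0] * m + (1 - betas[0]) * R
    v = betas[1] * v + (1 - betas[1]) * R * R
    step += 1
    m_hat = m / (1 - betas[0] ** step)
    v_hat = v / (1 - betas[1] ** step)
    N = m_hat / (v_hat.sqrt() + eps)            # bias-corrected, normalized low-rank update

    with torch.no_grad():
        W -= lr * (P @ N)                       # project back to (m, n) and update the weight

    state.update(step=step, m=m, v=v)
    return state


# Illustrative usage on a single weight matrix:
if __name__ == "__main__":
    W = torch.randn(512, 512, requires_grad=True)
    loss = (W @ torch.randn(512, 4)).pow(2).mean()
    loss.backward()
    state = galore_adam_step(W, W.grad, state={}, rank=8)
```

Because the Adam moments live at the projected shape (r x n) rather than the full (m x n), optimizer-state memory scales with the chosen rank, which is the source of the savings over standard full-rank Adam.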