INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
This paper introduces **INTELLECT-2**, a 32-billion-parameter large language model trained with a novel approach: **globally decentralized reinforcement learning (RL)**. Instead of relying on a single centralized supercomputer, INTELLECT-2 was trained across a pool of machines around the world with heterogeneous hardware and unreliable network connections. To make this possible, the researchers built new infrastructure: **PRIME-RL** to orchestrate distributed, asynchronous RL training; **TOPLOC** to verify that computations performed by untrusted machines are correct; and **SHARDCAST** to efficiently broadcast updated policy weights to all participating nodes. They also modified the standard training recipe and data handling to keep learning stable, and the resulting model improves on QwQ-32B, the previous best model at this size. They are **open-sourcing the model, the training data, and all the code** so others can build on this decentralized training approach.
https://storage.googleapis.com/public-technical-paper/INTELLECT_2_Technical_Report.pdf
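The three roles the summary names (a trainer broadcasting weights, untrusted rollout workers, and a verification step) can be illustrated with a toy sketch. This is a minimal simulation under assumed simplifications, not the actual PRIME-RL, TOPLOC, or SHARDCAST implementations: "weights" are a list of floats, the TOPLOC-style proof is reduced to a hash of the rollout output, and the SHARDCAST-style broadcast is reduced to handing workers a versioned weight snapshot.

```python
import hashlib

def fingerprint(values):
    # TOPLOC-style check, reduced here to hashing the rollout output.
    # The real scheme verifies inference activations; a plain hash
    # stands in for that in this sketch.
    return hashlib.sha256(repr(values).encode()).hexdigest()

class Trainer:
    """Central trainer: holds the policy and applies RL updates."""
    def __init__(self):
        self.weights = [0.0]
        self.version = 0

    def broadcast(self):
        # SHARDCAST-style step, reduced to returning a versioned
        # snapshot that workers pull before generating rollouts.
        return self.version, list(self.weights)

    def update(self, rollouts):
        # Stand-in for an RL update: nudge weights by the mean reward
        # of the verified rollouts.
        mean_reward = sum(r["reward"] for r in rollouts) / len(rollouts)
        self.weights = [w + 0.1 * mean_reward for w in self.weights]
        self.version += 1

def rollout_worker(version, weights, prompt):
    # Untrusted worker: produces a toy "completion" from the current
    # weights and attaches a fingerprint so the pool can audit it.
    completion = [w + len(prompt) for w in weights]
    return {
        "version": version,
        "completion": completion,
        "reward": 1.0 if completion[0] > 0 else 0.0,
        "proof": fingerprint(completion),
    }

def verify(rollout):
    # Recompute the fingerprint; a tampered completion from a
    # dishonest worker would fail this check and be dropped.
    return rollout["proof"] == fingerprint(rollout["completion"])

trainer = Trainer()
for step in range(3):
    version, weights = trainer.broadcast()
    rollouts = [rollout_worker(version, weights, p) for p in ("a", "bb", "ccc")]
    accepted = [r for r in rollouts if verify(r)]
    trainer.update(accepted)

print(trainer.version)  # → 3
```

The key design point this mirrors from the paper is asynchrony with verification: workers generate rollouts against whatever weight version they last received, and only rollouts that pass the correctness check feed the next update.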