INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning
This paper introduces **INTELLECT-2**, a 32-billion-parameter large language model trained with a novel approach: **globally decentralized reinforcement learning (RL)**. Instead of relying on a single centralized supercomputer, INTELLECT-2 was trained across a pool of machines around the world with heterogeneous hardware and unreliable network connections. To make this possible, the researchers built new infrastructure: **PRIME-RL** to orchestrate distributed, asynchronous RL training; **TOPLOC** to verify that computations performed by untrusted machines are correct; and **SHARDCAST** to efficiently broadcast updated policy weights to all participating nodes. They also modified the standard training recipe and data handling to keep learning stable, and the resulting model improves on QwQ-32B, the previous best model at this size. They are **open-sourcing the model, the training data, and all the code** so others can build on this decentralized training approach.
https://storage.googleapis.com/public-technical-paper/INTELLECT_2_Technical_Report.pdf
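The three roles the summary names (a trainer broadcasting weights, untrusted rollout workers, and a verification step) can be illustrated with a toy sketch. This is a minimal simulation under assumed simplifications, not the actual PRIME-RL, TOPLOC, or SHARDCAST implementations: "weights" are a list of floats, the TOPLOC-style proof is reduced to a hash of the rollout output, and the SHARDCAST-style broadcast is reduced to handing workers a versioned weight snapshot.

```python
import hashlib

def fingerprint(values):
    # TOPLOC-style check, reduced here to hashing the rollout output.
    # The real scheme verifies inference activations; a plain hash
    # stands in for that in this sketch.
    return hashlib.sha256(repr(values).encode()).hexdigest()

class Trainer:
    """Central trainer: holds the policy and applies RL updates."""
    def __init__(self):
        self.weights = [0.0]
        self.version = 0

    def broadcast(self):
        # SHARDCAST-style step, reduced to returning a versioned
        # snapshot that workers pull before generating rollouts.
        return self.version, list(self.weights)

    def update(self, rollouts):
        # Stand-in for an RL update: nudge weights by the mean reward
        # of the verified rollouts.
        mean_reward = sum(r["reward"] for r in rollouts) / len(rollouts)
        self.weights = [w + 0.1 * mean_reward for w in self.weights]
        self.version += 1

def rollout_worker(version, weights, prompt):
    # Untrusted worker: produces a toy "completion" from the current
    # weights and attaches a fingerprint so the pool can audit it.
    completion = [w + len(prompt) for w in weights]
    return {
        "version": version,
        "completion": completion,
        "reward": 1.0 if completion[0] > 0 else 0.0,
        "proof": fingerprint(completion),
    }

def verify(rollout):
    # Recompute the fingerprint; a tampered completion from a
    # dishonest worker would fail this check and be dropped.
    return rollout["proof"] == fingerprint(rollout["completion"])

trainer = Trainer()
for step in range(3):
    version, weights = trainer.broadcast()
    rollouts = [rollout_worker(version, weights, p) for p in ("a", "bb", "ccc")]
    accepted = [r for r in rollouts if verify(r)]
    trainer.update(accepted)

print(trainer.version)  # → 3
```

The key design point this mirrors from the paper is asynchrony with verification: workers generate rollouts against whatever weight version they last received, and only rollouts that pass the correctness check feed the next update.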