TEXAS: Fine-Tuning Is for Cowards - Do RL
Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL): The Hidden Solutions and Why They Matter for AI Reasoning. SFT + RL or RL only? AI research is valid for 1 week.
All rights with the authors:
"SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models"
Hardy Chen (2), Haoqin Tu (1), Fali Wang (3), Hui Liu (4), Xianfeng Tang (4), Xinya Du (2),
Yuyin Zhou (1), Cihang Xie (1)
from
1 University of California, Santa Cruz
2 University of Texas at Dallas
3 The Pennsylvania State University
4 Amazon Research