Sarvesh Gharat - Representative Arm Identification: A fixed confidence approach to identify cluster
We study the representative arm identification (RAI) problem in the multi-armed bandits (MAB) framework, wherein we have a collection of arms, each associated with an unknown reward distribution. An underlying instance is defined by a partitioning of the arms into clusters of predefined sizes, such that for any j > i, all arms in cluster i have a larger mean reward than those in cluster j. The goal in RAI is to reliably identify a certain prespecified number of arms from each cluster while using as few arm pulls as possible. The RAI problem covers as special cases several well-studied MAB problems such as identifying the best arm or any M out of the top K, as well as both full and coarse ranking. We start by providing an instance-dependent lower bound on the sample complexity of any feasible algorithm for this setting. We then propose two algorithms, based on the idea of confidence intervals, and provide high probability upper bounds on their sample complexity, which orderwise match the lower bound. Finally, we empirically compare both algorithms with an LUCB-type alternative on both synthetic and real-world datasets, and demonstrate the superior performance of our proposed schemes in most cases.
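To make the problem statement concrete, here is a minimal Python sketch of the RAI setup. It is not either of the algorithms presented in the talk. It only shows how an instance is defined by cluster sizes, and how a set of confidence intervals can certify which cluster an arm belongs to. The function names (`true_clusters`, `certified_cluster`), the 0-indexed clusters, and the assumption of distinct means are illustrative choices, not taken from the paper.

```python
from itertools import accumulate


def true_clusters(means, sizes):
    """Ground truth: the sizes[0] highest-mean arms form cluster 0, the next
    sizes[1] arms form cluster 1, and so on (means assumed distinct)."""
    assert sum(sizes) == len(means)
    order = sorted(range(len(means)), key=lambda a: -means[a])
    bounds = [0, *accumulate(sizes)]
    cluster_of = {}
    for i in range(len(sizes)):
        for arm in order[bounds[i]:bounds[i + 1]]:
            cluster_of[arm] = i
    return cluster_of


def certified_cluster(arm, lcb, ucb, sizes):
    """Return the cluster of `arm` if the intervals [lcb[b], ucb[b]] pin it
    down, else None. Assumes every true mean lies inside its interval."""
    n = len(lcb)
    # Arms whose whole interval sits above (below) this arm's interval.
    above = sum(1 for b in range(n) if b != arm and lcb[b] > ucb[arm])
    below = sum(1 for b in range(n) if b != arm and ucb[b] < lcb[arm])
    # The arm's 0-indexed rank r satisfies above <= r <= n - 1 - below.
    bounds = [0, *accumulate(sizes)]
    for i in range(len(sizes)):
        if above >= bounds[i] and n - 1 - below < bounds[i + 1]:
            return i
    return None


# Hypothetical instance: 5 arms split into clusters of sizes 2, 2 and 1.
means = [0.9, 0.8, 0.5, 0.4, 0.1]
sizes = [2, 2, 1]
tight_lcb = [0.86, 0.76, 0.46, 0.36, 0.06]
tight_ucb = [0.94, 0.84, 0.54, 0.44, 0.14]
print(true_clusters(means, sizes))
print([certified_cluster(a, tight_lcb, tight_ucb, sizes) for a in range(5)])
```

Under this framing, best-arm identification corresponds to sizes (1, n-1) with one arm requested from the first cluster, any M of the top K corresponds to sizes (K, n-K) with M arms requested from the first cluster, and full ranking corresponds to n clusters of size 1. A sampling algorithm would keep pulling arms to shrink the intervals until enough arms per cluster can be certified.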
Sarvesh is a Ph.D. candidate at the Centre for Machine Intelligence and Data Science, IIT Bombay. His research focuses on Online Learning, Multi-Armed Bandits in PAC settings, and Generative AI, where he explores optimal LLM fine-tuning and prompting strategies. More recently, he has been delving into LLM Alignment and Multi-Agent Systems, studying interactions among artificial agents and alignment strategies for robust AI. His work bridges learning theory, reinforcement learning, and real-world AI applications.
This session is brought to you by the Cohere For AI Open Science Community - a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Rahul Narava and Gusti Winata, Leads of our Reinforcement Learning group, for their dedication in organizing this event.
If you’re interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.
Join the Cohere For AI Open Science Community to see a full list of upcoming events (https://tinyurl.com/C4AICommunityApp).