Algorithms to support AI safety cases | Absolutely Interdisciplinary 2024

With the rapid advancement of large language model capabilities, it becomes increasingly essential to demonstrate that a given AI system doesn't pose a catastrophic risk. In this session, Roger Grosse will outline how AI risks can be categorized into "AI Safety Levels" and the logic by which one might build a safety case at a given level. This motivates the need for algorithmic advances that help build safety cases (or determine that a model is unsafe). Grosse will give an overview of some recent work on training data attribution (TDA), which aims to determine which training examples are responsible for a model's outputs. Better TDA methods should help build safety cases by measuring the consequences of each stage of training for the model's behavioral proclivities.

Speakers: Ashton Anderson (moderator), Roger Grosse

About Absolutely Interdisciplinary 2024: An annual academic conference hosted by the Schwartz Reisman Institute for Technology and Society, Absolutely Interdisciplinary convenes leading thinkers from a rich variety of fields to engage in conversations that encourage innovation and inspire new insights. Connecting technical researchers, social scientists, and humanists, Absolutely Interdisciplinary fosters new ways of thinking about the challenges presented by artificial intelligence and other powerful data-driven technologies to build a future that promotes human well-being—for everyone. Conference participants will contribute to and learn about emerging research areas and new questions to explore. Each session pairs researchers from different disciplines to address a common question and facilitate a group discussion. By identifying people working on similar questions from different perspectives, we will foster conversations that develop the interdisciplinary approaches and research questions needed to understand how AI can be made to align with human values.
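To make the idea of training data attribution concrete, here is a minimal sketch of a common TDA baseline: scoring each training example by the dot product between its loss gradient and the loss gradient on a query example. This is an illustrative toy (a linear model with synthetic data), not the specific method discussed in the talk, and all variable names here are hypothetical.

```python
import numpy as np

# Toy gradient-similarity TDA: training examples whose gradients align with the
# query's gradient are scored as more "responsible" for the model's behavior there.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 3))                   # 6 training examples, 3 features
w_true = np.array([1.0, -2.0, 0.5])
y_train = X_train @ w_true + 0.1 * rng.normal(size=6)

w = w_true + np.array([0.3, -0.1, 0.2])             # a partially trained model

def grad(w, x, y):
    # Gradient of the squared-error loss 0.5 * (w.x - y)^2 with respect to w.
    return (w @ x - y) * x

x_q, y_q = X_train[2], y_train[2]                   # query: a known training point
g_q = grad(w, x_q, y_q)

# Attribution score for each training example: gradient dot product with the query.
scores = np.array([grad(w, x, y) @ g_q for x, y in zip(X_train, y_train)])
ranking = np.argsort(scores)[::-1]                  # most influential first
```

In practice, TDA methods for large models replace this exact per-example gradient computation with scalable approximations, but the underlying question is the same: which training examples most influenced a given output.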
About the Schwartz Reisman Institute: Located at the University of Toronto, the Schwartz Reisman Institute for Technology and Society’s mission is to deepen our knowledge of technologies, societies, and what it means to be human by integrating research across traditional boundaries and building human-centred solutions that really make a difference. The integrative research we conduct rethinks technology’s role in society, the contemporary needs of human communities, and the systems that govern them. We’re investigating how best to align technology with human values and deploy it accordingly. The human-centred solutions we build are actionable and practical, highlighting the potential of emerging technologies to serve the public good while protecting citizens and societies from their misuse. We want to make sure powerful technologies truly make the world a better place—for everyone.