Custom Safety Evaluations for AI Systems and Agents. AI & Big Data Expo Global 2025
Alexander Borodetskiy shares Toloka's experience creating human-generated data for safety training and evaluation.
This talk looks at the typical challenges of developing evaluation data and covers four topics:
1. Why custom evaluations are necessary and why open datasets aren't enough.
2. Three real-world safety eval projects and how we handle technical challenges.
3. Effective red teaming, from testing multimodal systems to evaluating AI agent behaviors.
4. Practical approaches to safety testing.