AI evaluation workflows are manual, inconsistent, and painful. In this workshop, Context.ai founder Henry Scott-Green demonstrates how to build and automate an effective workflow to monitor, measure, and improve the performance of LLM-powered products.
In this workshop, you will learn:
- How to effectively test your LLM prompts at scale (a minimal example is sketched after this list)
- How to gather analytics on user behavior and sentiment to improve product performance
- How to evaluate the accuracy of your LLM product
- How to utilise insights to mitigate hallucinations in your AI application
and more.
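
To illustrate the first point above, here is a rough, minimal sketch of testing a prompt against a batch of cases. This is not the workshop's own code (that lives in the Colab notebook linked below); it assumes the OpenAI Python SDK and a small hypothetical test set, and grades each model reply with a simple keyword check.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt under test, loosely based on the airline demo in the video.
SYSTEM_PROMPT = "You are a helpful airline support assistant."

# Hypothetical test cases: each pairs a user message with a keyword
# the reply is expected to contain.
TEST_CASES = [
    {"input": "Can I bring a carry-on bag on board?", "expect": "carry-on"},
    {"input": "How do I change my flight date?", "expect": "change"},
]

def run_case(case: dict) -> bool:
    """Call the model once and apply a simple keyword check to the reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-completion model works here
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": case["input"]},
        ],
    )
    reply = response.choices[0].message.content or ""
    return case["expect"].lower() in reply.lower()

results = [run_case(case) for case in TEST_CASES]
print(f"Passed {sum(results)}/{len(results)} test cases")

Scaling this idea up typically means a much larger test set, richer checks than keyword matching, and tracking pass rates across prompt versions, which is the kind of workflow the workshop automates.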
About Context.ai
Context.ai helps developers assess and improve the quality of their LLM-powered products, both before and after they reach production. Context's evaluation and analytics platform enables teams to ship LLM applications faster and with greater confidence, knowing quality issues will be caught and fixed before they reach end users. Learn more at https://context.ai
code: https://colab.research.google.com/drive/1c4U5EoSFTo3GFip2MKLz6SqztRPy1IX7?authuser=2
slides: https://docs.google.com/presentation/d/1OxxjM7O7NJdOsYKd72TMW5jtzN13IWoskPHBgRAbg5M/edit#slide=id.g265dcb64012_0_7
Chapters:
0:00 - Introducing Context.ai
2:40 - Airline demo walkthrough
9:32 - Testing and optimizing prompts
18:33 - Context.ai Playground
20:31 - Extracting and visualising analytics from user transcripts
#ai #productmanagement #analytics