Fenna ten Haaf is a Data Scientist at Rabobank. She has a background in Econometrics and works with Rabobank's Responsible GenAI team to implement a clear framework for evaluating chatbots. Lately, her focus has been on comparing different metrics and analyzing their quality to build confidence in the tool. Fenna lives in Rotterdam, and outside of work she enjoys reading and learning languages.
Evaluating chatbots, and generative AI in general, is a major challenge. Ideally, a chatbot's answers would be compared against ‘good’ or correct reference answers to determine their quality, but these ground truths are time-consuming to write, and even once they exist the comparison itself remains difficult. At the same time, a proper evaluation framework is the […]