Test AI agents
- Question bank with answer rubric. Ask the same question 3 times to the model.
- Have separate QA models to evaluate the answers.
- Check for:
  - Average accuracy w.r.t. the rubric (QA model)
  - Similarity across the 3 answers (QA model)
  - Adherence to the format (QA model)
  - Consistency of facts / claims (QA model)

Other things to do:
- Set temperature to low
- Use multiple QA models so that the bias of any single QA model is reduced
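
The loop described in these notes could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the `Agent` and `Judge` interfaces, the metric names, and the toy agent and judges are all hypothetical stand-ins, not a real model API. A real setup would replace them with calls to the agent under test and to the QA models.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical interfaces (assumptions, not a real API):
# an agent maps a question to an answer; a judge (QA model) maps
# (question, rubric, answers) to one score in [0, 1] per metric.
Agent = Callable[[str], str]
Judge = Callable[[str, str, List[str]], Dict[str, float]]

METRICS = ("accuracy", "similarity", "format", "consistency")


def evaluate_question(question: str, rubric: str, agent: Agent,
                      judges: List[Judge], runs: int = 3) -> Dict[str, float]:
    """Ask the same question `runs` times, then average each metric over all judges."""
    answers = [agent(question) for _ in range(runs)]
    per_judge = [judge(question, rubric, answers) for judge in judges]
    return {m: mean(scores[m] for scores in per_judge) for m in METRICS}


# Toy stand-ins so the sketch runs without any model calls.
def toy_agent(question: str) -> str:
    return "Answer: Paris"


def lenient_judge(question: str, rubric: str, answers: List[str]) -> Dict[str, float]:
    return {m: 1.0 for m in METRICS}


def strict_judge(question: str, rubric: str, answers: List[str]) -> Dict[str, float]:
    identical = 1.0 if len(set(answers)) == 1 else 0.0
    well_formed = 1.0 if all(a.startswith("Answer:") for a in answers) else 0.0
    return {"accuracy": 0.5, "similarity": identical,
            "format": well_formed, "consistency": 0.5}


report = evaluate_question("What is the capital of France?",
                           "Must name Paris.", toy_agent,
                           [lenient_judge, strict_judge])
print(report)
```

Averaging over several judges is what keeps one QA model's bias from dominating the score. The per-judge scores could also be kept alongside the average, so that disagreement between judges stays visible.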