Evaluating AI Agent Output with GitHub Copilot and AI Toolkit (Pet Planner Workshop, Part 6)

April from Microsoft Developer presents practical techniques for evaluating AI agent output using GitHub Copilot and AI Toolkit, offering step-by-step insights for developers in this Pet Planner workshop segment.


Presenter: April (Microsoft Developer)

This video is part six of a workshop series that combines AI Toolkit and GitHub Copilot, focusing on how developers can evaluate the output of AI agents within a real-world scenario: the Pet Planner application.

Workshop Overview

Agenda & Chapter Markers

Key Technical Steps Demonstrated

1. Preparing for Evaluation

2. Choosing Evaluators Using Copilot
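As an illustration of what "choosing evaluators" can mean in practice, the sketch below keeps a small catalog of common evaluation metrics and filters it by the qualities you care about. The catalog entries and the keyword-matching helper are assumptions for illustration; they are not the workshop's actual evaluator list, which would come from AI Toolkit with Copilot's guidance.

```python
# Hypothetical catalog of candidate evaluators (names and descriptions
# are common metric categories, not taken from the workshop).
EVALUATOR_CATALOG = {
    "relevance": "Does the response address the user's query?",
    "coherence": "Is the response logically structured and readable?",
    "groundedness": "Is the response supported by the provided context?",
    "fluency": "Is the response grammatically well-formed?",
}

def pick_evaluators(needs):
    """Select evaluators whose description mentions any needed keyword."""
    return sorted(
        name for name, desc in EVALUATOR_CATALOG.items()
        if any(kw in desc.lower() for kw in needs)
    )

# Shortlist evaluators relevant to query handling and context use.
chosen = pick_evaluators({"query", "context"})
```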

3. Dataset Creation
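A common way to store an evaluation dataset is JSONL, one JSON object per line. The sketch below writes and reads a tiny Pet Planner-style dataset; the file name, field names, and sample rows are illustrative assumptions, since the source does not show the workshop's actual dataset format.

```python
import json

# Illustrative sample rows; "query"/"response"/"expected_topic" are
# assumed field names, not the workshop's schema.
SAMPLES = [
    {
        "query": "Plan a feeding schedule for a puppy.",
        "response": "Feed a puppy three to four small meals per day.",
        "expected_topic": "feeding",
    },
    {
        "query": "How often should I walk an adult dog?",
        "response": "Most adult dogs do well with two walks per day.",
        "expected_topic": "exercise",
    },
]

def write_dataset(path, samples):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample) + "\n")

def read_dataset(path):
    """Read the JSONL file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

write_dataset("eval_dataset.jsonl", SAMPLES)
rows = read_dataset("eval_dataset.jsonl")
```

JSONL keeps each example independent, so datasets can be appended to or streamed without reparsing the whole file.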

4. Building the Evaluation Script
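The shape of an evaluation script is a loop that applies scoring functions to each dataset row. The sketch below uses crude heuristic scorers (keyword overlap, a word-count floor) purely as stand-ins for the model-based evaluators used in the workshop; every function name here is an assumption for illustration.

```python
def relevance_score(query, response):
    """Crude relevance proxy: fraction of query words echoed in the response."""
    q_words = {w.lower().strip("?.,") for w in query.split()}
    r_words = {w.lower().strip("?.,") for w in response.split()}
    if not q_words:
        return 0.0
    return len(q_words & r_words) / len(q_words)

def completeness_score(response, min_words=5):
    """Crude completeness proxy: 1.0 if the response meets a word-count floor."""
    return 1.0 if len(response.split()) >= min_words else 0.0

def evaluate_row(row):
    """Score one dataset row against every evaluator."""
    return {
        "query": row["query"],
        "relevance": relevance_score(row["query"], row["response"]),
        "completeness": completeness_score(row["response"]),
    }

dataset = [
    {"query": "Plan a feeding schedule for a puppy.",
     "response": "Feed a puppy three to four small meals per day."},
]
results = [evaluate_row(row) for row in dataset]
```

In a real script, the heuristic scorers would be swapped for AI Toolkit evaluators while the loop structure stays the same.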

5. Reviewing and Reporting Results
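Once per-row scores exist, reviewing them usually means aggregating into a summary report. The sketch below computes a mean score and a pass rate per metric; the metric names, the sample scores, and the 0.5 pass threshold are all assumptions for illustration, not values from the workshop.

```python
from statistics import mean

# Hypothetical per-row results from a prior evaluation run.
results = [
    {"relevance": 0.8, "completeness": 1.0},
    {"relevance": 0.4, "completeness": 1.0},
    {"relevance": 0.9, "completeness": 0.0},
]

def summarize(results, metrics=("relevance", "completeness"), threshold=0.5):
    """Compute mean score and pass rate (share of rows >= threshold) per metric."""
    report = {}
    for metric in metrics:
        scores = [r[metric] for r in results]
        report[metric] = {
            "mean": round(mean(scores), 3),
            "pass_rate": sum(s >= threshold for s in scores) / len(scores),
        }
    return report

report = summarize(results)
for metric, stats in report.items():
    print(f"{metric}: mean={stats['mean']}, pass_rate={stats['pass_rate']:.0%}")
```

Tracking pass rate alongside the mean surfaces cases where a few very bad responses hide behind an acceptable average.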

Technologies & Tools Highlighted

Summary

This session provides actionable, step-by-step instruction on how to evaluate the performance of AI agents, leveraging GitHub Copilot, AI Toolkit, and Azure-powered infrastructure. The workshop is designed for developers looking to build, test, and iterate on real-world AI applications.