sandbox / judging_dataclasses.py

Commit History

Parse judgments with structured output prompting, one response model, one judge model at a time.
eb4ec23

justinxzhao committed on

Added per-response plots.
3e0f8f8

justinxzhao committed on

Some refactoring, judging responses for direct assessment.
577870e

justinxzhao committed on