Calibrated professional filmmakers score frontier models blind on six-axis craft rubrics, under a published Krippendorff α ≥ 0.70 reliability standard. Every judgment is attributable: which axis, which observation, which take.
The full methodology: same-model pairwise design, panel qualification, the reliability statistics, and the output schemas.
Private pre-release evaluation: quality scores, six-axis diagnostics, RLHF/DPO preference pairs, and the reliability report. Results are published only if you opt in.
Panel members: sign in with your access code to qualify and score. Three blind scorers per pair, paid per evaluation.
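For readers curious what the α ≥ 0.70 reliability standard above means in practice, here is a minimal sketch of Krippendorff's α for several scorers rating the same items. It is illustrative only and not the platform's actual pipeline: the function name, the sample scores, and the choice of the interval (squared-difference) metric are all assumptions, and a rubric scored on an ordinal scale would call for the ordinal metric instead.

```python
from itertools import permutations

RELIABILITY_THRESHOLD = 0.70  # the published standard quoted above


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    `units` is a list of lists: one inner list of numeric scores per rated
    item, one score per scorer. Hypothetical helper, for illustration only.
    """
    # Only items rated by at least two scorers contribute pairable values.
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable scores")

    # Observed disagreement: squared differences within each item,
    # weighted by 1 / (m_u - 1), averaged over all n pairable values.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: squared differences over all ordered pairs
    # of values, regardless of which item they came from.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("scores show no variation; alpha is undefined")

    return 1 - d_o / d_e


# Made-up scores on one rubric axis: three scorers per item.
sample_axis_scores = [[4, 4, 5], [2, 2, 2], [5, 4, 5], [1, 2, 1]]
alpha = krippendorff_alpha_interval(sample_axis_scores)
meets_standard = alpha >= RELIABILITY_THRESHOLD
```

An α of 1 means the scorers agree perfectly, and an α near 0 means agreement is no better than chance, so a threshold such as 0.70 sets a floor on how much of the score variation reflects the work rather than the scorer.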