WAFFLEBENCH · PRIVATE EVALUATION

Submit your model.

Put an unreleased model or checkpoint in front of the calibrated expert panel, blind, against the same locked anchors as the public leaderboard. You get the quality scores, six-axis diagnostics with written reasoning, the minted RLHF/DPO preference pairs, and the reliability report. Results stay private unless you opt into publication.

1
Tell us about the model

Fill in the form below. Nothing sensitive: name, modality, and which verticals you want scored.

2
We scope the run

Within two business days you get the run plan, timeline (typically 10 business days), and pricing per the rate card.

3
Panel scores blind

Your model is masked. Three calibrated scorers per pair. Deliverables arrive as files, with the reliability evidence attached.

Submitting opens a scoping conversation; it does not commit you to a run. We never publish private results without written opt-in. Continuous per-checkpoint monitoring is available; ask in the notes.
Received. We will reply from enterprise@wafflevideo.ai within two business days with your run plan.