Internal A/B evaluation surveys for model responses.

Build comparison surveys, collect calibrated win-rate data with Wilson 95% confidence intervals, and run AI follow-up probes. Everything is gated to your team’s work email.

7 question types
A/B compare, rating, single & multi choice, short & long text, email. CSV bulk import for response pairs.
Calibrated results
Per-dimension win-rate with Wilson 95% CIs, completion histogram, raw & structured CSV exports.
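For readers who want to check a reported interval by hand, here is a minimal sketch of the standard Wilson score interval for a win rate. The function name `wilson_interval` and its arguments are illustrative, not taken from the product, and how ties or skipped comparisons are counted is an assumption left to the caller.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, total: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.959964 gives ~95%).

    Hypothetical helper: `wins` is the number of comparisons response A won,
    `total` is the number of comparisons counted for that dimension.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= wins <= total:
        raise ValueError("wins must be between 0 and total")

    p = wins / total                      # observed win rate
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom

    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


low, high = wilson_interval(wins=60, total=100)
print(f"win rate 0.60, 95% CI [{low:.3f}, {high:.3f}]")
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for small samples or win rates near 0 or 1, which is likely why it suits per-dimension results with few responses.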
AI follow-ups
Optional gpt-4o-mini probe per question, served via a Supabase Edge Function with a per-survey daily call cap.
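The per-survey daily cap can be pictured with a small sketch of the counting logic. This is not the actual Edge Function (which would run as TypeScript on Supabase and presumably keep counters in a database); the class `DailyCallCap`, its in-memory store, and the limit value are all assumptions for illustration.

```python
from datetime import date
from typing import Dict, Tuple


class DailyCallCap:
    """Hypothetical in-memory limiter: at most `limit` AI calls per survey per day."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        # Counter keyed by (survey_id, calendar day); a real service would persist this.
        self._counts: Dict[Tuple[str, date], int] = {}

    def try_acquire(self, survey_id: str, today: date) -> bool:
        """Return True and record the call if the survey is under its cap for `today`."""
        key = (survey_id, today)
        used = self._counts.get(key, 0)
        if used >= self.limit:
            return False  # cap reached: skip the follow-up probe
        self._counts[key] = used + 1
        return True


cap = DailyCallCap(limit=2)  # assumed limit, for demonstration only
day = date(2024, 1, 1)
print([cap.try_acquire("survey-a", day) for _ in range(3)])
```

Keying the counter by survey and calendar day means the cap resets on its own when the date changes, and one busy survey cannot use up another survey's allowance.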
