
Model reliability · audited

The Calibration Grade

When the model says 70%, do those picks win about 70% of the time? One number for how true its stated confidence is, graded on every completed game, wins and losses alike.

87 out of 100

Calibrated & verified · 7 of 8 buckets calibrated within their 95% band

Record 13,928–6,882 (66.9%, 95% CI 66.3%–67.6%) · ECE 0.013 · 6,295 graded

The reliability line

Each dot is a confidence bucket: where the model's stated probability (x) met the rate those picks actually won (y). A perfectly calibrated model rides the diagonal; the 95% band shows where small samples are still noise.

6,295 picks · 8 buckets

Model calibration: predicted win probability vs actual win rate by bucket, with 95% Wilson confidence intervals, over 6,295 graded picks.
Predicted probability	Actual win rate	Wins of picks	95% CI	Verdict
50 to 55%	52.7%	1,264 of 2,397	51–55%	calibrated
55 to 60%	54.7%	1,120 of 2,046	53–57%	overconfident
60 to 65%	61.5%	678 of 1,103	59–64%	calibrated
65 to 70%	68.1%	307 of 451	64–72%	calibrated
70 to 75%	75.5%	142 of 188	69–81%	calibrated
75 to 80%	78.6%	55 of 70	68–87%	calibrated
80 to 85%	89.7%	26 of 29	74–96%	calibrated
85 to 90%	90.9%	10 of 11	62–98%	calibrated
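The verdict column can be reproduced with a minimal Python sketch of the 95% Wilson score interval. It assumes a hypothetical rule: a bucket is "overconfident" when its stated probability sits above the band and "underconfident" (a label not used on this page) when it sits below. Because the table does not list each bucket's mean stated probability, the sketch uses the bucket midpoint as a stand-in; the published CSV has the real values.

```python
import math
from typing import Tuple

# Rows from the table above: (label, bucket low, bucket high, wins, picks).
BUCKETS = [
    ("50 to 55%", 0.50, 0.55, 1264, 2397),
    ("55 to 60%", 0.55, 0.60, 1120, 2046),
    ("60 to 65%", 0.60, 0.65, 678, 1103),
    ("65 to 70%", 0.65, 0.70, 307, 451),
    ("70 to 75%", 0.70, 0.75, 142, 188),
    ("75 to 80%", 0.75, 0.80, 55, 70),
    ("80 to 85%", 0.80, 0.85, 26, 29),
    ("85 to 90%", 0.85, 0.90, 10, 11),
]


def wilson_interval(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives ~95%."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def verdict(stated: float, wins: int, n: int) -> str:
    """Hypothetical rule: compare the stated probability with the Wilson band."""
    lo, hi = wilson_interval(wins, n)
    if stated > hi:
        return "overconfident"  # claimed more than the observed win rate supports
    if stated < lo:
        return "underconfident"  # hypothetical label, not used on the page
    return "calibrated"


for label, low, high, wins, n in BUCKETS:
    # ASSUMPTION: the bucket midpoint stands in for the unpublished mean stated probability.
    stated = (low + high) / 2
    lo, hi = wilson_interval(wins, n)
    print(f"{label}: won {wins / n:.1%}, band {lo:.0%} to {hi:.0%}, {verdict(stated, wins, n)}")
```

With midpoints, the sketch flags only the 55 to 60% bucket (stated 57.5% against a band of roughly 53 to 57%), matching the 7-of-8 result reported at the top of the page.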

The harder it commits, the more it's right

Win rate by confidence tier. The steady climb is the evidence that this is real signal, not favorite-picking. Tier thresholds are a cross-sport presentation convention.

Toss-up · 52.5% (3,746 picks)
Lean · 57.5% (4,647 picks)
Edge · 67.9% (5,867 picks)
Highest confidence · 81.0% (6,550 picks)
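The tier counts add up to 20,810 picks, the same total as the headline record (13,928 + 6,882) and larger than the 6,295 picks in the calibration table. Assuming the tiers partition that record (the page does not say so explicitly), a quick Python cross-check rebuilds the implied win count from the rounded tier rates:

```python
# Tier rows as displayed: (tier, win rate, picks). Rates are rounded to 0.1%.
TIERS = [
    ("Toss-up", 0.525, 3746),
    ("Lean", 0.575, 4647),
    ("Edge", 0.679, 5867),
    ("Highest confidence", 0.810, 6550),
]

total_picks = sum(n for _, _, n in TIERS)
implied_wins = sum(rate * n for _, rate, n in TIERS)

# Rounding each rate to 0.1% can shift its implied wins by at most 0.0005 * picks.
max_error = sum(0.0005 * n for _, _, n in TIERS)

print(total_picks, round(implied_wins), round(max_error, 1))
```

The implied wins land within rounding error of the record's 13,928, so the tier figures and the headline record are at least arithmetically consistent.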

Last 30 days

Daily accuracy vs. the 50% coin-flip baseline. The cold days stay in the line.

How it's computed

For each confidence bucket we take the absolute gap between the mean stated probability of its picks and the rate they actually won, then average those gaps weighted by each bucket's share of the picks. That weighted average is the Expected Calibration Error (ECE). It uses the mean stated probability, not the bucket midpoint, so the number is reproducible from the published CSV.
Grade = max(0, round(100 − ECE × 1000)): an ECE of 0.00 → 100, 0.05 → 50, 0.10 or worse → 0. The current ECE of 0.013 gives 100 − 13 = 87. The raw ECE sits beside the grade so the scaling is auditable.
It grades the model's raw stored win probability against the realized outcome on every completed game in the five fitted leagues — no window, no win-filter, cold streaks included.
With fewer than 1,000 graded games or fewer than 4 populated buckets, the grade reads “—” rather than publishing a noisy number.
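A minimal Python sketch of the ECE and grade rules described above. The `Bucket` type and function names are hypothetical, and feeding it bucket midpoints instead of the true mean stated probabilities is an assumption made only because the calibration table omits them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bucket:
    mean_stated: float  # mean stated win probability of the bucket's picks
    wins: int
    picks: int


def expected_calibration_error(buckets: List[Bucket]) -> float:
    """Sample-weighted mean of |mean stated probability - actual win rate|."""
    total = sum(b.picks for b in buckets)
    return sum(
        (b.picks / total) * abs(b.mean_stated - b.wins / b.picks)
        for b in buckets
        if b.picks > 0
    )


def calibration_grade(
    buckets: List[Bucket], min_games: int = 1000, min_buckets: int = 4
) -> Optional[int]:
    """Grade = max(0, round(100 - ECE * 1000)); None means the page shows a dash."""
    populated = [b for b in buckets if b.picks > 0]
    if sum(b.picks for b in populated) < min_games or len(populated) < min_buckets:
        return None  # too little data to publish a grade
    ece = expected_calibration_error(populated)
    return max(0, round(100 - ece * 1000))


# ASSUMPTION: bucket midpoints stand in for the mean stated probabilities in the CSV.
rows = [
    (0.525, 1264, 2397), (0.575, 1120, 2046), (0.625, 678, 1103), (0.675, 307, 451),
    (0.725, 142, 188), (0.775, 55, 70), (0.825, 26, 29), (0.875, 10, 11),
]
buckets = [Bucket(p, w, n) for p, w, n in rows]
print(round(expected_calibration_error(buckets), 4), calibration_grade(buckets))
```

With those midpoints it gives an ECE of about 0.0135 and a grade of 87, in line with the published 0.013 and 87; the exact ECE requires the mean stated probabilities from the CSV.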

This grades calibration — whether our stated confidence is true — not profit.

Download the full bucket-level CSV →
