Model reliability · audited
The Calibration Grade
When the model says 70%, do those picks win about 70%? One number for how true its stated confidence is — graded on every completed game, wins and losses alike.
87 out of 100
Calibrated & verified · 7 of 8 buckets calibrated within their 95% band
Record 13,928–6,882 (66.9%, 95% CI 66.3%–67.6%) · ECE 0.013 · 6,295 graded
The reliability line
Each dot is a confidence bucket: where the model's stated probability (x) met the rate those picks actually won (y). A perfectly calibrated model rides the diagonal; the 95% band shows where small samples are still noise.
6,295 picks · 8 buckets
Model calibration: predicted win probability vs actual win rate by bucket, with 95% Wilson confidence intervals, over 6,295 graded picks.
| Predicted | Actual win rate | Sample | 95% CI | Verdict |
|---|---|---|---|---|
| 50 to 55% | 52.7% | 1264 of 2397 | 51–55% | calibrated |
| 55 to 60% | 54.7% | 1120 of 2046 | 53–57% | overconfident |
| 60 to 65% | 61.5% | 678 of 1103 | 59–64% | calibrated |
| 65 to 70% | 68.1% | 307 of 451 | 64–72% | calibrated |
| 70 to 75% | 75.5% | 142 of 188 | 69–81% | calibrated |
| 75 to 80% | 78.6% | 55 of 70 | 68–87% | calibrated |
| 80 to 85% | 89.7% | 26 of 29 | 74–96% | calibrated |
| 85 to 90% | 90.9% | 10 of 11 | 62–98% | calibrated |
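The chart description says the 95% bands are Wilson intervals. As a rough sketch of how one row's band could be recomputed from its win and sample counts (the function name `wilson_interval` and the z value are illustrative assumptions, not taken from the site's code):

```python
import math

def wilson_interval(wins: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.959964 is the two-sided 95% normal quantile (an assumption here;
    the page only says "95% Wilson confidence intervals").
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return center - half, center + half

# Two rows from the bucket table: 1264 of 2397 and 10 of 11.
lo_big, hi_big = wilson_interval(1264, 2397)
lo_small, hi_small = wilson_interval(10, 11)
print(f"{lo_big:.1%} to {hi_big:.1%}")
print(f"{lo_small:.1%} to {hi_small:.1%}")
```

Rounded to whole percentages, these two rows come out at 51–55% and 62–98%, in line with the published bands. The width of the 11-pick band illustrates why the small buckets are treated as noisy.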
The harder it commits, the more it's right
Win rate by confidence tier — the climb is the proof this is real signal, not favorite-picking. Tier thresholds are a cross-sport presentation convention.
Toss-up: 52.5% (3,746)
Lean: 57.5% (4,647)
Edge: 67.9% (5,867)
Highest confidence: 81.0% (6,550)
Last 30 days
Daily accuracy vs. the 50% coin-flip baseline. The cold days stay in the line.
How it's computed
For each confidence bucket we take the gap between the mean stated probability of its picks and the rate at which they actually won. The sample-weighted average of those absolute gaps across buckets is the Expected Calibration Error (ECE). It uses the mean stated probability, not the bucket midpoint, so the number is reproducible from the published CSV.
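A minimal sketch of that ECE calculation, assuming picks arrive as parallel lists of stated probabilities and 0/1 outcomes. The function name, the `EDGES` list, and the handling of values that fall on or outside the edges are assumptions for illustration; the published CSV layout is not described here.

```python
from bisect import bisect_right

# Hypothetical bucket edges matching the 8 buckets shown in the table.
EDGES = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90]

def expected_calibration_error(probs, outcomes, edges=EDGES):
    """Sample-weighted mean of |mean stated prob - win rate| per bucket."""
    n_buckets = len(edges) - 1
    prob_sum = [0.0] * n_buckets
    wins = [0] * n_buckets
    count = [0] * n_buckets
    for p, won in zip(probs, outcomes):
        # Bucket i covers [edges[i], edges[i+1]); out-of-range values are
        # clamped into the first or last bucket (an assumption).
        i = min(max(bisect_right(edges, p) - 1, 0), n_buckets - 1)
        prob_sum[i] += p
        wins[i] += int(won)
        count[i] += 1
    total = sum(count)
    if total == 0:
        raise ValueError("no graded picks")
    ece = 0.0
    for i in range(n_buckets):
        if count[i] == 0:
            continue  # empty buckets carry no weight
        mean_prob = prob_sum[i] / count[i]  # mean stated prob, not the midpoint
        win_rate = wins[i] / count[i]
        ece += (count[i] / total) * abs(mean_prob - win_rate)
    return ece

# Toy data: two picks near 53% (one won), two near 63% (both won).
print(expected_calibration_error([0.52, 0.54, 0.62, 0.64], [1, 0, 1, 1]))
```

In the toy data the first bucket is off by 0.03 and the second by 0.37, each holding half the picks, so the result is about 0.20.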
Grade = round(100 − ECE × 1000), floored at 0: an ECE of 0.00 → 100, 0.05 → 50, 0.10 or worse → 0. The raw ECE sits beside the grade so the scaling is auditable.
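The scaling can be checked with a few lines of Python. This is a sketch of the stated formula, not the site's code: the function name is hypothetical, and the explicit floor at 0 is inferred from "0.10 or worse → 0".

```python
def calibration_grade(ece: float) -> int:
    """Map an ECE to a 0-100 grade: round(100 - ECE * 1000), floored at 0."""
    # Note: Python's round() sends exact .5 ties to the nearest even integer;
    # the page does not say which tie rule it uses.
    return max(0, round(100 - ece * 1000))

for ece in (0.0, 0.013, 0.05, 0.10, 0.25):
    print(ece, calibration_grade(ece))
```

With the published ECE of 0.013 this gives 87, matching the grade shown at the top of the page.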
It grades the model's raw stored win probability against the realized outcome on every completed game in the five fitted leagues — no window, no win-filter, cold streaks included.
With fewer than 1,000 graded games or fewer than 4 populated buckets, the grade reads “—” instead of publishing a noisy number.
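One way to express that publication gate, as a hedged sketch: the function name, parameter names, and string return type are assumptions, while the two thresholds and the placeholder character come from the text above.

```python
PLACEHOLDER = "\u2014"  # the dash the page shows when it withholds a grade

def displayed_grade(ece: float, graded_games: int, populated_buckets: int,
                    min_games: int = 1000, min_buckets: int = 4) -> str:
    """Return the grade as text, or the placeholder when the sample is too thin."""
    if graded_games < min_games or populated_buckets < min_buckets:
        return PLACEHOLDER
    # Same scaling as the published formula, floored at 0.
    return str(max(0, round(100 - ece * 1000)))

print(displayed_grade(0.013, graded_games=6295, populated_buckets=8))
print(displayed_grade(0.013, graded_games=999, populated_buckets=8))
```

Checking the sample size before computing anything keeps a thin sample from ever producing a number that could be mistaken for a real grade.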
This grades calibration — whether our stated confidence is true — not profit.