
Model reliability · audited

The Calibration Grade

When the model says 70%, do those picks win about 70%? One number for how true its stated confidence is — graded on every completed game, wins and losses alike.

87 out of 100

Calibrated & verified · 7 of 8 buckets calibrated within their 95% band

Record 13,928–6,882 (66.9%, 95% CI 66.3%–67.6%) · ECE 0.013 · 6,295 graded

The reliability line

Each dot is a confidence bucket: where the model's stated probability (x) met the rate those picks actually won (y). A perfectly calibrated model rides the diagonal; the 95% band shows where small samples are still noise.
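As a rough illustration of how the dots could be built, the sketch below groups picks into 5-point buckets and records each bucket's mean stated probability (x) and realized win rate (y). The function name, the `(stated_prob, won)` input shape, and the choice to fold a pick at exactly 90% into the top bucket are assumptions made for this example, not details published on this page.

```python
def bucket_picks(picks, lo_pct=50, hi_pct=90, width_pct=5):
    """Group (stated_prob, won) pairs into fixed-width confidence buckets.

    Returns {(low_pct, high_pct): {"mean_prob", "win_rate", "n"}} for
    populated buckets only. Bucket edges are lower-inclusive (assumption).
    """
    n_buckets = (hi_pct - lo_pct) // width_pct
    sums = {}  # bucket index -> [sum of stated probs, wins, count]
    for prob, won in picks:
        bp = round(prob * 10000)  # basis points avoid float edge errors
        if bp < lo_pct * 100 or bp > hi_pct * 100:
            continue  # outside the plotted range
        idx = min((bp - lo_pct * 100) // (width_pct * 100), n_buckets - 1)
        s = sums.setdefault(idx, [0.0, 0, 0])
        s[0] += prob
        s[1] += int(won)
        s[2] += 1
    return {
        (lo_pct + i * width_pct, lo_pct + (i + 1) * width_pct): {
            "mean_prob": s[0] / s[2],
            "win_rate": s[1] / s[2],
            "n": s[2],
        }
        for i, s in sorted(sums.items())
    }


# Hypothetical picks, for illustration only
example = bucket_picks([(0.52, True), (0.54, False), (0.60, True)])
for edges, stats in example.items():
    print(edges, stats)
```

Working in basis points keeps a pick stated at exactly 60% out of the 55 to 60% bucket, which naive float division can get wrong.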

6,295 picks · 8 buckets

Model calibration: predicted win probability vs actual win rate by bucket, with 95% Wilson confidence intervals, over 6,295 graded picks.
| Predicted | Actual win rate | Sample | 95% CI | Verdict |
|---|---|---|---|---|
| 50 to 55% | 52.7% | 1264 of 2397 | 51–55% | calibrated |
| 55 to 60% | 54.7% | 1120 of 2046 | 53–57% | overconfident |
| 60 to 65% | 61.5% | 678 of 1103 | 59–64% | calibrated |
| 65 to 70% | 68.1% | 307 of 451 | 64–72% | calibrated |
| 70 to 75% | 75.5% | 142 of 188 | 69–81% | calibrated |
| 75 to 80% | 78.6% | 55 of 70 | 68–87% | calibrated |
| 80 to 85% | 89.7% | 26 of 29 | 74–96% | calibrated |
| 85 to 90% | 90.9% | 10 of 11 | 62–98% | calibrated |
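The 95% CI column can be reproduced from the Sample column with the standard Wilson score interval. The sketch below is one way to do that. The `verdict` helper is an assumption about how a bucket might be labeled (calibrated when the mean stated probability falls inside the band). The "underconfident" label and the 0.575 stated probability in the usage lines are hypothetical, since the page does not publish per-bucket means here.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def verdict(mean_stated_prob, wins, n):
    """Assumed labeling rule: compare the stated probability to the band."""
    lo, hi = wilson_interval(wins, n)
    if mean_stated_prob > hi:
        return "overconfident"   # model claimed more than it delivered
    if mean_stated_prob < lo:
        return "underconfident"  # hypothetical label, not from the page
    return "calibrated"


lo, hi = wilson_interval(1264, 2397)   # first row of the table
print(f"{lo:.0%} to {hi:.0%}")         # 51% to 55%
lo, hi = wilson_interval(10, 11)       # last row: tiny sample, wide band
print(f"{lo:.0%} to {hi:.0%}")         # 62% to 98%
print(verdict(0.575, 1120, 2046))      # 0.575 is a placeholder stated prob
```

The Wilson interval stays inside 0 to 100% even for small buckets such as 10 of 11, which is why it suits the sparse high-confidence rows.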

The harder it commits, the more it's right

Win rate by confidence tier — the climb is the proof this is real signal, not favorite-picking. Tier thresholds are a cross-sport presentation convention.

  • Toss-up: 52.5% (3,746)
  • Lean: 57.5% (4,647)
  • Edge: 67.9% (5,867)
  • Highest confidence: 81.0% (6,550)

Last 30 days

Daily accuracy vs. the 50% coin-flip baseline. The cold days stay in the line.

[Chart: 30-day accuracy versus the 50% coin-flip baseline]

How it's computed

  • For each confidence bucket we take the gap between the mean stated probability of its picks and the rate they actually won, then average those gaps weighted by sample size. The result is the Expected Calibration Error (ECE). It uses the mean stated probability, not the bucket midpoint, so the number is reproducible from the published CSV.
  • Grade = round(100 − ECE × 1000), floored at 0: an ECE of 0.00 → 100, 0.05 → 50, 0.10 or worse → 0. The raw ECE sits beside the grade so the scaling is auditable.
  • It grades the model's raw stored win probability against the realized outcome on every completed game in the five fitted leagues — no window, no win-filter, cold streaks included.
  • With fewer than 1,000 graded games or fewer than 4 populated buckets, the grade reads “—” instead of publishing a noisy number.
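The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the site's actual code: the function names and the `(mean_stated_prob, wins, n)` bucket shape are invented for the example, the gap is taken as an absolute value (the usual ECE definition), and the example buckets are made-up numbers rather than rows from the CSV.

```python
def expected_calibration_error(buckets):
    """Sample-weighted mean absolute gap between stated and realized rates.

    buckets: list of (mean_stated_prob, wins, n) tuples with n > 0.
    """
    total = sum(n for _, _, n in buckets)
    weighted_gap = sum(n * abs(mean_prob - wins / n)
                       for mean_prob, wins, n in buckets)
    return weighted_gap / total


def grade_from_ece(ece):
    """Grade = round(100 - ECE * 1000), floored at 0."""
    return max(0, round(100 - ece * 1000))


def calibration_grade(buckets, min_games=1000, min_buckets=4):
    """Return the grade, or None when the sample is too thin to publish."""
    populated = [b for b in buckets if b[2] > 0]
    total = sum(n for _, _, n in populated)
    if total < min_games or len(populated) < min_buckets:
        return None  # the page would show a dash here
    return grade_from_ece(expected_calibration_error(populated))


print(grade_from_ece(0.013))  # 87, matching the published ECE and grade

# Hypothetical buckets: each is off by 5 points, so ECE is about 0.05
toy = [(0.60, 55, 100), (0.80, 85, 100)]
print(calibration_grade(toy))                               # None: too few games
print(calibration_grade(toy, min_games=100, min_buckets=2))  # 50
```

One caveat on rounding: Python's `round` sends exact halves to the nearest even integer, so a grade that lands exactly on x.5 could differ by one point from an implementation that rounds halves up.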

This grades calibration — whether our stated confidence is true — not profit.

Download the full bucket-level CSV →