Methodology · How the model works
One rating per team, updated after every game. K-factor tuned per sport. Pre-game predictions logged to a public table before tip-off, so the track record can't be cherry-picked after the fact. The MLB back-test sits at 55% on 2,932 games; the live cross-sport hit rate updates daily at /model/accuracy.
Live receipts · cross-sport
Last 7 days: 59.6% on 99 games
Last 30 days: 61.3% on 517 games
Last 90 days: +0.3 vs market (542 of 1,896 games priced) · raw hit rate 60.5%
All time: +0.7 vs market (1,891 of 20,581 games priced) · raw hit rate 67.0%
By confidence tier · all-time
★ Locks: 80.9% on 6,508 games (+20.9)
Edges: 67.9% on 5,798 games (+12.9)
Leans: 57.6% on 4,593 games (+5.6)
Tossups: 52.7% on 3,682 games (+2.7)
Inputs
Public APIs, on a schedule we publish. No hand-cleaned aggregates pretending to be live, no scraped data we couldn't show you the source of.
Final scores, line scores, in-game state. Polled every 30s during live games. The Elo rating engine consumes finals from this feed within minutes of the last out / final whistle.
Probable starters, lineups, full play-by-play. Powers the pre-game pitching matchup card and the per-batter analytics on box scores.
Shot-level data with x/y coordinates, advanced team rates (pace, ORtg/DRtg, Net), per-possession splits. Drives the shot maps on every NBA player profile.
Multi-book odds (DraftKings, FanDuel, BetMGM, Caesars, ESPN BET). Snapshots are stored, so the line-movement chart on every game page reads from history rather than a single point in time.
Player bios, draft history, hand-curated awards lists. Used by the 2026 NFL Draft prospect pages and the historical comp pool.
Game-day wind, precipitation, and temperature for outdoor venues. Surfaces on game preview cards in NFL and MLB.
The loop
Within a few minutes of a final, the per-league cron picks up the result and queues a rating update. We don't backfill — every game gets exactly one Elo update, in chronological order.
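A minimal sketch of the exactly-once, chronological rule, assuming each final arrives as a record with a game id and a start time. The `Final` type, its fields, and the `apply_update` callback are hypothetical stand-ins, not the site's actual schema or cron code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


@dataclass(frozen=True)
class Final:
    # Hypothetical shape of a final-score record from the scores feed.
    game_id: str
    start_time: datetime
    home: str
    away: str
    home_score: int
    away_score: int


def process_finals(
    finals: Iterable[Final],
    already_rated: set[str],
    apply_update: Callable[[Final], None],
) -> list[str]:
    """Apply one rating update per unseen game, oldest first within the batch.

    Ids already in `already_rated` (or repeated later in the same batch) are
    skipped, so re-polling the feed never double-counts a result.
    Returns the ids updated in this run.
    """
    updated = []
    for final in sorted(finals, key=lambda f: f.start_time):
        if final.game_id in already_rated:
            continue
        apply_update(final)
        already_rated.add(final.game_id)
        updated.append(final.game_id)
    return updated
```

Keeping the set of rated ids outside the function is what lets the same check hold across cron runs, for example by persisting it next to the ratings.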
Pre-game ratings are frozen for the prediction record. Post-game ratings move by K × (actual − expected), with a margin-of-victory bump capped at 1.5× so blowouts don't dominate. K and home advantage are tuned per league: K=8 for MLB, higher for NBA, lower for NHL where OT/SO outcomes carry more variance.
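The update rule with a capped margin-of-victory bump can be sketched as follows. Only the K × (actual − expected) form and the 1.5× cap come from the text; the `ln(|margin| + 1)` curve and the floor at 1.0 are assumptions (a common choice in public Elo models), and the function names are illustrative.

```python
import math


def mov_multiplier(margin: int, cap: float = 1.5) -> float:
    """Margin-of-victory bump, never below 1.0 and never above `cap`.

    Assumed shape: ln(|margin| + 1). A one-run or one-goal win still
    moves ratings by the full K; big blowouts saturate at the cap.
    """
    return min(cap, max(1.0, math.log(abs(margin) + 1)))


def expected_score(rating: float, opp_rating: float) -> float:
    """Standard Elo expectation for the team holding `rating`."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400))


def update(winner: float, loser: float, margin: int, k: float) -> tuple[float, float]:
    """Return post-game (winner, loser) ratings.

    The winner's actual score is 1 and the loser's is 0. The change is
    zero-sum: the loser drops by exactly what the winner gains.
    """
    delta = k * mov_multiplier(margin) * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta
```

For example, `update(1500, 1450, margin=6, k=8)` hits the 1.5× cap and moves each side by about 5.14 points.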
When the next slate publishes, we compute home win probability from the rating delta plus home advantage, run it through the logistic, and write the prediction to a public table before tip-off. That timestamp is what makes the track record auditable later.
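A sketch of the pre-game step, assuming home advantage is expressed in rating points and added to the home side before the logistic. The `Prediction` record and `log_prediction` helper are hypothetical; they only illustrate that the probability is written once, with a timestamp, and refused after the scheduled start.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def home_win_probability(home: float, away: float, home_adv: float) -> float:
    """P(home wins) from the rating delta plus home advantage, via the Elo logistic."""
    delta = (home + home_adv) - away
    return 1.0 / (1.0 + 10 ** (-delta / 400))


@dataclass(frozen=True)
class Prediction:
    # Hypothetical row in the public predictions table; frozen once written.
    game_id: str
    home_win_prob: float
    logged_at: datetime


def log_prediction(
    game_id: str,
    home: float,
    away: float,
    home_adv: float,
    start_time: datetime,
    now: Optional[datetime] = None,
) -> Prediction:
    """Build the prediction row, refusing to log at or after the start time."""
    now = now or datetime.now(timezone.utc)
    if now >= start_time:
        raise ValueError("prediction must be logged before tip-off")
    return Prediction(game_id, home_win_probability(home, away, home_adv), now)
```

The page says only that home advantage is tuned per league, so any value passed as `home_adv` (say, 24 points) is a placeholder.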
An hour or so before tip-off, we snapshot every book's closing moneyline + spread + total. The closing line is what every honest model gets benchmarked against — so we keep all of them, not just the median.
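One way to read "keep all of them, not just the median" is to pick each book's latest snapshot at or before a cutoff and keep every book's quote separately. The `Snapshot` type and `closing_lines` function below are a hypothetical sketch, not the site's storage layer.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stored odds row: one per book per poll.
    book: str
    taken_at: datetime
    home_ml: int
    away_ml: int


def closing_lines(snapshots: list[Snapshot], cutoff: datetime) -> dict[str, Snapshot]:
    """Latest snapshot per book taken at or before `cutoff`.

    Each book keeps its own closing quote; nothing is averaged or reduced
    to a median. The input list is not modified, so earlier rows remain
    available for line-movement history.
    """
    closing: dict[str, Snapshot] = {}
    for snap in snapshots:
        if snap.taken_at > cutoff:
            continue
        best = closing.get(snap.book)
        if best is None or snap.taken_at > best.taken_at:
            closing[snap.book] = snap
    return closing
```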
After the snapshot, the model compares its win probability to the de-juiced implied probability on each side. Games sort by edge magnitude. The /<league>/edges and /best-bets surfaces read from this ranking.
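The compare-and-rank step can be sketched as follows, assuming a two-way moneyline market and proportional de-juicing (each side's implied probability divided by their sum). The page doesn't say which de-vig method it uses, so treat that choice, and every name here, as illustrative.

```python
from typing import Iterable


def implied_prob(american: int) -> float:
    """Implied probability of an American moneyline, vig included."""
    if american < 0:
        return -american / (-american + 100)
    return 100 / (american + 100)


def dejuice(home_ml: int, away_ml: int) -> tuple[float, float]:
    """Strip the vig proportionally so the two sides sum to 1."""
    home, away = implied_prob(home_ml), implied_prob(away_ml)
    total = home + away
    return home / total, away / total


def rank_edges(
    games: Iterable[tuple[str, float, int, int]],
) -> list[tuple[str, str, float]]:
    """Rank games by |model − market| on the home side.

    Each game is (game_id, model_home_prob, home_ml, away_ml). In a two-way
    market the away-side edge has the same size and the opposite sign, so
    one magnitude per game is enough to sort on.
    """
    rows = []
    for game_id, model_home, home_ml, away_ml in games:
        market_home, _ = dejuice(home_ml, away_ml)
        edge = model_home - market_home
        side = "home" if edge >= 0 else "away"
        rows.append((game_id, side, abs(edge)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

With several books, the same comparison can be run against each book's closing price rather than a single consensus line.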
The final score updates the rating, and the loop starts again with the next game. The cross-sport track record at /model/accuracy aggregates every prediction by confidence bucket, so we can tell whether a 70% pick actually hits 70% of the time. Calibration drift triggers a K-factor review.
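Bucketing predictions and measuring drift can be sketched like this, assuming 5-point buckets keyed by whole percentages (the page doesn't publish its bucket edges). Drift here is the bucket's actual hit rate minus its mean claimed probability, in percentage points; `calibration_by_bucket` is a hypothetical name.

```python
from collections import defaultdict
from typing import Iterable


def calibration_by_bucket(
    picks: Iterable[tuple[float, bool]], width_pp: int = 5
) -> dict[int, tuple[int, float, float, float]]:
    """Group (claimed_prob, hit) picks into buckets of `width_pp` points.

    Returns {bucket_floor_pct: (n, mean_claimed, actual_rate, drift_pp)}.
    Probabilities are rounded to whole percentages first so that, e.g.,
    0.70 lands in the 70 bucket rather than 65 through float error.
    """
    groups: dict[int, list[tuple[float, bool]]] = defaultdict(list)
    for prob, hit in picks:
        bucket = int(round(prob * 100)) // width_pp * width_pp
        groups[bucket].append((prob, hit))

    report = {}
    for bucket, rows in sorted(groups.items()):
        n = len(rows)
        claimed = sum(p for p, _ in rows) / n
        actual = sum(1 for _, h in rows if h) / n
        report[bucket] = (n, claimed, actual, (actual - claimed) * 100)
    return report
```

A 70 bucket whose drift sits near zero over enough games is behaving as labeled; a persistent gap is the kind of signal the text says prompts a K-factor review.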
Interactive
Drag the team ratings and the outcome. The pre-game win probability, the implied moneyline, and the post-game rating update in real time. This is the same formula the cron uses.
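Elo · interactive (default state): Team A pre-game rating 1500, Team B pre-game rating 1450; outcome and K-factor are adjustable.
Pre-game read: A win prob 57.1% (A implied ML -133); B win prob 42.9% (B implied ML +133).
Post-game update: A new rating (from 1500) and B new rating (from 1450), computed from the selected outcome and K-factor.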
Standard Elo: expected = 1 / (1 + 10^((B − A)/400)), new rating = old + K × (actual − expected). MLB model ships K = 24.
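The same formula in Python, plus a fair-odds moneyline conversion, reproduces the widget's default read. The function names are illustrative, and `implied_moneyline` assumes no vig, which matches the symmetric -133/+133 pair shown.

```python
def expected(a: float, b: float) -> float:
    """Standard Elo: probability that the team rated `a` beats the team rated `b`."""
    return 1.0 / (1.0 + 10 ** ((b - a) / 400))


def new_rating(old: float, actual: float, exp: float, k: float) -> float:
    """old + K × (actual − expected); actual is 1 for a win, 0 for a loss."""
    return old + k * (actual - exp)


def implied_moneyline(p: float) -> int:
    """Fair (no-vig) American moneyline for win probability p."""
    if p >= 0.5:
        return round(-100 * p / (1 - p))
    return round(100 * (1 - p) / p)


e_a = expected(1500, 1450)            # about 0.571
e_b = expected(1450, 1500)            # about 0.429
print(implied_moneyline(e_a), implied_moneyline(e_b))
# If A wins with K = 24, A gains about 10.28 points and B loses the same.
print(new_rating(1500, 1, e_a, 24), new_rating(1450, 0, e_b, 24))
```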
Back-test
Calibration drift is how far a confidence bucket's actual hit rate deviates from its claimed probability. A 70% bucket should hit 70% of the time across enough samples; the current drift on the MLB sample is ±2.4pp. Closing-line agreement is the average edge magnitude on picks where the model and the closing line ended up on the same side, which is a more honest read than raw hit rate.
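A sketch of closing-line agreement, under the assumption that "same side" means the model and the de-juiced closing line both favor the same team, and that edge magnitude is the absolute gap between the two home-win probabilities. The function name and input shape are hypothetical.

```python
from typing import Iterable, Optional


def closing_line_agreement(picks: Iterable[tuple[float, float]]) -> Optional[float]:
    """Mean |model − close| over picks where both favour the same team.

    Each pick is (model_home_prob, closing_home_prob), both without vig.
    Exact 50/50 reads count as no side and are dropped. Returns None if
    no pick qualifies.
    """
    edges = [
        abs(model - close)
        for model, close in picks
        if model != 0.5 and close != 0.5 and (model > 0.5) == (close > 0.5)
    ]
    return sum(edges) / len(edges) if edges else None
```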
Posture
Confidence labels max out around 85% on MLB and lower on NHL. Anyone shipping picks at 95%+ is either miscalibrated or selling. Honest models live in the 55–70% range and grind out small edges over volume.
The track record shows the full last 30 days, including the cold weeks. The cross-sport track record can't be filtered to just the wins. If we have a bad stretch, the receipts say so.
If the rating engine doesn't have enough samples for a league (early NBA season, returning from a freeze), the prediction gets a low-confidence label and the edge surface dims it. Filling gaps with priors and pretending is how every other site loses.
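The low-sample rule can be sketched as a simple gate, assuming "enough samples" means a minimum count of rated games for both teams. The `MIN_GAMES` threshold, the `Pick` type, and the label strings are hypothetical; the page doesn't publish its cutoff.

```python
from dataclasses import dataclass

MIN_GAMES = 10  # hypothetical threshold; not a published value


@dataclass(frozen=True)
class Pick:
    game_id: str
    win_prob: float
    label: str
    dimmed: bool  # whether the edge surface should de-emphasize it


def label_pick(
    game_id: str, win_prob: float, home_games: int, away_games: int, tier_label: str
) -> Pick:
    """Downgrade to low confidence when either team has too few rated games.

    The probability is kept as computed; only the label and the dim flag
    change, so nothing is papered over with an invented prior.
    """
    if min(home_games, away_games) < MIN_GAMES:
        return Pick(game_id, win_prob, "low-confidence", dimmed=True)
    return Pick(game_id, win_prob, tier_label, dimmed=False)
```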
Every model edge across the four modeled leagues, ranked by the gap between our win probability and the moneyline. Updated as soon as new lines post.