Walk-forward test: for each match day, the model (trained only on games played before that day) predicts each game, and the predictions are then scored against the actual results. Because standings are decided by total games won, the headline metric is game-level accuracy: across all individual games, how often did the model's favored team win? (Series win-rate is shown too, but it matters less.)
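The walk-forward loop described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the game fields (`day`, `team_a`, `team_b`, `a_won`), the `walk_forward_accuracy` function, and the toy `WinRateModel` are all hypothetical stand-ins, and ties at exactly 50% are arbitrarily counted as favoring `team_a`.

```python
from collections import defaultdict


class WinRateModel:
    """Toy stand-in model (hypothetical): rates teams by smoothed past win rate."""

    def fit(self, history):
        self.wins = defaultdict(int)
        self.played = defaultdict(int)
        for g in history:
            winner = g["team_a"] if g["a_won"] else g["team_b"]
            self.wins[winner] += 1
            self.played[g["team_a"]] += 1
            self.played[g["team_b"]] += 1

    def _rate(self, team):
        # Add-one smoothing, so a team with no games rates exactly 0.5.
        return (self.wins[team] + 1) / (self.played[team] + 2)

    def predict(self, team_a, team_b):
        """Return the estimated probability that team_a wins the game."""
        a, b = self._rate(team_a), self._rate(team_b)
        return a / (a + b)


def walk_forward_accuracy(games, make_model):
    """Return {match day: game-level accuracy}, training only on earlier days."""
    by_day = defaultdict(list)
    for g in games:
        by_day[g["day"]].append(g)

    history = []  # games strictly before the day being predicted
    accuracy = {}
    for day in sorted(by_day):
        model = make_model()
        model.fit(history)
        correct = 0
        for g in by_day[day]:
            favored_a = model.predict(g["team_a"], g["team_b"]) >= 0.5
            correct += int(favored_a == g["a_won"])
        accuracy[day] = correct / len(by_day[day])
        history.extend(by_day[day])  # results become visible only afterwards
    return accuracy


games = [
    {"day": 1, "team_a": "A", "team_b": "B", "a_won": True},
    {"day": 1, "team_a": "C", "team_b": "D", "a_won": False},
    {"day": 2, "team_a": "A", "team_b": "B", "a_won": True},
    {"day": 2, "team_a": "D", "team_b": "C", "a_won": True},
]
result = walk_forward_accuracy(games, WinRateModel)
```

The key property is that `history` is extended only after a day has been scored, so no prediction can see results from its own match day or later.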
Walk-forward A/B test of the two models. "Current season only" starts every team at the same rating; "all data" seeds each team from its roster's career skill in past seasons (a pre-season signal, so the comparison is fair). The biggest difference should show up early in the season, before this year's results accumulate. Series win-rate matters less, since standings reward total game wins (60.2% current).
| Match day | Games | Game accuracy | Cumulative | Series win-rate | Brier |
|---|---|---|---|---|---|
| 1 | 400 | 49.2% | 49.2% | 49.3% | 0.250 |
| 2 | 400 | 53.8% | 51.5% | 60.3% | 0.247 |
| 3 | 400 | 51.7% | 51.6% | 56.0% | 0.247 |
| 4 | 400 | 57.2% | 53.0% | 66.2% | 0.241 |
| 5 | 400 | 61.3% | 54.6% | 70.0% | 0.239 |
| 6 | 400 | 49.8% | 53.8% | 52.3% | 0.252 |
| 7 | 400 | 55.2% | 54.0% | 63.8% | 0.246 |
| 8 | 400 | 56.8% | 54.4% | 64.6% | 0.249 |
| 9 | 4 | 25.0% | 54.3% | 0.0% | 0.264 |
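For readers who want to see how columns like these are typically derived, here is a small sketch under stated assumptions: each prediction is a pair of (probability that team A wins, whether team A won), the Brier score is the standard mean squared error between probability and outcome, and the cumulative column is taken to be games-weighted. The `summarize` function and its input layout are hypothetical, not the dashboard's own code. A constant 50% forecast scores a Brier of exactly 0.25, which is why values just under 0.25 indicate only a small edge.

```python
def summarize(days):
    """days: list of (match_day, [(p_team_a_wins, team_a_won), ...]).

    Returns one row per match day with game accuracy, running
    (games-weighted) cumulative accuracy, and Brier score.
    """
    rows = []
    total_correct = 0
    total_games = 0
    for day, preds in days:
        n = len(preds)
        # A game counts as correct when the favored side (p >= 0.5) won.
        correct = sum(int((p >= 0.5) == won) for p, won in preds)
        # Brier score: mean squared gap between probability and 0/1 outcome.
        brier = sum((p - float(won)) ** 2 for p, won in preds) / n
        total_correct += correct
        total_games += n
        rows.append({
            "day": day,
            "games": n,
            "accuracy": correct / n,
            "cumulative": total_correct / total_games,
            "brier": brier,
        })
    return rows


rows = summarize([
    (1, [(0.5, True), (0.5, False)]),
    (2, [(0.7, True), (0.6, True), (0.8, False), (0.4, False)]),
])
```

Because the cumulative figure is weighted by games, a match day with very few games barely moves it, which is consistent with match day 9 in the table.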
Match day 1 sits near 50% because every team starts at the same rating, so early predictions are essentially coin flips; accuracy tends to climb as the ratings learn, though individual match days are noisy (match day 6 falls back to 49.8%, and match day 9 covers only 4 games). These are genuinely high-variance games: the per-game edge is only a few points, but it compounds over a season into the playoff odds.
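To put a rough number on that variance, the binomial standard error of an accuracy figure can be computed as below. This is a back-of-the-envelope sketch that assumes games are independent, which is only approximately true since games within a series are likely correlated; the `accuracy_standard_error` helper is hypothetical.

```python
import math


def accuracy_standard_error(accuracy, n_games):
    """Binomial standard error of an observed accuracy (assumes independent games)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_games)


# One 400-game match day near 50%, and the cumulative figure over 3,204 games.
single_day = accuracy_standard_error(0.50, 400)
cumulative = accuracy_standard_error(0.543, 3204)

print(f"single match day: +/- {100 * single_day:.1f} points (1 s.e.)")
print(f"cumulative:       +/- {100 * cumulative:.1f} points (1 s.e.)")
```

Under that assumption, a single 400-game match day carries a standard error of about 2.5 points, so day-to-day swings of a few points are expected noise, while the cumulative figure is considerably tighter.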
Data source: live (rscna.com); standings and schedule refresh hourly.