Question

How should outcome probabilities update during a match, and how can those updates support tactical decision-making?

Methods

  • Empirical state baseline
  • Remaining-goals Poisson model
  • GAM-based remaining-goals model
  • Minute-level state table
  • Pre-match strength priors from Project 2

Data Sources

  • StatsBomb Open Data
  • Minute-level match states
  • Pre-match strength priors from Project 2

State Table

68,978 rows

The first-pass live model is trained and tested on minute-level match states rather than final-match summaries.

Best Log Loss

1.1095

The GAM-based remaining-goals model currently produces the strongest live outcome probabilities on the 2020/2021 test season.

Best RPS

0.2198

The same model also leads on ranked probability score, beating both the empirical baseline and the simpler Poisson alternative.

Problem

Estimate in-game win, draw, and loss probabilities as the state of a match evolves.

Treat this as a remaining-goals problem from the current state to full time, because that creates a direct bridge between football events and live outcome probabilities.
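The remaining-goals framing can be made concrete with a small sketch. The project itself is implemented in R; the Python below is purely illustrative, assuming independent Poisson scoring for each side and a hypothetical `outcome_probs` helper:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of exactly k goals given expected rate lam."""
    return lam ** k * exp(-lam) / factorial(k)

def outcome_probs(lam_home, lam_away, goal_diff=0, max_goals=10):
    """Win/draw/loss probabilities from the current state to full time.

    lam_home / lam_away are the expected remaining goals for each side;
    goal_diff is the current score difference (home minus away).
    Truncating at max_goals remaining goals is a practical approximation.
    """
    win = draw = loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            final_diff = goal_diff + h - a
            if final_diff > 0:
                win += p
            elif final_diff == 0:
                draw += p
            else:
                loss += p
    return win, draw, loss
```

With, say, 0.8 expected home goals and 0.6 expected away goals remaining in a level game, the home side comes out ahead, and a current lead shifts the whole distribution toward a win; this is the "direct bridge" from football events to live outcome probabilities.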

Football Context

This directly supports the wider football problem of in-game strategy under changing conditions.

The project is designed to answer questions such as how much a red card shifted the match, how much the pre-match prior still matters at minute 60, and when a game state remains too uncertain for strong tactical conclusions.

Data

The first pass builds a minute-level state table from the broader La Liga archive used in Project 2.

The resulting state table contains 68,978 rows: 62,790 train, 3,003 validation, and 3,185 test.

Each row tracks minute, time remaining, score state, cumulative xG, red-card counts, and a pre-match strength-gap prior from Project 2.

  • Minute-level states rather than final-match summaries
  • Score difference and remaining time
  • Home and away cumulative xG
  • Home and away red-card counts
  • Pre-match strength gap from Project 2
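As a rough illustration of what one state row contains (the actual builder is R code over StatsBomb events; the event schema and field names below are invented for the sketch, not the StatsBomb format):

```python
def build_state_rows(events, total_minutes=90):
    """Build one state row per minute from a list of event dicts.

    Each event is assumed to look like
    {"minute": 23, "team": "home", "type": "goal" | "red_card" | "shot", "xg": 0.12}.
    """
    rows = []
    for minute in range(total_minutes + 1):
        past = [e for e in events if e["minute"] <= minute]
        row = {
            "minute": minute,
            "time_remaining": total_minutes - minute,
            "home_goals": sum(1 for e in past if e["type"] == "goal" and e["team"] == "home"),
            "away_goals": sum(1 for e in past if e["type"] == "goal" and e["team"] == "away"),
            "home_xg": sum(e.get("xg", 0.0) for e in past if e["team"] == "home"),
            "away_xg": sum(e.get("xg", 0.0) for e in past if e["team"] == "away"),
            "home_reds": sum(1 for e in past if e["type"] == "red_card" and e["team"] == "home"),
            "away_reds": sum(1 for e in past if e["type"] == "red_card" and e["team"] == "away"),
        }
        row["score_diff"] = row["home_goals"] - row["away_goals"]
        rows.append(row)
    return rows
```

A 90-minute match then yields 91 rows (minute 0 through 90), each a self-contained snapshot the live models can score; the pre-match strength-gap prior is simply joined on as an extra constant column per match.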

Model Design

The model ladder is deliberately interpretable. A naive empirical state baseline sets the floor, a remaining-goals Poisson model provides a structured statistical step up, and a GAM-based remaining-goals model captures the nonlinear effect of time remaining and state intensity.

The key decision here was not to jump straight into a black-box live classifier. Remaining-goals models make the football mechanism visible: given the current state, how much scoring is still expected for each side?

The first pass already shows that adding richer state structure matters materially, with the GAM outperforming the simpler alternatives on the holdout season.

  • Model 0: empirical state baseline
  • Model 1: remaining-goals Poisson model
  • Model 2: GAM-based remaining-goals model
  • Primary objective: reliable live win, draw, and loss probabilities
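Model 0 can be sketched as nothing more than outcome frequencies per coarse state cell. Again a hypothetical Python illustration, with bucketing choices chosen for the sketch rather than taken from the project:

```python
from collections import defaultdict

def fit_empirical_baseline(states, outcomes, time_bucket=15):
    """Empirical baseline: outcome frequencies per (score diff, time bucket) cell.

    states: list of (score_diff, minute) pairs; outcomes: matching list of
    "home" | "draw" | "away" final results.
    """
    counts = defaultdict(lambda: {"home": 0, "draw": 0, "away": 0})
    for (diff, minute), result in zip(states, outcomes):
        key = (max(-2, min(2, diff)), minute // time_bucket)  # clip rare scorelines
        counts[key][result] += 1

    def predict(diff, minute):
        key = (max(-2, min(2, diff)), minute // time_bucket)
        cell = counts.get(key)
        if cell is None:
            return {"home": 1 / 3, "draw": 1 / 3, "away": 1 / 3}  # unseen state
        total = sum(cell.values())
        return {k: v / total for k, v in cell.items()}

    return predict
```

This floor is useful precisely because it has no mechanism: any structured model that cannot beat raw state frequencies is not learning anything about how matches actually evolve.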

Validation

Validation is again temporal rather than random. The live models are trained on earlier seasons and judged on 2020/2021 minute-level match states.

The evaluation focus is forecast quality at the state level, not just whether the eventual match outcome was guessed correctly from a late game state.

  • Multiclass log loss
  • Multiclass Brier score
  • Ranked probability score
  • Minute-level temporal test split
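Of these metrics, the ranked probability score is the least standard; for ordered outcomes it compares cumulative forecast and observed distributions, so near-misses in outcome order are penalised less than distant ones. A minimal sketch, assuming the outcome order home win, draw, away win:

```python
def ranked_probability_score(probs, outcome_index):
    """RPS for one forecast over ordered outcomes.

    probs: forecast probabilities in outcome order; outcome_index: index of
    the observed outcome. Lower is better; a perfect forecast scores 0.
    """
    n = len(probs)
    cum_forecast = 0.0
    cum_observed = 0.0
    score = 0.0
    for i in range(n - 1):  # the final cumulative term is always 1 - 1 = 0
        cum_forecast += probs[i]
        cum_observed += 1.0 if i == outcome_index else 0.0
        score += (cum_forecast - cum_observed) ** 2
    return score / (n - 1)
```

Putting all probability on a home win when the away side wins scores the maximum of 1.0, while putting it on a draw in the same case scores only 0.5, which is what makes RPS a natural fit for ordered football outcomes.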

Results

The first-pass live model is already viable: both structured models beat the empirical baseline by a wide margin, and the GAM-based remaining-goals model currently performs best.

Current test metrics on the 2020/2021 season:

  • Log loss: empirical baseline 4.7749, remaining-goals Poisson 1.1553, remaining-goals GAM 1.1095
  • Ranked probability score: empirical baseline 0.3474, Poisson 0.2363, GAM 0.2198

That is a strong early result because it shows the project is doing more than restating scoreline frequencies. It is learning a better live probability surface from state, xG, red cards, and prior strength.

The main football lesson is intuitive but now quantified: score difference matters more and more as time runs down, while pre-match strength matters most early or when the game is still level.

  • At minute 30, a level game still gives the home side about a 0.70 average win probability in this sample
  • At minute 75, a one-goal home lead is almost decisive in the current model
  • At minute 85, a level game is already close to a draw-heavy state unless one side is clearly stronger

Bar charts comparing live log loss, Brier score, and ranked probability score across empirical baseline, remaining-goals Poisson, and remaining-goals GAM models.

Live Metric Comparison

Both structured live models massively outperform the empirical baseline, and the GAM-based remaining-goals model currently leads on all three first-pass test metrics.

Line chart showing home win, draw, and away win probabilities over match minutes for one example test match, with red-card states marked.

Probability Timeline

The timeline view turns the model into something tactical and interpretable: probabilities move with score, time, red cards, and the pre-match prior rather than staying static.

Calibration chart comparing predicted home-win probability against observed home-win frequency across live state probability buckets.

State Calibration

This first-pass calibration view shows whether the live home-win probabilities are directionally aligned with observed outcomes rather than simply being sharp or dramatic.
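A calibration view of this kind can be produced by bucketing predicted probabilities and comparing each bucket's mean prediction with the observed outcome frequency. An illustrative sketch (the bucket count and function name are arbitrary, not taken from the project code):

```python
def calibration_table(pred_probs, outcomes, n_buckets=10):
    """Bucket predicted home-win probabilities against observed frequency.

    pred_probs: predicted P(home win) per state; outcomes: 1 if the home
    side eventually won, else 0.
    """
    buckets = [[] for _ in range(n_buckets)]
    for p, y in zip(pred_probs, outcomes):
        idx = min(int(p * n_buckets), n_buckets - 1)  # clamp p == 1.0
        buckets[idx].append((p, y))
    table = []
    for idx, cell in enumerate(buckets):
        if not cell:
            continue  # skip empty probability ranges
        mean_pred = sum(p for p, _ in cell) / len(cell)
        obs_freq = sum(y for _, y in cell) / len(cell)
        table.append({"bucket": idx, "mean_pred": mean_pred,
                      "obs_freq": obs_freq, "n": len(cell)})
    return table
```

A well-calibrated live model keeps `mean_pred` and `obs_freq` close in every populated bucket; sharpness without that alignment is exactly the failure mode this view is meant to expose.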

Line chart showing average home-win probability by minute when the home side is trailing, level, or leading by one goal.

Score State by Minute

This is the clearest football takeaway in the project: the same one-goal lead means something very different at minute 15 than it does at minute 75, and the model captures that directly.

Grouped bar chart showing average home-win probability when the game is level for stronger home teams, stronger away teams, and similar-strength teams.

Pre-Match Strength While Level

Pre-match strength still matters while the game is level. The stronger team starts with a real edge, and that prior continues to shape the live probabilities until enough in-match evidence arrives.

Grouped bar chart showing average home-win probability under red-card imbalances in later minute buckets.

Late Red-Card Impact

Late red-card imbalances create the most dramatic shifts in this dataset. When the home team is down a player late, its win probability nearly disappears; when the away team is down a player late, the home side is almost certain to win.

What Failed During Development

The first implementation of the state builder failed on empty event subsets, because matches without red cards produced empty tables with no typed minute column. That broke the minute-by-minute state extraction logic until the pipeline was made robust to empty event classes.

The project also forced a modelling choice early: rather than jumping to a direct live classification model, the remaining-goals approach proved to be the more defensible first version because it stayed interpretable and fit naturally with football match mechanics.

Those failures improved the project because they pushed the workflow toward robustness in the data layer and discipline in the modelling layer.

  • Data-layer failure: empty red-card event tables broke state extraction
  • Fix: typed empty tables and stricter event coercion
  • Modelling decision: rejected direct black-box live classifier for first pass
  • Main lesson: robust live systems start with strong state design, not model complexity
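The underlying fix generalises beyond the R implementation: event extraction and counting should tolerate empty event classes rather than assume at least one event exists. A language-agnostic sketch in Python, with invented helper names:

```python
def red_card_minutes(events, team):
    """Extract red-card minutes for one side, safe for matches with none.

    Returns a (possibly empty) sorted list instead of failing when the
    filtered event subset is empty.
    """
    return sorted(e["minute"] for e in events
                  if e.get("type") == "red_card" and e.get("team") == team)

def red_cards_before(minutes, minute):
    """Count of red cards shown up to and including the given minute.

    Works unchanged on an empty list, which is the case the original
    state builder failed on for matches without red cards.
    """
    return sum(1 for m in minutes if m <= minute)
```

The design point is that the "no events of this type" case is handled by construction, not by a special-case branch bolted on after the first crash.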

Decision Use

The model should support tactical reasoning, not automate it. Analysts still need opponent context, player fitness information, and coaching intent.

The right use is to describe how the game state changed, how much prior expectations still matter, and where uncertainty is still too large to justify overreacting.

What it can say now is when the match has become materially more stable or unstable. What it cannot yet say cleanly is whether a substitution at minute 58 is better than one at minute 68, because substitutions are not yet modelled as causal interventions.

Engineering

The implementation separates state-table construction, pre-match prior ingestion, remaining-goals modelling, and evaluation into modular pieces.

That separation matters because the in-game model is the first project in the series where data volume grows meaningfully: minute-level states are far more demanding than match-level or shot-level summaries.

  • Minute-level state build
  • Project 2 prior integration
  • Reusable live evaluation utilities
  • Temporal split logic
  • Output export for later site integration

Limitations

The first-pass scoreline path is approximated from cumulative xG progression rather than exact goal timestamps, so the state table is useful but not yet a perfect reconstruction of live match history.

Substitution quality and tactical shape changes are still only partially observed in the standard event feed, which limits certainty.

Red-card states are relatively sparse, so the live model should be interpreted carefully in extreme manpower situations.

That is why this version can describe red-card impact more confidently than substitution timing: the substitution effect is not yet modelled directly.

Next Iteration

The next upgrade is to replace the approximate score-state reconstruction with true event-time score tracking from goal events.

A second upgrade is to export live probability timelines and tactical scenario charts, which would make the case study much more visually concrete.

Later iterations can add substitutions and more explicit tactical scenario analysis once the live state backbone is fully trustworthy.

Pipeline Workflow

  1. Reuse the Project 2 La Liga archive and pre-match strength priors.
  2. Build a minute-level match-state table from event data.
  3. Track scoreline, cumulative xG, and red-card state through each match.
  4. Define remaining home and away goals from each state to full time.
  5. Fit an empirical baseline, a remaining-goals Poisson model, and a GAM-based live model.
  6. Evaluate live probabilities on a temporal 2020/2021 test season.
  7. Export summary artefacts for the portfolio site.

Repository Structure

  • modeling/project-3-live-win-probability/config/project_config.R for archive and split settings
  • modeling/project-3-live-win-probability/R/state_table.R for minute-level state construction
  • modeling/project-3-live-win-probability/R/modeling.R for empirical and remaining-goals models
  • modeling/project-3-live-win-probability/R/evaluation.R for multiclass live forecast metrics
  • modeling/project-3-live-win-probability/scripts/ for sequential state-build, fit, evaluate, and export steps
  • modeling/project-3-live-win-probability/outputs/ for model artefacts and summary files

What Wider Use Would Require

  • Low-latency state updates
  • Scenario engine tests
  • Monitoring on stale or missing event inputs
  • Analyst-facing interpretation guidance
  • Exact event-time score tracking before stronger deployment claims