Question

How should outcome probabilities update during a match, and how can those updates support tactical decision-making?

Methods

  • Empirical state baseline
  • Remaining-goals Poisson model
  • GAM-based remaining-goals model
  • Minute-level state table
  • Pre-match strength priors from Project 2

Data Sources

  • StatsBomb Open Data
  • Minute-level match states
  • Pre-match strength priors from Project 2

State Table

68,978 rows

The first-pass live model is trained and tested on minute-level match states rather than final-match summaries.

Best Log Loss

1.1095

The GAM-based remaining-goals model currently produces the strongest live outcome probabilities on the 2020/2021 test season.

Best RPS

0.2198

The same model also leads on ranked probability score, beating both the empirical baseline and the simpler Poisson alternative.

Problem

Estimate in-game win, draw, and loss probabilities as the state of a match evolves.

Treat this as a remaining-goals problem from the current state to full time, because that creates a direct bridge between football events and live outcome probabilities.
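The remaining-goals framing can be made concrete with a small sketch. The project itself is implemented in R; the Python below is purely illustrative, assuming independent Poisson scoring for each side and a hypothetical `outcome_probs` helper:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of exactly k goals given expected rate lam."""
    return lam ** k * exp(-lam) / factorial(k)

def outcome_probs(lam_home, lam_away, goal_diff=0, max_goals=10):
    """Win/draw/loss probabilities from the current state to full time.

    lam_home / lam_away are the expected remaining goals for each side;
    goal_diff is the current score difference (home minus away).
    Truncating at max_goals remaining goals is a practical approximation.
    """
    win = draw = loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            final_diff = goal_diff + h - a
            if final_diff > 0:
                win += p
            elif final_diff == 0:
                draw += p
            else:
                loss += p
    return win, draw, loss
```

With, say, 0.8 expected home goals and 0.6 expected away goals remaining in a level game, the home side comes out ahead, and a current lead shifts the whole distribution toward a win; this is the "direct bridge" from football events to live outcome probabilities.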

Football Context

This directly supports the wider football problem of in-game strategy under changing conditions.

The project is designed to answer questions such as how much a red card shifted the match, how much the pre-match prior still matters at minute 60, and when a game state remains too uncertain for strong tactical conclusions.

Data

The first pass builds a minute-level state table from the broader La Liga archive used in Project 2.

The resulting state table contains 68,978 rows: 62,790 train, 3,003 validation, and 3,185 test.

Each row tracks minute, time remaining, score state, cumulative xG, red-card counts, and a pre-match strength-gap prior from Project 2.

  • Minute-level states rather than final-match summaries
  • Score difference and remaining time
  • Home and away cumulative xG
  • Home and away red-card counts
  • Pre-match strength gap from Project 2
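As a rough illustration of what one state row contains (the actual builder is R code over StatsBomb events; the event schema and field names below are invented for the sketch, not the StatsBomb format):

```python
def build_state_rows(events, total_minutes=90):
    """Build one state row per minute from a list of event dicts.

    Each event is assumed to look like
    {"minute": 23, "team": "home", "type": "goal" | "red_card" | "shot", "xg": 0.12}.
    """
    rows = []
    for minute in range(total_minutes + 1):
        past = [e for e in events if e["minute"] <= minute]
        row = {
            "minute": minute,
            "time_remaining": total_minutes - minute,
            "home_goals": sum(1 for e in past if e["type"] == "goal" and e["team"] == "home"),
            "away_goals": sum(1 for e in past if e["type"] == "goal" and e["team"] == "away"),
            "home_xg": sum(e.get("xg", 0.0) for e in past if e["team"] == "home"),
            "away_xg": sum(e.get("xg", 0.0) for e in past if e["team"] == "away"),
            "home_reds": sum(1 for e in past if e["type"] == "red_card" and e["team"] == "home"),
            "away_reds": sum(1 for e in past if e["type"] == "red_card" and e["team"] == "away"),
        }
        row["score_diff"] = row["home_goals"] - row["away_goals"]
        rows.append(row)
    return rows
```

A 90-minute match then yields 91 rows (minute 0 through 90), each a self-contained snapshot the live models can score; the pre-match strength-gap prior is simply joined on as an extra constant column per match.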

Model Design

The model ladder is deliberately interpretable. A naive empirical state baseline sets the floor, a remaining-goals Poisson model provides a structured statistical step up, and a GAM-based remaining-goals model captures the nonlinear effect of time remaining and state intensity.

The key decision here was not to jump straight into a black-box live classifier. Remaining-goals models make the football mechanism visible: given the current state, how much scoring is still expected for each side?

The first pass already shows that adding richer state structure matters materially, with the GAM outperforming the simpler alternatives on the holdout season.

  • Model 0: empirical state baseline
  • Model 1: remaining-goals Poisson model
  • Model 2: GAM-based remaining-goals model
  • Primary objective: reliable live win, draw, and loss probabilities
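Model 0 can be sketched as nothing more than outcome frequencies per coarse state cell. Again a hypothetical Python illustration, with bucketing choices chosen for the sketch rather than taken from the project:

```python
from collections import defaultdict

def fit_empirical_baseline(states, outcomes, time_bucket=15):
    """Empirical baseline: outcome frequencies per (score diff, time bucket) cell.

    states: list of (score_diff, minute) pairs; outcomes: matching list of
    "home" | "draw" | "away" final results.
    """
    counts = defaultdict(lambda: {"home": 0, "draw": 0, "away": 0})
    for (diff, minute), result in zip(states, outcomes):
        key = (max(-2, min(2, diff)), minute // time_bucket)  # clip rare scorelines
        counts[key][result] += 1

    def predict(diff, minute):
        key = (max(-2, min(2, diff)), minute // time_bucket)
        cell = counts.get(key)
        if cell is None:
            return {"home": 1 / 3, "draw": 1 / 3, "away": 1 / 3}  # unseen state
        total = sum(cell.values())
        return {k: v / total for k, v in cell.items()}

    return predict
```

This floor is useful precisely because it has no mechanism: any structured model that cannot beat raw state frequencies is not learning anything about how matches actually evolve.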

Validation

Validation is again temporal rather than random. The live models are trained on earlier seasons and judged on 2020/2021 minute-level match states.

The evaluation focus is forecast quality at the state level, not just whether the eventual match outcome was guessed correctly from a late game state.

  • Multiclass log loss
  • Multiclass Brier score
  • Ranked probability score
  • Minute-level temporal test split
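Of these metrics, the ranked probability score is the least standard; for ordered outcomes it compares cumulative forecast and observed distributions, so near-misses in outcome order are penalised less than distant ones. A minimal sketch, assuming the outcome order home win, draw, away win:

```python
def ranked_probability_score(probs, outcome_index):
    """RPS for one forecast over ordered outcomes.

    probs: forecast probabilities in outcome order; outcome_index: index of
    the observed outcome. Lower is better; a perfect forecast scores 0.
    """
    n = len(probs)
    cum_forecast = 0.0
    cum_observed = 0.0
    score = 0.0
    for i in range(n - 1):  # the final cumulative term is always 1 - 1 = 0
        cum_forecast += probs[i]
        cum_observed += 1.0 if i == outcome_index else 0.0
        score += (cum_forecast - cum_observed) ** 2
    return score / (n - 1)
```

Putting all probability on a home win when the away side wins scores the maximum of 1.0, while putting it on a draw in the same case scores only 0.5, which is what makes RPS a natural fit for ordered football outcomes.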

Results

The first-pass live model is already viable: both structured models beat the empirical baseline by a wide margin, and the GAM-based remaining-goals model currently performs best.

Current test metrics on the 2020/2021 season:

  • Log loss: empirical baseline 4.7749, remaining-goals Poisson 1.1553, remaining-goals GAM 1.1095
  • Ranked probability score: empirical baseline 0.3474, Poisson 0.2363, GAM 0.2198

That is a strong early result because it shows the project is doing more than restating scoreline frequencies. It is learning a better live probability surface from state, xG, red cards, and prior strength.

The main football lesson is intuitive but now quantified: score difference matters more and more as time runs down, while pre-match strength matters most early or when the game is still level.

  • At minute 30, a level game still gives the home side about a 0.70 average win probability in this sample
  • At minute 75, a one-goal home lead is almost decisive in the current model
  • At minute 85, a level game is already close to a draw-heavy state unless one side is clearly stronger

Bar charts comparing live log loss, Brier score, and ranked probability score across empirical baseline, remaining-goals Poisson, and remaining-goals GAM models.

Live Metric Comparison

Both structured live models massively outperform the empirical baseline, and the GAM-based remaining-goals model currently leads on all three first-pass test metrics.

Line chart showing home win, draw, and away win probabilities over match minutes for one example test match, with red-card states marked.

Probability Timeline

The timeline view turns the model into something tactical and interpretable: probabilities move with score, time, red cards, and the pre-match prior rather than staying static.

Calibration chart comparing predicted home-win probability against observed home-win frequency across live state probability buckets.

State Calibration

This first-pass calibration view shows whether the live home-win probabilities are directionally aligned with observed outcomes rather than simply being sharp or dramatic.
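A calibration view of this kind can be produced by bucketing predicted probabilities and comparing each bucket's mean prediction with the observed outcome frequency. An illustrative sketch (the bucket count and function name are arbitrary, not taken from the project code):

```python
def calibration_table(pred_probs, outcomes, n_buckets=10):
    """Bucket predicted home-win probabilities against observed frequency.

    pred_probs: predicted P(home win) per state; outcomes: 1 if the home
    side eventually won, else 0.
    """
    buckets = [[] for _ in range(n_buckets)]
    for p, y in zip(pred_probs, outcomes):
        idx = min(int(p * n_buckets), n_buckets - 1)  # clamp p == 1.0
        buckets[idx].append((p, y))
    table = []
    for idx, cell in enumerate(buckets):
        if not cell:
            continue  # skip empty probability ranges
        mean_pred = sum(p for p, _ in cell) / len(cell)
        obs_freq = sum(y for _, y in cell) / len(cell)
        table.append({"bucket": idx, "mean_pred": mean_pred,
                      "obs_freq": obs_freq, "n": len(cell)})
    return table
```

A well-calibrated live model keeps `mean_pred` and `obs_freq` close in every populated bucket; sharpness without that alignment is exactly the failure mode this view is meant to expose.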

Line chart showing average home-win probability by minute when the home side is trailing, level, or leading by one goal.

Score State by Minute

This is the clearest football takeaway in the project: the same one-goal lead means something very different at minute 15 than it does at minute 75, and the model captures that directly.

Grouped bar chart showing average home-win probability when the game is level for stronger home teams, stronger away teams, and similar-strength teams.

Pre-Match Strength While Level

Pre-match strength still matters while the game is level. The stronger team starts with a real edge, and that prior continues to shape the live probabilities until enough in-match evidence arrives.

Grouped bar chart showing average home-win probability under red-card imbalances in later minute buckets.

Late Red-Card Impact

Late red-card imbalances create the most dramatic shifts in this dataset. When the home team is down a player late, its win probability nearly disappears; when the away team is down a player late, the home side is almost certain to win.

What Failed During Development

The first implementation of the state builder failed on empty event subsets, because matches without red cards produced empty tables with no typed minute column. That broke the minute-by-minute state extraction logic until the pipeline was made robust to empty event classes.

The project also forced a modelling choice early: rather than jumping to a direct live classification model, the remaining-goals approach proved to be the more defensible first version because it stayed interpretable and fit naturally with football match mechanics.

Those failures improved the project because they pushed the workflow toward robustness in the data layer and discipline in the modelling layer.

  • Data-layer failure: empty red-card event tables broke state extraction
  • Fix: typed empty tables and stricter event coercion
  • Modelling decision: rejected direct black-box live classifier for first pass
  • Main lesson: robust live systems start with strong state design, not model complexity
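The underlying fix generalises beyond the R implementation: event extraction and counting should tolerate empty event classes rather than assume at least one event exists. A language-agnostic sketch in Python, with invented helper names:

```python
def red_card_minutes(events, team):
    """Extract red-card minutes for one side, safe for matches with none.

    Returns a (possibly empty) sorted list instead of failing when the
    filtered event subset is empty.
    """
    return sorted(e["minute"] for e in events
                  if e.get("type") == "red_card" and e.get("team") == team)

def red_cards_before(minutes, minute):
    """Count of red cards shown up to and including the given minute.

    Works unchanged on an empty list, which is the case the original
    state builder failed on for matches without red cards.
    """
    return sum(1 for m in minutes if m <= minute)
```

The design point is that the "no events of this type" case is handled by construction, not by a special-case branch bolted on after the first crash.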

Decision Use

The model should support tactical reasoning, not automate it. Analysts still need opponent context, player fitness information, and coaching intent.

The right use is to describe how the game state changed, how much prior expectations still matter, and where uncertainty is still too large to justify overreacting.

What it can say now is when the match has become materially more stable or unstable. What it cannot yet say cleanly is whether a substitution at minute 58 is better than one at minute 68, because substitutions are not yet modelled as causal interventions.

Engineering

The implementation separates state-table construction, pre-match prior ingestion, remaining-goals modelling, and evaluation into modular pieces.

That separation matters because the in-game model is the first project in the series where data volume grows meaningfully: minute-level states are far more demanding than match-level or shot-level summaries.

  • Minute-level state build
  • Project 2 prior integration
  • Reusable live evaluation utilities
  • Temporal split logic
  • Output export for later site integration

Limitations

The first-pass scoreline path is approximated from cumulative xG progression rather than exact goal timestamps, so the state table is useful but not yet a perfect reconstruction of live match history.

Substitution quality and tactical shape changes are still only partially observed in the standard event feed, which limits certainty.

Red-card states are relatively sparse, so the live model should be interpreted carefully in extreme manpower situations.

That is why this version can describe red-card impact more confidently than substitution timing: the substitution effect is not yet modelled directly.

Next Iteration

The next upgrade is to replace the approximate score-state reconstruction with true event-time score tracking from goal events.

A second upgrade is to export live probability timelines and tactical scenario charts, which would make the case study much more visually concrete.

Later iterations can add substitutions and more explicit tactical scenario analysis once the live state backbone is fully trustworthy.

Pipeline Workflow

  1. Reuse the Project 2 La Liga archive and pre-match strength priors.
  2. Build a minute-level match-state table from event data.
  3. Track scoreline, cumulative xG, and red-card state through each match.
  4. Define remaining home and away goals from each state to full time.
  5. Fit an empirical baseline, a remaining-goals Poisson model, and a GAM-based live model.
  6. Evaluate live probabilities on a temporal 2020/2021 test season.
  7. Export summary artefacts for the portfolio site.

Repository Structure

  • modeling/project-3-live-win-probability/config/project_config.R for archive and split settings
  • modeling/project-3-live-win-probability/R/state_table.R for minute-level state construction
  • modeling/project-3-live-win-probability/R/modeling.R for empirical and remaining-goals models
  • modeling/project-3-live-win-probability/R/evaluation.R for multiclass live forecast metrics
  • modeling/project-3-live-win-probability/scripts/ for sequential state-build, fit, evaluate, and export steps
  • modeling/project-3-live-win-probability/outputs/ for model artefacts and summary files

What Wider Use Would Require

  • Low-latency state updates
  • Scenario engine tests
  • Monitoring on stale or missing event inputs
  • Analyst-facing interpretation guidance
  • Exact event-time score tracking before stronger deployment claims