Question

How should current team strength be estimated for match forecasting and pre-match strategic priors?

Methods

  • Naive outcome baseline
  • Static pooled attack/defence model
  • Dynamic team-strength updates
  • Temporal train, validation, and test splits
  • Multiclass forecast evaluation

Data Sources

  • StatsBomb Open Data
  • La Liga match results
  • Aggregated match-level StatsBomb xG

Training Sample

690 matches

The first-pass model is trained on La Liga open-data seasons from 2009/2010 through 2018/2019.

Best Log Loss

0.9815

The dynamic team-strength model currently leads the first-pass forecasting ladder on test-set log loss.

Best RPS

0.2157

The dynamic model also produces the lowest ranked probability score on the 2020/2021 holdout season.

Problem

Estimate latent team strength over time so that match predictions adjust to changing form and underlying quality.

Treat the forecasting task as a probability problem over home win, draw, and away win, because the practical requirement is a credible pre-match distribution rather than a single hard pick.
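A common way to turn team-level expected-goal rates into a home/draw/away distribution is to assume each side's goals follow an independent Poisson distribution and sum the scoreline probabilities by outcome. The sketch below is illustrative Python (the project itself is implemented in R), and the expected-goal inputs are hypothetical, not fitted values from the model.

```python
import math

def outcome_probs(lambda_home, lambda_away, max_goals=12):
    """Convert per-side expected goals into (home, draw, away) probabilities,
    assuming independent Poisson-distributed goal counts."""
    def pois(lam, k):
        return math.exp(-lam) * lam ** k / math.factorial(k)

    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = pois(lambda_home, h) * pois(lambda_away, a)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away
```

Truncating the grid at `max_goals=12` leaves a negligible tail for realistic expected-goal values, so the three probabilities sum to one for practical purposes.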

Football Context

Clubs need more than static season aggregates when setting expectations, preparing for matches, or feeding live probability models.


A pre-match strength model also creates a natural bridge between Project 1 and the later in-game win probability project, because live forecasting should start from a sensible prior rather than from scratch.

Data

The first pass uses a broader La Liga StatsBomb Open Data archive than Project 1, retaining seasons with at least 30 available matches from 2009/2010 through 2020/2021.

The resulting first-pass sample contains 758 matches: 690 train, 33 validation, and 35 test.

Each match record includes scoreline plus event-derived home and away xG, with non-penalty xG tracked separately to make the team-strength updates less penalty-driven.

  • Competition: La Liga
  • Train seasons: 2009/2010 to 2018/2019
  • Validation season: 2019/2020
  • Test season: 2020/2021
  • Important limitation: this is still an uneven open-data archive rather than a complete league feed

Model Design

The model ladder is designed to answer a clear question: does dynamic team strength actually improve forecasting over simpler alternatives?

The first-pass static model uses pooled attack and defence rates so it can handle promoted or unseen teams at prediction time, which turned out to be necessary once the initial fixed-effect approach broke on new team levels.

The dynamic model then updates attack and defence strengths recursively using match-by-match non-penalty xG performance, making it state-space-inspired without pretending to be a full latent Bayesian state-space implementation.

  • Model 0: naive historical outcome baseline
  • Model 1: static pooled attack/defence model
  • Model 2: dynamic recursive team-strength updates
  • Primary objective: calibrated pre-match outcome probabilities
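The recursive update in Model 2 can be sketched as an exponential-style correction toward each match's non-penalty xG evidence. This is an illustrative Python sketch (the project is implemented in R); the learning rate and league-average baseline here are hypothetical tuning values, and the only property carried over from the write-up is that higher defence ratings are better.

```python
def update_strengths(attack, defence, npxg_for, npxg_against,
                     league_avg=1.3, lr=0.1):
    """One recursive team-strength update after a match.

    attack  - current attack rating (tracks non-penalty xG created)
    defence - current defence rating (tracks npxG prevented relative
              to a league-average attack; higher is better)
    lr and league_avg are illustrative constants, not project values.
    """
    new_attack = attack + lr * (npxg_for - attack)
    new_defence = defence + lr * ((league_avg - npxg_against) - defence)
    return new_attack, new_defence
```

A strong attacking display pulls the attack rating up; conceding less than a league-average attack would create pulls the defence rating up, keeping the "higher defence is better" convention from the results section.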

Validation

Validation is strictly temporal. The model is trained on earlier seasons, tuned against a later validation season, and judged on 2020/2021 holdout matches.

The main evaluation focus is forecast probability quality rather than categorical hit rate, because the project is intended to support probabilistic decision-making.

  • Multiclass log loss
  • Multiclass Brier score
  • Ranked probability score
  • Temporal train, validation, and test split
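The three forecast metrics above can be written compactly for a single match. This is an illustrative Python sketch (the project's evaluation utilities are in R); outcomes are encoded in their natural order as home win (0), draw (1), away win (2), which is what makes the ranked probability score sensitive to ordering in a way log loss and Brier score are not.

```python
import numpy as np

def multiclass_log_loss(probs, outcome):
    # Negative log probability assigned to the realised outcome.
    return -np.log(probs[outcome])

def brier(probs, outcome):
    # Sum of squared differences against the one-hot outcome vector.
    y = np.zeros(len(probs))
    y[outcome] = 1.0
    return float(np.sum((np.asarray(probs) - y) ** 2))

def rps(probs, outcome):
    # Ranked probability score over the ordered outcomes
    # home win (0), draw (1), away win (2): squared differences of
    # cumulative distributions, normalised by (classes - 1).
    y = np.zeros(len(probs))
    y[outcome] = 1.0
    cum_p, cum_y = np.cumsum(probs), np.cumsum(y)
    return float(np.sum((cum_p - cum_y) ** 2) / (len(probs) - 1))
```

Per-match values are then averaged over the holdout season; a uniform forecast scores log loss ln(3) ≈ 1.0986, which puts the reported baseline of 1.0500 in context.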

Results

The dynamic first-pass model is already outperforming both simpler alternatives on the 2020/2021 test split.

Current test metrics:

  • Log loss: baseline 1.0500, static pooled 1.0566, dynamic 0.9815
  • Ranked probability score: baseline 0.2476, static pooled 0.2190, dynamic 0.2157

That gives the project a strong early story: dynamic updating is doing useful work here, even before the model is upgraded into a more formal state-space framework.

The season-end tables also make the model outputs tangible. In the richer years of the open-data archive, the model recovers intuitive high-end teams such as Barcelona, Real Madrid, and Atletico Madrid. In thinner years, the rankings become noisier, which is itself an important result about data quality rather than something to hide.

  • 2015/2016 top overall teams: Barcelona, Real Madrid, Atletico Madrid
  • 2016/2017 top overall teams: Barcelona, Real Madrid, Atletico Madrid
  • Defensive rankings are now interpreted correctly: higher defence is better in this model

Figure: Bar charts comparing log loss, Brier score, and ranked probability score across baseline, static pooled, and dynamic forecasting models.

Forecast Metric Comparison

The dynamic team-strength model beats both simpler alternatives on all three first-pass forecast metrics, giving the project a strong empirical case for time-varying modelling.

Figure: Line charts showing dynamic attack and defence strength trajectories over time for selected La Liga teams.

Team-Strength Trajectories

The recursive strength updates make the model visually interpretable: team attack and defence ratings evolve over time rather than staying fixed at static season averages.

Figure: Season-end summary chart showing overall and defensive team strength for leading teams in recent seasons.

Season-End Strength Snapshot

This view makes the learned ratings easier to inspect: Barcelona dominate the 2015/2016 and 2016/2017 snapshots, while later seasons look noisier because open-data coverage becomes thinner.

What Failed During Development

The original static model was implemented as a team fixed-effect Poisson regression. That broke at prediction time because the test split included unseen team levels, which made the model brittle in exactly the way a club-facing forecasting system should avoid.

Replacing that with a pooled static attack/defence model produced a safer first-pass baseline and clarified the main modelling lesson: forecasting systems need to handle promotions, relegations, and sparse team histories gracefully.

This failure improved the project because it forced the design toward a more robust baseline and made the dynamic model comparison more meaningful.

  • Rejected first-pass static design: fixed-effect team Poisson model
  • Observed failure: unseen team levels in the test set
  • Replacement: pooled static attack/defence baseline
  • Main lesson: robustness matters more than elegant but brittle specification choices
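The pooled fallback that replaced the fixed-effect design can be sketched in a few lines: per-team rates where history exists, with a league-wide pooled rate for teams unseen in training. This is an illustrative Python sketch of the idea (the project's models are in R), and the field names and helper functions here are hypothetical, not the repo's actual API.

```python
def fit_attack_rates(matches):
    """Per-team mean xG created, plus a pooled league-wide mean.
    Each match is a dict with illustrative keys:
    home, away, home_xg, away_xg."""
    per_team, all_xg = {}, []
    for m in matches:
        per_team.setdefault(m["home"], []).append(m["home_xg"])
        per_team.setdefault(m["away"], []).append(m["away_xg"])
        all_xg += [m["home_xg"], m["away_xg"]]
    pooled = sum(all_xg) / len(all_xg)
    rates = {t: sum(v) / len(v) for t, v in per_team.items()}
    return rates, pooled

def attack_rate(team, rates, pooled):
    # Promoted or otherwise unseen teams fall back to the pooled rate
    # instead of failing on an unseen factor level, which is exactly
    # where the fixed-effect model broke.
    return rates.get(team, pooled)
```

The point is not the statistics but the failure mode: a dictionary lookup with a pooled default degrades gracefully at prediction time, where a fixed-effect design errors out.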

Decision Use

The model informs forecasting and provides priors for the live win-probability system, but should not be used as a substitute for squad-level context.

Analysts should treat the outputs as structured priors, then layer in injuries, tactical mismatches, schedule congestion, and line-up information before taking action.

The most useful practical output is not a final league table. It is a season-aware strength prior that can be reused in later models and checked against football common sense.

Engineering

Runs are reproducible over fixed temporal windows, using a canonical match-feature table and a clear model ladder rather than one-off notebook experimentation.

The first implementation also makes a useful engineering point: once promoted teams broke the initial static model, the safer pooled baseline became part of the permanent workflow rather than an ad hoc patch.

  • Canonical match-table build step
  • Event-derived xG aggregation
  • Temporal split logic in config
  • Tested evaluation utilities
  • Reusable forecast summary export

Limitations

The main data limitation is still open-data coverage. Although the archive is broader than Project 1, it is not a complete proprietary historical feed, and the number of matches varies sharply by season.

Structural team changes can outpace the update speed of a dynamic recursive model, especially when recent evidence is sparse or when squad turnover is extreme.

The current dynamic model is state-space-inspired rather than a full latent state-space implementation, so uncertainty around latent strength is not yet being modelled as formally as it could be.

Next Iteration

The next upgrade is to replace the recursive dynamic update rule with a more formal latent state-space or smoothing framework.

A second upgrade is to compare StatsBomb native xG inputs against custom Project 1 xG aggregates, which would make the portfolio connection between Projects 1 and 2 much stronger.

Further work should also export team-strength trajectories and forecast comparison charts to the website, so the project is as visually concrete as Project 1.

Pipeline Workflow

  1. Read La Liga competitions and retain seasons with at least 30 open-data matches.
  2. Build a canonical match table from StatsBomb match files.
  3. Aggregate event-level StatsBomb xG to home and away match features.
  4. Split the archive into train, validation, and test windows by season.
  5. Fit a naive baseline, a static pooled attack/defence model, and a dynamic team-strength model.
  6. Evaluate multiclass outcome probabilities using log loss, Brier score, and ranked probability score.
  7. Export summary artefacts for the portfolio site.
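Step 4's temporal split can be sketched as a season-label partition. This is an illustrative Python sketch (the project keeps its split logic in R config); the `season` key and default season strings are taken from the split described above, and the lexicographic comparison works only because the labels share the fixed "YYYY/YYYY" format.

```python
def temporal_split(matches, train_end="2018/2019",
                   val_season="2019/2020", test_season="2020/2021"):
    """Partition match dicts into temporal windows by season label.
    Assumes each match carries a 'season' key in 'YYYY/YYYY' form,
    so string comparison matches chronological order."""
    train = [m for m in matches if m["season"] <= train_end]
    val = [m for m in matches if m["season"] == val_season]
    test = [m for m in matches if m["season"] == test_season]
    return train, val, test
```

Keeping the window boundaries as named arguments mirrors the write-up's point that the split lives in config rather than being hard-coded into modelling scripts.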

Repository Structure

  • modeling/project-2-team-strength/config/project_config.R for season windows and data paths
  • modeling/project-2-team-strength/R/match_features.R for canonical match-table and xG aggregation logic
  • modeling/project-2-team-strength/R/modeling.R for baseline, static, and dynamic forecasting models
  • modeling/project-2-team-strength/R/evaluation.R for multiclass forecast metrics
  • modeling/project-2-team-strength/scripts/ for sequential build, fit, evaluate, and export steps
  • modeling/project-2-team-strength/outputs/ for model artefacts and summary files

What Wider Use Would Require

  • Automated rolling retraining
  • Backtest reports
  • Schema validation
  • Forecast calibration monitoring
  • Explicit handling of promoted and relegated teams