Question

How should current team strength be estimated for match forecasting and pre-match strategic priors?

Methods

  • Naive outcome baseline
  • Static pooled attack/defence model
  • Dynamic team-strength updates
  • Temporal train, validation, and test splits
  • Multiclass forecast evaluation

Data Sources

  • StatsBomb Open Data
  • La Liga match results
  • Aggregated match-level StatsBomb xG

Training Sample

690 matches

The first-pass model is trained on La Liga open-data seasons from 2009/2010 through 2018/2019.

Best Log Loss

0.9815

The dynamic team-strength model currently leads the first-pass forecasting ladder on test-set log loss.

Best RPS

0.2157

The dynamic model also produces the lowest ranked probability score on the 2020/2021 holdout season.

Problem

Estimate latent team strength over time so that match predictions adjust to changing form and underlying quality.

Treat the forecasting task as a probability problem over home win, draw, and away win, because the practical requirement is a credible pre-match distribution rather than a single hard pick.
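A common way to turn team-level expected-goal rates into a home/draw/away distribution is to assume each side's goals follow an independent Poisson distribution and sum the scoreline probabilities by outcome. The sketch below is illustrative Python (the project itself is implemented in R), and the expected-goal inputs are hypothetical, not fitted values from the model.

```python
import math

def outcome_probs(lambda_home, lambda_away, max_goals=12):
    """Convert per-side expected goals into (home, draw, away) probabilities,
    assuming independent Poisson-distributed goal counts."""
    def pois(lam, k):
        return math.exp(-lam) * lam ** k / math.factorial(k)

    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = pois(lambda_home, h) * pois(lambda_away, a)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away
```

Truncating the grid at `max_goals=12` leaves a negligible tail for realistic expected-goal values, so the three probabilities sum to one for practical purposes.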

Football Context

Clubs need more than static season aggregates when setting expectations, preparing for matches, or feeding live probability models.


A pre-match strength model also creates a natural bridge between Project 1 and the later in-game win probability project, because live forecasting should start from a sensible prior rather than from scratch.

Data

The first pass uses a broader La Liga StatsBomb Open Data archive than Project 1, retaining seasons with at least 30 available matches from 2009/2010 through 2020/2021.

The resulting first-pass sample contains 758 matches: 690 train, 33 validation, and 35 test.

Each match record includes scoreline plus event-derived home and away xG, with non-penalty xG tracked separately to make the team-strength updates less penalty-driven.

  • Competition: La Liga
  • Train seasons: 2009/2010 to 2018/2019
  • Validation season: 2019/2020
  • Test season: 2020/2021
  • Important limitation: this is still an uneven open-data archive rather than a complete league feed

Model Design

The model ladder is designed to answer a clear question: does dynamic team strength actually improve forecasting over simpler alternatives?

The first-pass static model uses pooled attack and defence rates so it can handle promoted or unseen teams at prediction time, which turned out to be necessary once the initial fixed-effect approach broke on new team levels.

The dynamic model then updates attack and defence strengths recursively using match-by-match non-penalty xG performance, making it state-space-inspired without pretending to be a full latent Bayesian state-space implementation.

  • Model 0: naive historical outcome baseline
  • Model 1: static pooled attack/defence model
  • Model 2: dynamic recursive team-strength updates
  • Primary objective: calibrated pre-match outcome probabilities
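The recursive update in Model 2 can be sketched as an exponential-style correction toward each match's non-penalty xG evidence. This is an illustrative Python sketch (the project is implemented in R); the learning rate and league-average baseline here are hypothetical tuning values, and the only property carried over from the write-up is that higher defence ratings are better.

```python
def update_strengths(attack, defence, npxg_for, npxg_against,
                     league_avg=1.3, lr=0.1):
    """One recursive team-strength update after a match.

    attack  - current attack rating (tracks non-penalty xG created)
    defence - current defence rating (tracks npxG prevented relative
              to a league-average attack; higher is better)
    lr and league_avg are illustrative constants, not project values.
    """
    new_attack = attack + lr * (npxg_for - attack)
    new_defence = defence + lr * ((league_avg - npxg_against) - defence)
    return new_attack, new_defence
```

A strong attacking display pulls the attack rating up; conceding less than a league-average attack would create pulls the defence rating up, keeping the "higher defence is better" convention from the results section.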

Validation

Validation is strictly temporal. The model is trained on earlier seasons, tuned against a later validation season, and judged on 2020/2021 holdout matches.

The main evaluation focus is forecast probability quality rather than categorical hit rate, because the project is intended to support probabilistic decision-making.

  • Multiclass log loss
  • Multiclass Brier score
  • Ranked probability score
  • Temporal train, validation, and test split
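The three forecast metrics above can be written compactly for a single match. This is an illustrative Python sketch (the project's evaluation utilities are in R); outcomes are encoded in their natural order as home win (0), draw (1), away win (2), which is what makes the ranked probability score sensitive to ordering in a way log loss and Brier score are not.

```python
import numpy as np

def multiclass_log_loss(probs, outcome):
    # Negative log probability assigned to the realised outcome.
    return -np.log(probs[outcome])

def brier(probs, outcome):
    # Sum of squared differences against the one-hot outcome vector.
    y = np.zeros(len(probs))
    y[outcome] = 1.0
    return float(np.sum((np.asarray(probs) - y) ** 2))

def rps(probs, outcome):
    # Ranked probability score over the ordered outcomes
    # home win (0), draw (1), away win (2): squared differences of
    # cumulative distributions, normalised by (classes - 1).
    y = np.zeros(len(probs))
    y[outcome] = 1.0
    cum_p, cum_y = np.cumsum(probs), np.cumsum(y)
    return float(np.sum((cum_p - cum_y) ** 2) / (len(probs) - 1))
```

Per-match values are then averaged over the holdout season; a uniform forecast scores log loss ln(3) ≈ 1.0986, which puts the reported baseline of 1.0500 in context.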

Results

The dynamic first-pass model is already outperforming both simpler alternatives on the 2020/2021 test split.

Current test metrics:

  • Log loss: baseline 1.0500, static pooled 1.0566, dynamic 0.9815
  • Ranked probability score: baseline 0.2476, static pooled 0.2190, dynamic 0.2157

That gives the project a strong early story: dynamic updating is doing useful work here, even before the model is upgraded into a more formal state-space framework.

The season-end tables also make the model outputs tangible. In the richer years of the open-data archive, the model recovers intuitive high-end teams such as Barcelona, Real Madrid, and Atletico Madrid. In thinner years, the rankings become noisier, which is itself an important result about data quality rather than something to hide.

  • 2015/2016 top overall teams: Barcelona, Real Madrid, Atletico Madrid
  • 2016/2017 top overall teams: Barcelona, Real Madrid, Atletico Madrid
  • Defensive rankings are now interpreted correctly: higher defence is better in this model

Figure: Bar charts comparing log loss, Brier score, and ranked probability score across baseline, static pooled, and dynamic forecasting models.

Forecast Metric Comparison

The dynamic team-strength model beats both simpler alternatives on all three first-pass forecast metrics, giving the project a strong empirical case for time-varying modelling.

Figure: Line charts showing dynamic attack and defence strength trajectories over time for selected La Liga teams.

Team-Strength Trajectories

The recursive strength updates make the model visually interpretable: team attack and defence ratings evolve over time rather than staying fixed at static season averages.

Figure: Season-end summary chart showing overall and defensive team strength for leading teams in recent seasons.

Season-End Strength Snapshot

This view makes the learned ratings easier to inspect: Barcelona dominate the 2015/2016 and 2016/2017 snapshots, while later seasons look noisier because open-data coverage becomes thinner.

What Failed During Development

The original static model was implemented as a team fixed-effect Poisson regression. That broke at prediction time because the test split included unseen team levels, which made the model brittle in exactly the way a club-facing forecasting system should avoid.

Replacing that with a pooled static attack/defence model produced a safer first-pass baseline and clarified the main modelling lesson: forecasting systems need to handle promotions, relegations, and sparse team histories gracefully.

This failure improved the project because it forced the design toward a more robust baseline and made the dynamic model comparison more meaningful.

  • Rejected first-pass static design: fixed-effect team Poisson model
  • Observed failure: unseen team levels in the test set
  • Replacement: pooled static attack/defence baseline
  • Main lesson: robustness matters more than elegant but brittle specification choices
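The pooled fallback that replaced the fixed-effect design can be sketched in a few lines: per-team rates where history exists, with a league-wide pooled rate for teams unseen in training. This is an illustrative Python sketch of the idea (the project's models are in R), and the field names and helper functions here are hypothetical, not the repo's actual API.

```python
def fit_attack_rates(matches):
    """Per-team mean xG created, plus a pooled league-wide mean.
    Each match is a dict with illustrative keys:
    home, away, home_xg, away_xg."""
    per_team, all_xg = {}, []
    for m in matches:
        per_team.setdefault(m["home"], []).append(m["home_xg"])
        per_team.setdefault(m["away"], []).append(m["away_xg"])
        all_xg += [m["home_xg"], m["away_xg"]]
    pooled = sum(all_xg) / len(all_xg)
    rates = {t: sum(v) / len(v) for t, v in per_team.items()}
    return rates, pooled

def attack_rate(team, rates, pooled):
    # Promoted or otherwise unseen teams fall back to the pooled rate
    # instead of failing on an unseen factor level, which is exactly
    # where the fixed-effect model broke.
    return rates.get(team, pooled)
```

The point is not the statistics but the failure mode: a dictionary lookup with a pooled default degrades gracefully at prediction time, where a fixed-effect design errors out.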

Decision Use

The model informs forecasting and provides priors for the live win-probability system, but should not be used as a substitute for squad-level context.

Analysts should treat the outputs as structured priors, then layer in injuries, tactical mismatches, schedule congestion, and line-up information before taking action.

The most useful practical output is not a final league table. It is a season-aware strength prior that can be reused in later models and checked against football common sense.

Engineering

Runs are reproducible over fixed temporal windows, using a canonical match-feature table and a clear model ladder rather than one-off notebook experimentation.

The first implementation also makes a useful engineering point: once promoted teams broke the initial static model, the safer pooled baseline became part of the permanent workflow rather than an ad hoc patch.

  • Canonical match-table build step
  • Event-derived xG aggregation
  • Temporal split logic in config
  • Tested evaluation utilities
  • Reusable forecast summary export

Limitations

The main data limitation is still open-data coverage. Although the archive is broader than Project 1, it is not a complete proprietary historical feed, and the number of matches varies sharply by season.

Structural team changes can outpace the update speed of a dynamic recursive model, especially when recent evidence is sparse or when squad turnover is extreme.

The current dynamic model is state-space-inspired rather than a full latent state-space implementation, so uncertainty around latent strength is not yet being modelled as formally as it could be.

Next Iteration

The next upgrade is to replace the recursive dynamic update rule with a more formal latent state-space or smoothing framework.

A second upgrade is to compare StatsBomb native xG inputs against custom Project 1 xG aggregates, which would make the portfolio connection between Projects 1 and 2 much stronger.

Further work should also export team-strength trajectories and forecast comparison charts to the website, so the project is as visually concrete as Project 1.

Pipeline Workflow

  1. Read La Liga competitions and retain seasons with at least 30 open-data matches.
  2. Build a canonical match table from StatsBomb match files.
  3. Aggregate event-level StatsBomb xG to home and away match features.
  4. Split the archive into train, validation, and test windows by season.
  5. Fit a naive baseline, a static pooled attack/defence model, and a dynamic team-strength model.
  6. Evaluate multiclass outcome probabilities using log loss, Brier score, and ranked probability score.
  7. Export summary artefacts for the portfolio site.
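Step 4's temporal split can be sketched as a season-label partition. This is an illustrative Python sketch (the project keeps its split logic in R config); the `season` key and default season strings are taken from the split described above, and the lexicographic comparison works only because the labels share the fixed "YYYY/YYYY" format.

```python
def temporal_split(matches, train_end="2018/2019",
                   val_season="2019/2020", test_season="2020/2021"):
    """Partition match dicts into temporal windows by season label.
    Assumes each match carries a 'season' key in 'YYYY/YYYY' form,
    so string comparison matches chronological order."""
    train = [m for m in matches if m["season"] <= train_end]
    val = [m for m in matches if m["season"] == val_season]
    test = [m for m in matches if m["season"] == test_season]
    return train, val, test
```

Keeping the window boundaries as named arguments mirrors the write-up's point that the split lives in config rather than being hard-coded into modelling scripts.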

Repository Structure

  • modeling/project-2-team-strength/config/project_config.R for season windows and data paths
  • modeling/project-2-team-strength/R/match_features.R for canonical match-table and xG aggregation logic
  • modeling/project-2-team-strength/R/modeling.R for baseline, static, and dynamic forecasting models
  • modeling/project-2-team-strength/R/evaluation.R for multiclass forecast metrics
  • modeling/project-2-team-strength/scripts/ for sequential build, fit, evaluate, and export steps
  • modeling/project-2-team-strength/outputs/ for model artefacts and summary files

What Wider Use Would Require

  • Automated rolling retraining
  • Backtest reports
  • Schema validation
  • Forecast calibration monitoring
  • Explicit handling of promoted and relegated teams