Forecasting Layer
A La Liga match forecasting case study using a broader historical open-data sample, with a baseline-to-dynamic model ladder and explicit evaluation of whether time-varying team strength improves pre-match probabilities.
How should current team strength be estimated for match forecasting and pre-match strategic priors?
Training Sample
690 matches
The first-pass model is trained on La Liga open-data seasons from 2009/2010 through 2018/2019.
Best Log Loss
0.9815
The dynamic team-strength model currently leads the first-pass forecasting ladder on test-set log loss.
Best RPS
0.2157
The dynamic model also produces the lowest ranked probability score on the 2020/2021 holdout season.
Estimate latent team strength over time so that match predictions adjust to changing form and underlying quality.
Treat the forecasting task as a probability problem over home win, draw, and away win, because the practical requirement is a credible pre-match distribution rather than a single hard pick.
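One common way to turn the task into a three-outcome probability problem is to score every plausible scoreline under an independent-Poisson goals assumption and sum the mass into home win, draw, and away win. This is a minimal sketch of that conversion, not necessarily the exact mechanism used in the project; `outcome_probs` and its arguments are illustrative names.

```python
import math

def outcome_probs(lam_home, lam_away, max_goals=10):
    """Convert expected goals per side into (home, draw, away) probabilities
    under an independent-Poisson scoreline assumption (a simplification)."""
    def pois(lam, k):
        return math.exp(-lam) * lam ** k / math.factorial(k)

    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = pois(lam_home, h) * pois(lam_away, a)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away
```

Truncating at `max_goals=10` loses only negligible tail mass at football-typical scoring rates, so the three probabilities sum to one to a close approximation.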
Clubs need more than static season aggregates when setting expectations, preparing matches, or feeding live probability models.
A pre-match strength model also creates a natural bridge between Project 1 and the later in-game win probability project, because live forecasting should start from a sensible prior rather than from scratch.
The first pass draws on a broader slice of the StatsBomb Open Data La Liga archive than Project 1, retaining seasons with at least 30 available matches from 2009/2010 through 2020/2021.
The resulting first-pass sample contains 758 matches: 690 train, 33 validation, and 35 test.
Each match record includes scoreline plus event-derived home and away xG, with non-penalty xG tracked separately to make the team-strength updates less penalty-driven.
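The canonical match-feature table described above can be sketched as a simple record type. The field names here are hypothetical; the point is only that scoreline, xG, and non-penalty xG are carried per side.

```python
from dataclasses import dataclass

@dataclass
class MatchRecord:
    """Hypothetical schema mirroring the canonical match-feature table."""
    season: str
    home_team: str
    away_team: str
    home_goals: int
    away_goals: int
    home_xg: float      # event-derived expected goals, home side
    away_xg: float
    home_npxg: float    # non-penalty xG, used for team-strength updates
    away_npxg: float
```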
The model ladder is designed to answer a clear question: does dynamic team strength actually improve forecasting over simpler alternatives?
The first-pass static model uses pooled attack and defence rates so it can handle promoted or otherwise unseen teams at prediction time; that design became necessary after the initial fixed-effect approach failed on team identities absent from training.
The dynamic model then updates attack and defence strengths recursively using match-by-match non-penalty xG performance, making it state-space-inspired without pretending to be a full latent Bayesian state-space implementation.
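The recursive update can be sketched as a multiplicative exponential-smoothing step on non-penalty xG. The exact rule, the learning rate, and the league-average constant below are assumptions for illustration, not the project's actual parameters; unseen teams fall back to neutral ratings of 1.0, which is what makes the pooled baseline behaviour robust to promotions.

```python
LEAGUE_AVG_NPXG = 1.3   # assumed league-average npxG per team per match
ALPHA = 0.1             # assumed learning rate for the recursive update

def update_strengths(ratings, home, away, home_npxg, away_npxg,
                     league_avg=LEAGUE_AVG_NPXG, alpha=ALPHA):
    """One recursive attack/defence update from a single match.

    ratings maps team -> {"att": float, "def": float}. Both start at 1.0,
    so a promoted or unseen team begins at the pooled league average.
    A defence rating above 1.0 means the team concedes more npxG than average.
    """
    for team in (home, away):
        ratings.setdefault(team, {"att": 1.0, "def": 1.0})

    # Expected npxG under the current multiplicative strengths
    exp_home = league_avg * ratings[home]["att"] * ratings[away]["def"]
    exp_away = league_avg * ratings[away]["att"] * ratings[home]["def"]

    # Nudge each rating toward the observed-to-expected npxG ratio
    ratings[home]["att"] *= (home_npxg / exp_home) ** alpha
    ratings[away]["def"] *= (home_npxg / exp_home) ** alpha
    ratings[away]["att"] *= (away_npxg / exp_away) ** alpha
    ratings[home]["def"] *= (away_npxg / exp_away) ** alpha
    return ratings
```

Because the step size is a fixed exponent rather than a learned posterior, this is "state-space-inspired" in exactly the sense the text describes: it tracks drift in strength without modelling uncertainty around the latent state.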
Validation is strictly temporal. The model is trained on earlier seasons, tuned against a later validation season, and judged on 2020/2021 holdout matches.
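The strictly temporal split can be expressed in a few lines. The validation season is assumed here to be 2019/2020 (the season between the stated training window and the 2020/2021 holdout); season strings in `YYYY/YYYY` form sort lexicographically, so plain comparisons give a chronological split.

```python
def temporal_split(matches):
    """Split match dicts chronologically: train through 2018/2019,
    tune on the assumed 2019/2020 validation season, hold out 2020/2021."""
    train = [m for m in matches if m["season"] <= "2018/2019"]
    valid = [m for m in matches if m["season"] == "2019/2020"]
    test = [m for m in matches if m["season"] == "2020/2021"]
    return train, valid, test
```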
The main evaluation focus is forecast probability quality rather than categorical hit rate, because the project is intended to support probabilistic decision-making.
The dynamic first-pass model is already outperforming both simpler alternatives on the 2020/2021 test split.
Current test metrics are: baseline log loss 1.0500, static pooled model log loss 1.0566, dynamic model log loss 0.9815; baseline ranked probability score 0.2476, static model 0.2190, dynamic model 0.2157.
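Both headline metrics are standard and easy to state exactly. Log loss penalises the probability assigned to the realised outcome; the ranked probability score compares cumulative forecast and outcome distributions over the ordered outcomes home win < draw < away win, so near-misses (draw predicted, narrow home win observed) cost less than distant ones.

```python
import math

def log_loss(probs, outcome):
    """Multiclass log loss for one match; probs is (home, draw, away),
    outcome is the index of the realised result. Lower is better."""
    return -math.log(probs[outcome])

def rps(probs, outcome):
    """Ranked probability score over the ordered outcomes
    home win < draw < away win. Lower is better."""
    cum_p = cum_o = total = 0.0
    for k, p in enumerate(probs):
        cum_p += p
        cum_o += 1.0 if k == outcome else 0.0
        total += (cum_p - cum_o) ** 2
    return total / (len(probs) - 1)
```

A perfect forecast scores 0 on both; a uniform (1/3, 1/3, 1/3) forecast scores about 0.278 RPS against any outcome, which puts the reported test values in context.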
That gives the project a strong early story: dynamic updating is doing useful work here, even before the model is upgraded into a more formal state-space framework.
The season-end tables also make the model outputs tangible. In the richer years of the open-data archive, the model recovers intuitive high-end teams such as Barcelona, Real Madrid, and Atletico Madrid. In thinner years, the rankings become noisier, which is itself an important result about data quality rather than something to hide.

The dynamic team-strength model beats both simpler alternatives on the reported first-pass forecast metrics, giving the project a strong empirical case for time-varying modelling.

The recursive strength updates make the model visually interpretable: team attack and defence ratings evolve over time rather than staying fixed at static season averages.

This view makes the learned ratings easier to inspect: Barcelona dominate the 2015/2016 and 2016/2017 snapshots, while later seasons look noisier because open-data coverage becomes thinner.
The original static model was implemented as a team fixed-effect Poisson regression. That broke at prediction time because the test split included unseen team levels, which made the model brittle in exactly the way a club-facing forecasting system should avoid.
Replacing that with a pooled static attack/defence model produced a safer first-pass baseline and clarified the main modelling lesson: forecasting systems need to handle promotions, relegations, and sparse team histories gracefully.
This failure improved the project because it forced the design toward a more robust baseline and made the dynamic model comparison more meaningful.
The model informs forecasting and provides priors for the live win-probability system, but should not be used as a substitute for squad-level context.
Analysts should treat the outputs as structured priors, then layer in injuries, tactical mismatches, schedule congestion, and line-up information before taking action.
The most useful practical output is not a final league table. It is a season-aware strength prior that can be reused in later models and checked against football common sense.
Runs are reproducible over fixed temporal windows, using a canonical match-feature table and a clear model ladder rather than one-off notebook experimentation.
The first implementation also makes a useful engineering point: once promoted teams broke the initial static model, the safer pooled baseline became part of the permanent workflow rather than an ad hoc patch.
The main data limitation is still open-data coverage. Although the archive is broader than Project 1, it is not a complete proprietary historical feed, and the number of matches varies sharply by season.
Structural team changes can outpace the update speed of a dynamic recursive model, especially when recent evidence is sparse or when squad turnover is extreme.
The current dynamic model is state-space-inspired rather than a full latent state-space implementation, so uncertainty around latent strength is not yet being modelled as formally as it could be.
The next upgrade is to replace the recursive dynamic update rule with a more formal latent state-space or smoothing framework.
A second upgrade is to compare StatsBomb native xG inputs against custom Project 1 xG aggregates, which would make the portfolio connection between Projects 1 and 2 much stronger.
Further work should also export team-strength trajectories and forecast comparison charts to the website, so the project is as visually concrete as Project 1.