Question

How should a club compare recruitment targets within role while controlling for team environment and small-sample noise?

Methods

  • Role-aware component ratings
  • GAM-based team-context adjustment
  • Reliability shrinkage
  • Season-to-season stability analysis

Data Sources

  • StatsBomb Open Data lineups
  • StatsBomb event data
  • Project 2 team non-penalty xG context

Rated Player-Seasons

288

The first pass rates 288 La Liga player-seasons from 2015/2016 to 2020/2021 across the four target roles.

Strongest Stability Signal

0.50

Central midfielder ratings correlate at 0.50 from one season to the next across 15 repeated player-seasons, the strongest signal outside the very thin winger sample.

Largest Context Gap

1.32

Luis Suarez's 2015/2016 raw center-forward score dropped by 1.32 after adjusting for Barcelona's team environment.

Problem

Build a player rating that is useful for recruitment and development decisions, rather than a generic public-facing score.

The core challenge is separating player contribution from role, team environment, and unreliable small-sample output.

Football Context

Recruitment models fail when they compare players across fundamentally different tactical jobs or reward outputs inflated by dominant teams.

A club analyst needs a shortlist tool that narrows discussion safely before live scouting, video review, medical checks, and financial screening.

Data

The first pass uses StatsBomb Open Data for La Liga from 2015/2016 to 2020/2021. Player minutes are derived from lineup stints, not assumed from appearance counts.
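A minimal sketch of turning lineup stints into minutes. It is written in Python for illustration, although the project itself is built in R. It assumes each stint carries `from` and `to` match-clock strings in `MM:SS` form, with `to` left empty (`None`) when the player stayed on until the final whistle. The `match_end` default of 90:00 is an assumption, since real matches run into stoppage time.

```python
def clock_to_minutes(clock):
    """Convert a 'MM:SS' match-clock string to fractional minutes."""
    minutes, seconds = clock.split(":")
    return int(minutes) + int(seconds) / 60.0


def stint_minutes(stints, match_end="90:00"):
    """Sum on-pitch minutes across one player's stints in a single match.

    Each stint is a dict with 'from' and 'to' clock strings. A 'to' of None
    is treated as lasting until match_end (an assumption, see lead-in).
    Consecutive stints, e.g. after a mid-match position change, simply add up.
    """
    total = 0.0
    for stint in stints:
        start = clock_to_minutes(stint["from"])
        end_clock = stint["to"] if stint["to"] is not None else match_end
        total += max(0.0, clock_to_minutes(end_clock) - start)
    return total


# A substitute introduced at 67:30 is credited 22.5 minutes, not 90.
print(stint_minutes([{"from": "67:30", "to": None}]))
```

Under the older appearances-times-90 approach, that same substitute would have been credited with a full 90 minutes.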

Event files provide shots, xG, passing, carrying, dribbling, and defensive actions. Project 2 contributes season-level team non-penalty xG context for adjustment.

  • 288 rated player-seasons after a 600-minute threshold
  • Four broad roles: center forward, winger, central midfielder, center back
  • Open-data coverage is uneven, with 2015/2016 providing much of the usable depth
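The inclusion rules above can be sketched as a filter: map detailed positions onto the four broad roles, give each player-season the role where they logged the most minutes, and keep it only if total minutes reach 600. The position labels in `ROLE_MAP`, the most-minutes rule, and applying the threshold to total rather than in-role minutes are all assumptions for illustration.

```python
MIN_MINUTES = 600  # inclusion threshold stated in the text

# Hypothetical mapping from detailed position labels to the four broad roles.
ROLE_MAP = {
    "Center Forward": "center forward",
    "Left Center Forward": "center forward",
    "Right Center Forward": "center forward",
    "Left Wing": "winger",
    "Right Wing": "winger",
    "Center Midfield": "central midfielder",
    "Left Center Midfield": "central midfielder",
    "Right Center Midfield": "central midfielder",
    "Center Back": "center back",
    "Left Center Back": "center back",
    "Right Center Back": "center back",
}


def assign_role(minutes_by_position):
    """Return the broad role with the most minutes, or None if no position maps."""
    role_minutes = {}
    for position, minutes in minutes_by_position.items():
        role = ROLE_MAP.get(position)
        if role is not None:
            role_minutes[role] = role_minutes.get(role, 0.0) + minutes
    if not role_minutes:
        return None
    return max(role_minutes, key=role_minutes.get)


def rated_player_seasons(player_seasons):
    """Keep player-seasons with a mapped role and at least MIN_MINUTES in total."""
    rated = []
    for ps in player_seasons:
        role = assign_role(ps["minutes_by_position"])
        total = sum(ps["minutes_by_position"].values())
        if role is not None and total >= MIN_MINUTES:
            rated.append({**ps, "role": role, "minutes": total})
    return rated
```

Players whose main position falls outside the four roles (goalkeepers, full backs) simply drop out rather than being forced into the nearest role.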

Model Design

The model standardises features within role, builds raw attacking, creation, and defensive component scores, and combines them into a transparent raw overall rating.
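A minimal sketch of within-role standardisation: each feature is z-scored only against players in the same role, each component is the mean of its features' z-scores, and the raw overall rating is a weighted sum of the components. The feature names in `COMPONENT_FEATURES` and the weights in `ROLE_WEIGHTS` are hypothetical placeholders, not the project's actual definitions.

```python
from statistics import mean, pstdev

# Hypothetical feature groupings for the three components.
COMPONENT_FEATURES = {
    "attacking": ["np_xg_p90", "shots_p90"],
    "creation": ["xa_p90", "progressive_carries_p90"],
    "defensive": ["padj_tackles_p90", "padj_interceptions_p90"],
}

# Hypothetical role weights used to combine components into a raw overall rating.
ROLE_WEIGHTS = {
    "center forward": {"attacking": 0.6, "creation": 0.3, "defensive": 0.1},
    "winger": {"attacking": 0.45, "creation": 0.45, "defensive": 0.1},
    "central midfielder": {"attacking": 0.2, "creation": 0.45, "defensive": 0.35},
    "center back": {"attacking": 0.1, "creation": 0.3, "defensive": 0.6},
}

ALL_FEATURES = sorted({f for feats in COMPONENT_FEATURES.values() for f in feats})


def zscores(values):
    """Standardise values; a feature with no spread contributes zeros."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def raw_ratings(rows):
    """Z-score features within each role, then build component and overall scores."""
    by_role = {}
    for row in rows:
        by_role.setdefault(row["role"], []).append(row)

    out = []
    for role, group in by_role.items():
        z = {f: zscores([r[f] for r in group]) for f in ALL_FEATURES}
        weights = ROLE_WEIGHTS[role]
        for i, r in enumerate(group):
            comps = {c: mean(z[f][i] for f in feats)
                     for c, feats in COMPONENT_FEATURES.items()}
            overall = sum(weights[c] * comps[c] for c in comps)
            out.append({**r, **comps, "raw_overall": overall})
    return out
```

Because every z-score is computed inside one role, a center back is never penalised for shooting less than a center forward.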

A role-specific GAM then estimates how much of that raw score is explained by team non-penalty xG difference. The final rating is the context-adjusted score, shrunk for low-minute reliability.
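The context adjustment can be sketched as follows: estimate a curve f(team npxG difference) within each role, then subtract the part of each raw score that f explains. The project fits this curve with an mgcv GAM in R. Python's standard library has no equivalent, so this sketch swaps in a Gaussian-kernel (Nadaraya-Watson) smoother as a stand-in for the smooth term. The bandwidth plays a role loosely similar to the GAM's smoothness penalty, and recentring on the role mean is an assumption.

```python
import math


def kernel_smooth(xs, ys, x0, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) estimate of E[y | x = x0]."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def context_adjust(rows, bandwidth=0.3):
    """Within each role, remove the part of raw_overall explained by team npxG difference.

    adjusted = raw - (fitted - role mean), so a player whose fitted value equals
    the role mean is not moved. context_gap = raw - adjusted.
    x0 is always one of xs here, so its own weight of 1 keeps the denominator positive.
    """
    by_role = {}
    for row in rows:
        by_role.setdefault(row["role"], []).append(row)

    out = []
    for group in by_role.values():
        xs = [r["team_npxg_diff"] for r in group]
        ys = [r["raw_overall"] for r in group]
        role_mean = sum(ys) / len(ys)
        for r in group:
            fitted = kernel_smooth(xs, ys, r["team_npxg_diff"], bandwidth)
            adjusted = r["raw_overall"] - (fitted - role_mean)
            out.append({**r, "adjusted": adjusted,
                        "context_gap": r["raw_overall"] - adjusted})
    return out
```

With a very small bandwidth the smoother passes through every point and attributes all variation to team context. That is the same overfitting risk that motivates the fallback path for thin role samples.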

  • Why a transparent score rather than the obvious alternative: a role-aware score is easier to defend than a black-box 'true talent' model on sparse open data
  • Why a GAM: it lets team-context adjustment be flexible without pretending the relationship is perfectly linear
  • Why not a universal score: cross-role comparisons are statistically weak and make little football sense

Validation

Validation focuses on season-to-season stability for repeated players, within-role coherence, and the size of context adjustments for players from dominant teams.

The current stability signal is directionally useful rather than definitive: central midfielders and center backs both show a repeat correlation of roughly 0.5, while the winger sample is too sparse to support a firm claim.

  • Central Midfielder stability: 0.499 across 15 repeated player-seasons
  • Center Back stability: 0.485 across 12 repeated player-seasons
  • Center Forward stability: 0.443 across 6 repeated player-seasons
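The stability check can be sketched as a per-role Pearson correlation between a player's rating in one season and the same player's rating in the next season in the same role. The sketch assumes seasons are coded by starting year (2015 for 2015/2016), counts one repeat per consecutive-season pair, and skips roles with fewer than three pairs. These conventions are illustrative and may not match how the project counts repeated player-seasons.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; NaN if either series has zero variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def role_stability(rows, min_pairs=3):
    """Per-role correlation of season t vs season t+1 ratings for repeated players.

    Returns {role: (correlation, number_of_pairs)} for roles with enough pairs.
    A player who changes role between seasons is not paired.
    """
    rating = {(r["player"], r["role"], r["season"]): r["rating"] for r in rows}
    pairs = {}
    for (player, role, season), value in rating.items():
        following = rating.get((player, role, season + 1))
        if following is not None:
            pairs.setdefault(role, []).append((value, following))

    result = {}
    for role, role_pairs in pairs.items():
        if len(role_pairs) >= min_pairs:
            xs = [p[0] for p in role_pairs]
            ys = [p[1] for p in role_pairs]
            result[role] = (pearson(xs, ys), len(role_pairs))
    return result
```

Returning the pair count alongside each correlation keeps the sample-size caveat attached to the number, which matters when one role has only a few repeats.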

Results

The first pass produces role-specific tables that are already useful for recruitment framing. The output highlights strong same-role performers while exposing how much elite-team context can flatter raw box-score production.

Barcelona attackers produce the clearest adjustment examples: Lionel Messi and Luis Suarez still rate highly, but their raw numbers are materially reduced once team context is accounted for.

  • Top center backs in the first pass include Shkodran Mustafi, Gustavo Cabral, and Gerard Pique
  • The model exports explicit context-gap examples so analysts can see where raw output is most inflated
  • Low-minute players are retained only after shrinkage; they are not treated as being as reliable as full-season starters
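One common way to implement reliability shrinkage is to pull each rating toward the role mean, using a weight that grows with minutes: w = minutes / (minutes + k). This is a sketch of that generic form, not necessarily the project's exact rule. The constant `k = 900` (the minute total at which a rating keeps half its own value) is a hypothetical choice. `role_mean` defaults to 0 on the assumption that ratings are standardised within role.

```python
def shrink(rating, minutes, role_mean=0.0, k=900.0):
    """Shrink a rating toward the role mean.

    The weight on the player's own rating is minutes / (minutes + k), so it
    approaches 1 for full-season starters and stays small near the threshold.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    weight = minutes / (minutes + k)
    return weight * rating + (1.0 - weight) * role_mean


print(shrink(2.0, 600))   # low-minute player near the threshold
print(shrink(2.0, 2700))  # near full-season starter
```

Under these assumptions, a 600-minute player with an adjusted rating of 2.0 is pulled to 0.8, while a 2,700-minute starter with the same rating keeps 1.5.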

Project 4 role distribution and rating spread chart

Role Coverage and Rating Spread

The first-pass recruitment model covers 288 player-seasons across four roles. Center backs and central midfielders currently have the deepest samples, which matters when interpreting stability.

Project 4 season-to-season stability chart

Season-to-Season Stability

Repeated-player stability is strongest in the deeper role samples. Winger stability appears very high, but only three repeated player-seasons exist, so that result should be treated cautiously.

Project 4 team-context adjustment chart

Largest Team-Context Adjustments

This chart shows how far raw ratings moved after adjusting for team environment. Barcelona attackers remain elite, but the model makes their strong-team inflation explicit instead of hiding it.
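The context-gap export can be sketched as a ranking of raw minus adjusted rating. This version sorts by absolute gap, so it surfaces both players flattered by strong teams (positive gap) and players held back by weak ones (negative gap). The field names are assumptions, and the example rows in any usage are toy values rather than project output.

```python
def largest_context_gaps(rows, n=5):
    """Rank player-seasons by how far context adjustment moved their rating.

    gap = raw - adjusted: a positive gap means team context flattered raw output.
    """
    gaps = [
        {"player": r["player"], "season": r["season"], "role": r["role"],
         "gap": r["raw"] - r["adjusted"]}
        for r in rows
    ]
    return sorted(gaps, key=lambda g: abs(g["gap"]), reverse=True)[:n]
```

Keeping the sign while sorting on magnitude lets one table answer both "whose numbers are inflated?" and "who is undersold by their team?".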

What Failed During Development

The original scaffold estimated minutes as appearances multiplied by 90. That was fast but not defensible, because it credits substitutes and players withdrawn early with a full 90 minutes, so it was replaced with lineup-derived stint minutes.

A recent-seasons-only build produced too few player-seasons to make the recruitment model credible. The scope was widened back to 2015/2016 onward to recover sample depth.

Flexible context adjustment also needed a fallback path for thin role samples, where a smooth term would overfit or fail to fit at all.
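A sketch of the fallback decision: fit a smooth only when a role sample has enough rows and enough distinct team-context values, fall back to a straight-line fit when it does not, and skip the adjustment entirely when there is no variation to model. Because team context is season-level, many players share the same value, so the number of distinct values can be far smaller than the row count. The thresholds `min_rows` and `min_unique` are hypothetical. In the R pipeline, the smooth itself would come from mgcv.

```python
def choose_context_model(xs, min_rows=40, min_unique=10):
    """Pick 'smooth', 'linear', or 'none' for one role's team-context fit."""
    n_unique = len(set(xs))
    if len(xs) >= min_rows and n_unique >= min_unique:
        return "smooth"
    if n_unique >= 2:
        return "linear"
    return "none"


def linear_fit(xs, ys):
    """Ordinary least-squares fitted values for y ~ x (the fallback path).

    Only called when choose_context_model returns 'linear', which guarantees
    at least two distinct x values and therefore a non-zero denominator.
    """
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return [my + slope * (x - mx) for x in xs]
```

A linear fallback still removes the first-order team-strength effect while giving up the flexibility that a thin sample cannot support.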

Decision Use

The rating is designed to narrow a shortlist, structure same-role comparisons, and flag players whose outputs deserve more video review.

It should not be used as a single signing score. Analysts still need tactical fit, athletic profile, contract situation, injury history, and live scouting context before acting.

Engineering

The pipeline keeps raw StatsBomb files immutable, constructs canonical player-season tables, and exports reproducible tables for role summaries, top players, and context-gap diagnostics.
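One way to enforce the immutable-raw-files rule is a checksum manifest: record a SHA-256 hash for every raw file once, then have later pipeline runs confirm that nothing has changed. This is a hypothetical Python sketch using only the standard library. The `*.json` pattern and the function names are assumptions, not the project's R code.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """SHA-256 of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(raw_dir):
    """Map each raw JSON file (as a relative POSIX path) to its hash."""
    raw_dir = Path(raw_dir)
    return {p.relative_to(raw_dir).as_posix(): file_sha256(p)
            for p in sorted(raw_dir.rglob("*.json"))}


def changed_files(raw_dir, manifest):
    """Files added, removed, or modified since the manifest was recorded."""
    current = build_manifest(raw_dir)
    return sorted(name for name in set(current) | set(manifest)
                  if current.get(name) != manifest.get(name))
```

A run could save the manifest after the first download and refuse to continue if `changed_files` returns anything.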

The implementation shares the same R, data.table, mgcv, and testthat pattern as the rest of the portfolio, which makes the research stack coherent rather than one-off.

Limitations

Open-data coverage is uneven, role assignment is still broad, and team-context adjustment is built from one season-level strength proxy rather than a full player-level hierarchical model.

Uncertainty is largest for wingers with sparse repeated seasons, low-minute players near the threshold, and anyone whose role changed materially across seasons.

Next Iteration

Add age curves, league-translation logic, and better possession/value features so the model becomes more useful for cross-context recruitment work.

A strong next public-facing case study would be a World Cup or transfer-shortlist module that builds on this rating system rather than replacing it.

Pipeline Workflow

  1. Read the La Liga match index and keep the raw StatsBomb files immutable.
  2. Build a player-match layer from lineup stint minutes plus event-derived player actions.
  3. Aggregate player-match rows into canonical player-season tables within four broad roles.
  4. Engineer per-90 and possession-adjusted defensive rates, then merge season-specific team context from Project 2.
  5. Create raw attacking, creation, and defensive component scores within role.
  6. Adjust the raw overall rating for team non-penalty xG context and shrink low-minute outputs for reliability.
  7. Export role summaries, top-player tables, context-gap examples, and clear project notes.
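Step 4 can be sketched as three small operations: convert counts to per-90 rates, scale defensive rates to a neutral 50% opponent possession, and join season-level team context by team and season. The linear possession scaling is one simple choice rather than the project's documented formula, and the field names (`team`, `season`, `team_npxg_diff`) are assumptions.

```python
def per90(count, minutes):
    """Convert a season count into a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return 90.0 * count / minutes


def possession_adjusted_per90(count, minutes, opp_possession_share):
    """Scale a defensive per-90 rate to a neutral 50% opponent possession.

    Players on dominant teams face less of the ball, so their raw defensive
    counts are scaled up. The linear form is an illustrative assumption.
    """
    if not 0 < opp_possession_share < 1:
        raise ValueError("opp_possession_share must be between 0 and 1")
    return per90(count, minutes) * 0.5 / opp_possession_share


def attach_team_context(rows, team_context):
    """Join season-level team npxG difference onto each row by (team, season)."""
    out = []
    for row in rows:
        key = (row["team"], row["season"])
        if key not in team_context:
            raise KeyError(f"missing team context for {key}")
        out.append({**row, "team_npxg_diff": team_context[key]})
    return out
```

Raising on a missing team-season, rather than silently filling a default, keeps an incomplete merge from reaching the adjustment step.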

Repository Structure

  • config/project_config.R: scope, season window, role list, and inclusion thresholds.
  • R/player_table.R: lineup-minute extraction, role mapping, player-match aggregation, and season table construction.
  • R/modeling.R: role-specific component ratings, GAM context adjustment, and reliability shrinkage.
  • R/evaluation.R: role summaries, top-player tables, stability checks, and context-gap examples.
  • scripts/01-04: sequential build, fit, evaluate, and export steps.
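The settings that `config/project_config.R` is described as holding can be mirrored in a small, immutable config object. The field names and the `seasons()` helper are hypothetical. Only the values (La Liga, 2015/2016 to 2020/2021, four roles, 600 minutes) come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectConfig:
    """Hypothetical mirror of the scope settings described for project_config.R."""
    competition: str = "La Liga"
    first_season: str = "2015/2016"
    last_season: str = "2020/2021"
    roles: tuple = ("center forward", "winger", "central midfielder", "center back")
    min_minutes: int = 600

    def seasons(self):
        """Expand the season window into 'YYYY/YYYY' labels."""
        start = int(self.first_season[:4])
        end = int(self.last_season[:4])
        return [f"{year}/{year + 1}" for year in range(start, end + 1)]
```

Freezing the config means every later step reads the same scope, which supports the reproducibility goal of the numbered build scripts.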

What Wider Use Would Require

  • Automated role assignment rules and manual review for edge cases
  • Data quality checks for lineup stints, duplicate identities, and sparse player histories
  • Monitoring for unstable low-minute ratings and major context-shift players
  • Report generation that combines ratings with scouting, medical, and contractual context
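Two of the data-quality checks listed above can be sketched directly. The first flags lineup stints that run backwards or overlap within one player-match. The second flags identity conflicts, where one player ID carries several names or one name maps to several IDs. Function names and input shapes are hypothetical. A same-name, different-ID hit may be two genuinely different players, so it is a prompt for manual review rather than an automatic merge.

```python
def stint_issues(stints):
    """Flag backwards or overlapping (start, end) stints, in minutes, for one player-match."""
    issues = []
    ordered = sorted(stints)
    for start, end in ordered:
        if end < start:
            issues.append(f"stint ends before it starts: {start}-{end}")
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 < e1:
            issues.append(f"overlapping stints: {s1}-{e1} and {s2}-{e2}")
    return issues


def identity_conflicts(records):
    """Return (ids with several names, names with several ids) from (player_id, name) pairs."""
    names_by_id, ids_by_name = {}, {}
    for player_id, name in records:
        names_by_id.setdefault(player_id, set()).add(name)
        ids_by_name.setdefault(name, set()).add(player_id)
    multi_name = {pid: sorted(names) for pid, names in names_by_id.items() if len(names) > 1}
    multi_id = {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}
    return multi_name, multi_id
```

Accent and transliteration variants are a typical source of one ID carrying several names, so this check runs naturally alongside the season-table build.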