Recently, Neymar Jr moved from Barcelona to Paris St. Germain for an eye-watering £199 m. To put this in perspective, Monaco, the team that won Ligue 1 last season, is worth less. Was he worth the money, and is there an elegant way to compare ability across players so as to predict their worth?
While Neymar and Pogba are both undoubtedly excellent players – one is a potential Ballon d’Or winner, the other a uniquely talented midfielder – they are also extremely visible off the pitch. Neymar has over 30 million followers on Twitter, while Pogba became the first footballer to have an emoji made after him! They aren’t exceptions in this regard. Excellent players tend to have a strong following online, as fans are able to follow their actions off the field as well.
The market value of a player should, in theory, be determined by 3 things – his age, the position he plays in, and his ability. Herein lies the problem – football not only has a variety of positions that require unique skill sets but, being a dynamic team sport, also makes good performances very hard to measure. To illustrate, consider the case of a defender – if he tackles more than a fellow defender, he could:
- Be a better defender, since he gets more tackles in
- Be a worse defender, because his poor positioning forces him to make more tackles
- Simply play in a poorer team, which means his side is attacked more often
Clearly, the statistics we currently use to measure a footballer’s performance are extremely inadequate for any sort of comparative analysis, and this holds true across all positions. To be fair, there are ongoing efforts to build a more meaningful set of statistics, like the Expected Goals (xG) statistic by Opta. However, these methods are still in their infancy and are not easily available anyway. The only scientific way of calculating the market value of a player would be to get a suitable substitute for ability.
For those not familiar with Fantasy Premier League (FPL), it is an online competition where people choose a team from all the players available in the competition, with a fixed budget of £100 million. The aim is to maximise the points earned over every game week. FPL itself provides a value for every player, such that better players cost more. In a way, this valuation serves as a useful proxy for ability. The problem with the FPL valuation, though, is that it has a floor of £4 million: no player is priced below that.
Additionally, the FPL valuation may, in fact, be generated by some algorithm used by FPL – they do not disclose their methodology. This is problematic because it becomes difficult to understand how FPL arrived at a particular number, so adding it to a model doesn’t make sense. It is preferable to use an organic measure, one which is not influenced by any calculations but is simply an observation. Excellent players tend to have a strong following online as well – could this be true across ability levels? Are better players more popular than worse ones? Would we see a positive correlation between ability (by proxy, FPL valuation) and popularity?
There are a variety of metrics of popularity, but I’ve decided to use a fairly simple yet intuitive one – Wikipedia page hits over the past year. I chose Wikipedia views for the following reasons –
- Better than Twitter/Facebook since it’s not dependent on whether the player has a profile or not.
- Better than Facebook/Instagram followers since those are subject to how engaging the players’ posts are, as well.
- Was easy to get for the time frame required – I wanted to exclude May–July, since it would inflate the popularity of players linked with a transfer in 2016/17.
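As a rough illustration of the May–July exclusion, the sketch below sums a player's monthly page views while skipping those three months. The `"YYYY-MM"` key format, the function name `total_views`, and the numbers are all assumptions made up for this example, not the format or figures of the actual dataset.

```python
# Months dropped from the popularity measure: May, June and July,
# when transfer rumours would inflate a player's page views.
EXCLUDED_MONTHS = {5, 6, 7}


def total_views(monthly, excluded=EXCLUDED_MONTHS):
    """Sum page views over all months whose month number is not excluded.

    `monthly` maps a "YYYY-MM" string to a view count (assumed format).
    """
    total = 0
    for key, views in monthly.items():
        month = int(key.split("-")[1])
        if month not in excluded:
            total += views
    return total


# Invented counts for a hypothetical player, purely to exercise the function.
example_views = {
    "2017-03": 12_000,
    "2017-04": 15_000,
    "2017-05": 90_000,  # excluded
    "2017-06": 80_000,  # excluded
    "2017-07": 70_000,  # excluded
    "2017-08": 18_000,
}
popularity = total_views(example_views)
```

Only the March, April and August counts contribute to `popularity` here.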
The graph above shows a nice, linear relationship between page views and FPL valuation. Let that sink in – in effect, we are saying we can compare the ability of players based on how many times they’ve been searched on Wikipedia over the past year! While this might seem a bit odd, the data seems to show consistency with this idea.
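One simple way to put a number on a relationship like this is the Pearson correlation coefficient. The sketch below is a minimal, plain-Python version. The function name `pearson` and the sample figures are assumptions for illustration only and are not taken from the dataset behind the graph.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Invented figures for five hypothetical players (not real data):
page_views_thousands = [150, 400, 900, 2500, 4000]
fpl_value_millions = [4.5, 5.5, 7.0, 9.5, 12.0]
r = pearson(page_views_thousands, fpl_value_millions)
```

A value of `r` close to 1 would indicate a strong positive linear relationship, while a value near 0 would indicate none.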
This isn’t to say Wikipedia page views are a perfect measure. They have their own problems of correlation with other factors –
- Players from England itself may get more hits, since they’re playing in their home league, i.e. nationality of the player may matter.
- Different categories of players get different levels of attention – forwards are definitely much more popular than defenders.
- New signings may get more attention, even beyond the transfer season.
- The top clubs have a much larger international audience.
- Breakout players may get a surge of hits since they were virtually unknown before that. Think Marcus Rashford in 2016/17.
- Players with long-term injuries may have far fewer hits, simply because they haven’t been playing.
Now that we have all 3 attributes required to (in theory) be able to arrive at a player’s market value, I built a linear regression model to see the fit. The model looks like –
market value ~ ability + position*age
Here, * means an interaction term. Since market value definitely doesn’t grow linearly with age (it peaks at around 27-29), I converted it into a categorical variable with 5 categories. For position, I made the following categories – attack, midfield, defence, and goalkeeper (GK).
For factors 1 – 4:
- Retrieved the nationality of each player, and put them into 4 buckets:
  - 1 for England
  - 2 for EU (Brexit made this a natural classification)
  - 3 for Americas
  - 4 for Rest of World
  A new column called region was made, as a factor with 4 levels.
- Included an interaction term for page views and position category. (Good defenders are less popular than good strikers!)
- Marked the new signings of 2016/17, and interacted that with page views.
- A column big_club was created comprising United, City, Chelsea, Arsenal, Liverpool, and Tottenham. This was interacted with page views as well.
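The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the dictionary keys, the helper names `region_code` and `add_features`, the full club spellings, and the tiny country lookup sets are all hypothetical, and a real lookup would need the complete country lists.

```python
# Full club names are an assumed spelling of "United, City, Chelsea, ...".
BIG_CLUBS = {
    "Manchester United", "Manchester City", "Chelsea",
    "Arsenal", "Liverpool", "Tottenham",
}

# Deliberately incomplete sample sets, for illustration only.
EU_SAMPLE = {"France", "Spain", "Germany", "Belgium", "Netherlands"}
AMERICAS_SAMPLE = {"Brazil", "Argentina", "Chile", "USA", "Colombia"}


def region_code(nationality):
    """Map a nationality to the four buckets: 1 England, 2 EU, 3 Americas, 4 Rest of World."""
    if nationality == "England":
        return 1
    if nationality in EU_SAMPLE:
        return 2
    if nationality in AMERICAS_SAMPLE:
        return 3
    # Anything not matched above falls into Rest of World; with the sample
    # sets here that includes countries a complete lookup would place elsewhere.
    return 4


def add_features(player):
    """Return a copy of `player` with region, big_club and interaction columns added."""
    row = dict(player)
    row["region"] = region_code(player["nationality"])
    row["big_club"] = int(player["club"] in BIG_CLUBS)
    # An interaction with a 0/1 flag is just the product of the two columns.
    row["page_views_x_big_club"] = player["page_views"] * row["big_club"]
    row["page_views_x_new_signing"] = player["page_views"] * int(player["new_signing"])
    return row


# A hypothetical player record with invented numbers.
example = add_features({
    "name": "Hypothetical Player",
    "nationality": "Brazil",
    "club": "Chelsea",
    "page_views": 2000,
    "new_signing": False,
})
```

Because each flag is 0 or 1, an interaction column equals the page views for flagged players and zero for everyone else, which lets the regression estimate a separate slope for that group.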
Finally, the model looks something like this –
Market value ~ page_views + attack&Age17-21 + midfield&Age17-21 + defence&Age17-21 + GK&Age17-21 (and so on for all age categories) + page_views&big_club + page_views&regionEU (and all such regions)
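To make the shape of this formula concrete, here is a minimal sketch of how one player could be expanded into the columns it names. The age brackets 17-21, 26-28 and 29-31 appear in this article; the 22-25 and 32+ brackets, the region labels other than EU, and the names `age_bucket` and `design_row` are assumptions chosen for illustration.

```python
# (low, high, label) for each age category. 22-25 and 32+ are assumed brackets.
AGE_BUCKETS = [
    (17, 21, "17-21"),
    (22, 25, "22-25"),
    (26, 28, "26-28"),
    (29, 31, "29-31"),
    (32, 200, "32+"),
]
POSITIONS = ["attack", "midfield", "defence", "GK"]
# Region codes to column suffixes; labels other than "EU" are assumed names.
REGIONS = {1: "England", 2: "EU", 3: "Americas", 4: "RestOfWorld"}


def age_bucket(age):
    """Return the label of the age category that contains `age`."""
    for low, high, label in AGE_BUCKETS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside every bucket")


def design_row(page_views, position, age, big_club, region):
    """Expand one player into the regression columns named in the formula."""
    bucket = age_bucket(age)
    row = {"page_views": page_views}
    # One 0/1 indicator per position x age cell, e.g. "attack&Age17-21".
    for pos in POSITIONS:
        for _, _, label in AGE_BUCKETS:
            row[f"{pos}&Age{label}"] = int(pos == position and label == bucket)
    # Page views interacted with the big-club flag and with each region.
    row["page_views&big_club"] = page_views * int(big_club)
    for code, name in REGIONS.items():
        row[f"page_views&region{name}"] = page_views * int(code == region)
    return row


# A hypothetical 19-year-old EU attacker at a big club, with invented page views.
row = design_row(page_views=1000, position="attack", age=19, big_club=True, region=2)
```

A real fit would drop one reference level from each set of indicators to avoid perfect collinearity (the four region interactions, for instance, add up to `page_views`). The sketch keeps every column for readability.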
The newly promoted clubs, as well as new foreign
The model had an explanatory power of 72%! This is promising because the model is able to explain a fair share of the difference in market value between players. The coefficient of popularity is statistically significant at the 1% level, and the coefficients make intuitive sense.
- The coefficient of big_club*page_views is statistically significant, confirming the hypothesis of these clubs receiving much more attention than other clubs.
- Attackers and midfielders peak in the age category 26-28, whereas defenders peak between 29-31. (This is a dangerous conclusion to reach, however, since this could be caused by Premier League clubs preferring to keep younger attackers. The ideal way to reach this conclusion would be to analyse a time series panel of players’ market values.)
- Wayne Rooney is the most undervalued player. While this may not seem true based on his performances over the past year, it makes sense because it is very hard to understand how a 31-year-old forward, who is England’s most popular player (he and Pogba have almost double the page views of anyone else), is only worth £15 million.
The next time you’re having a debate about who the better player is, it might make sense to simply check how popular each of them is! If the world is watching a player, chances are he’s pretty good.
For access to the complete dataset, or a more rigorous look at the analysis, click here.