The Datafication of Football
I have to be honest at this time of the year. Now that most European football leagues have found their champions and the Champions League final is close, I must admit, as a football fan, that the summer will feel somewhat empty. At the same time, it is hard to be too idealistic about football clubs, players and their fans. Football has become a billion-euro business, and renowned clubs nowadays have to think in terms of a globalized market. Even so, football is still frequently associated with concepts like intuition (of coaches and players) and talent (particularly in the case of players), and I have the impression that there is still a lot of both in the game. Both are definitely unmeasurable and will keep the sport as fascinating as it is. However, a revolution has taken place over the last few years that should not be overlooked. This revolution is about using data.
The American author Michael Lewis wrote the book Moneyball, which tells the story of Billy Beane, the general manager of the mediocre baseball team Oakland A’s. The team started using rigorous statistical performance measures to evaluate players, which gave it a competitive edge even against the richer clubs in Major League Baseball. For those interested in the story, the 2011 movie Moneyball starring Brad Pitt can be recommended.
Data science has arrived in modern football. Manchester City, for example, employs 11 data scientists according to a Guardian article from 2014. Liverpool FC introduced the position of Director of Research some years ago; it is currently held by a theoretical physicist who earned his PhD at Cambridge University. Training has also been “datafied” at many clubs, such as TSG Hoffenheim, which secured a fourth-place finish in the German Bundesliga this season and will therefore play qualifying matches for the Champions League. The amount of data collected (often through wearable devices) is huge: some seven million data points are derived from a 10-minute session with 10 players and three balls. The data is accessible in real time and is often used in feedback to players to help them work on their weaknesses. TSG Hoffenheim works closely with the software giant SAP, which is not a big surprise as the company is the club’s main sponsor.
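To get a feeling for what seven million data points in ten minutes means, the figure can be broken down into a rate per second. The following is only a back-of-the-envelope sketch in Python; it assumes that the data points are spread evenly over time and over the 13 tracked objects (10 players plus three balls), which the source does not state.

```python
# Figures quoted above for one training session
DATA_POINTS = 7_000_000
SESSION_MINUTES = 10
PLAYERS = 10
BALLS = 3


def points_per_second(total_points, minutes):
    """Average number of data points arriving per second."""
    return total_points / (minutes * 60)


def points_per_object_per_second(total_points, minutes, tracked_objects):
    """Average rate per tracked object.

    Assumption (not from the source): every player and ball
    contributes the same share of the data.
    """
    return points_per_second(total_points, minutes) / tracked_objects


if __name__ == "__main__":
    overall = points_per_second(DATA_POINTS, SESSION_MINUTES)
    per_object = points_per_object_per_second(
        DATA_POINTS, SESSION_MINUTES, PLAYERS + BALLS
    )
    print(f"{overall:,.0f} data points per second overall")
    print(f"{per_object:,.0f} data points per second per player or ball")
```

Under these assumptions that is roughly 11,667 data points per second overall, or about 897 per second for each player and ball.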
What is surprising is that data and football, particularly in Britain, have a rather long history together. It was in 1950 that Charles Reep, a World War II veteran and trained accountant, got so frustrated with a performance of his club Swindon Town that he started collecting data about the match in the second half. He concluded from the data that it would be reasonable to move the ball toward the goal with as few passes as possible in order to score more often and thereby win the match. Reep’s ideas had a strong influence on British football and shaped what was long known as the British “kick and rush” way of playing the game. Today the methods are more advanced and the amount of data available for analysis is immense. However, the underlying ideas and aims of number crunching in football have not changed a lot. Take a look at this paper published in connection with this year’s Sports Analytics Conference hosted by the MIT Sloan School of Management. That’s counting a lot of passes from a lot of players in a highly advanced manner!
If you are a data scientist interested in football and want to play around a bit (maybe in the football-free zone over the summer), you might wish to get inspired by people like Martin Eastwood and his talk Predicting football using R. Everything you need to reproduce the example can be found in this GitHub repository. Martin also has a blog on football analytics that is worth reading; it is called Pena.lt/y. If you want to dive deeper (what a pun in a football-related blog post 😊), there is for instance a whole SportsAnalytics package in R, which is a selection of data sets, functions to fetch sports data, examples, and demos. Not to mention the countless possibilities of web scraping online football databases in R.
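For a first experiment of your own, one common starting point is to treat the goals of each team as independent Poisson counts and to derive the chances of a home win, a draw and an away win from them. The following is a minimal sketch of that idea, written in Python rather than R purely for illustration. It is not taken from the talk or the repository mentioned above, and the goal rates used in the example are made-up numbers, not estimates from real data.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return exp(-lam) * lam**k / factorial(k)


def match_outcome_probabilities(home_rate, away_rate, max_goals=10):
    """Return (home win, draw, away win) probabilities.

    Assumes both teams score independently of each other, which is
    a simplification. Scorelines above max_goals are ignored and the
    result is renormalised so the three probabilities sum to 1.
    """
    home_win = draw = away_win = 0.0
    for home_goals in range(max_goals + 1):
        for away_goals in range(max_goals + 1):
            p = poisson_pmf(home_goals, home_rate) * poisson_pmf(away_goals, away_rate)
            if home_goals > away_goals:
                home_win += p
            elif home_goals == away_goals:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


if __name__ == "__main__":
    # Hypothetical expected goals, chosen only for illustration
    home, tie, away = match_outcome_probabilities(home_rate=1.6, away_rate=1.1)
    print(f"home win: {home:.1%}, draw: {tie:.1%}, away win: {away:.1%}")
```

In a real analysis the two goal rates would be estimated from historical results, for example with a regression on attack and defence strength, instead of being typed in by hand.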
So, let’s keep the ball rolling. I wish you a nice start to the summer.