Prediction of the FIFA World Cup 2018 – A random forest approach with an emphasis on estimated team ability parameters – by Andreas Groll, Christophe Ley, Gunther Schauberger, Hans Van Eetvelde, published on June 8, 2018 – The authors compare three different modeling approaches for the scores of soccer matches with regard to their predictive performance, based on all matches from the four previous FIFA World Cups 2002 – 2014: 1) Poisson regression models, 2) random forests and 3) ranking methods.
Predictive bookmaker consensus model for the UEFA Euro 2016 – this working paper authored by Achim Zeileis, Christoph Leitner and Kurt Hornik (in: Working Papers in Economics and Statistics, No. 2016-15 – provided in cooperation with the Institute of Public Finance, University of Innsbruck, Austria)
The authors employ a predictive model based on odds from 19 online bookmakers to forecast the winning probability of each team. Furthermore, by complementing the bookmaker consensus results with simulations of the whole tournament, they obtain predicted pairwise probabilities for each possible game at the Euro 2016, along with “survival” probabilities for each team proceeding to the different stages of the tournament.
Also available for free download here: https://www2.uibk.ac.at/downloads/c4041030/wpaper/2016-15.pdf
See an earlier publication by the same authors: Bookmaker Consensus and Agreement for the UEFA Champions League 2008/09
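The basic idea behind a bookmaker consensus can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the procedure used in the working paper: it assumes decimal odds, removes each bookmaker's margin by plain normalisation, and averages the resulting probabilities. The function names and the example quotes are hypothetical.

```python
def implied_probabilities(decimal_odds):
    """Turn one bookmaker's decimal odds into probabilities that sum to 1."""
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)  # exceeds 1 by the bookmaker's margin (overround)
    return [r / total for r in raw]


def consensus(odds_by_bookmaker):
    """Average the normalised probabilities across bookmakers (simple mean)."""
    per_book = [implied_probabilities(odds) for odds in odds_by_bookmaker]
    n = len(per_book)
    return [sum(column) / n for column in zip(*per_book)]


# Hypothetical quotes from two bookmakers for a three-team market.
quotes = [
    [2.0, 3.5, 4.0],
    [1.9, 3.6, 4.2],
]
probs = consensus(quotes)
print([round(p, 3) for p in probs])
```

Averaging on the probability scale is only one possible choice; other aggregation scales may behave differently when bookmakers disagree strongly.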
Modelling the Results of Sports Events – an insightful paper by James Gardner, University of Leeds, Department of Statistics (Project Supervisor: Prof. Jochen Voss), 2011.
Win probability graphs for all 2013/2014 NHL regular season games – by sports analytics expert Stephen Pettigrew (Harvard University, 2014)
Modelling Association Football Scores and Inefficiencies in the Football Betting Market – by Mark J. Dixon and Stuart G. Coles (© 1997 Royal Statistical Society). A parametric model is developed and fitted to English league and cup football data from 1992 to 1995. The model is motivated by an aim to exploit potential inefficiencies in the association football betting market, and this is examined using bookmakers’ odds from 1995 to 1996. The technique is based on a Poisson regression model but is complicated by the data structure and the dynamic nature of teams’ performances. Maximum likelihood estimates are shown to be computationally obtainable, and the model is shown to have a positive return when used as the basis of a betting strategy.
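To give a feel for what a Poisson-based score model produces, here is a minimal Python sketch that assumes each team's goals follow an independent Poisson distribution. It is not the model from the paper, which adds further refinements; the scoring rates used below are hypothetical rather than fitted values.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals arrive at the given mean rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(home_rate, away_rate, max_goals=10):
    """Sum scoreline probabilities into home win, draw and away win.

    Scorelines above max_goals per team are ignored, so the three values
    add up to slightly less than 1.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Hypothetical expected goals: 1.5 for the home side, 1.1 for the away side.
home_win, draw, away_win = outcome_probabilities(1.5, 1.1)
print(round(home_win, 3), round(draw, 3), round(away_win, 3))
```

Comparing such model probabilities with the probabilities implied by bookmakers' odds is the usual starting point for looking for value bets.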
Evaluating the Predictive Accuracy of Association Football Forecasting Systems – by A. Constantinou and N.E. Fenton, University of London, UK.
Despite the increasing importance and popularity of association football forecasting systems there is no agreed method of evaluating their accuracy. The authors of this paper have classified the evaluators used into two broad categories: those which consider only the prediction for the observed outcome; and those which consider the predictions for the unobserved as well as observed outcome. They highlight fundamental inconsistencies between them and demonstrate that they produce wildly different conclusions about the accuracy of four different forecasting systems (Fink Tank/Castrol Predictor, Bet365, Odds Wizard, and pi-football) based on recent Premier League data. None of the existing evaluators satisfy a set of simple theoretical benchmark criteria. Hence, it is dangerous to assume that any existing evaluator can adequately assess the performance of football forecasting systems and, until evaluators are developed that address all the benchmark criteria, it is best to use multiple types of predictive evaluators (preferably based on posterior validation).
Modelling football match results and the efficiency of fixed-odds betting – by John Goddard (University of Wales Swansea) and Ioannis Asimakopoulos (University of Wales Bangor). An ordered probit regression model estimated using 15 years’ data is used to model English league football match results. As well as past match results data, the significance of the match for end-of-season league outcomes, the involvement of the teams in cup competition, the geographical distance between the two teams’ home towns, and the average attendances of the two teams all contribute to the model’s performance. The model is used to test the weak-form efficiency of prices in the fixed-odds betting market, and betting strategies with a positive expected return are identified.
Relational Learning for Football-Related Predictions – by Jan Van Haaren and Guy Van den Broeck, Katholieke Universiteit Leuven, Belgium
Association football has recently seen some radical changes, leading to higher financial stakes, further professionalization and technical advances. This gave rise to large amounts of data becoming available for analysis. The authors therefore propose football-related predictions as an interesting application for relational learning. They argue that football data is highly structured and most naturally represented in a relational way. Furthermore, they identify interesting learning tasks which require a relational approach, such as link prediction or structured output learning. Early experiments show that this relational approach is competitive with a propositionalized approach for the prediction of individual football matches’ goal difference.
Predicting Margin of Victory in NFL Games: Machine Learning vs. the Las Vegas Line – by Jim Warner (2010)
In this study, the author describes efforts to use machine learning to outperform the expert Las Vegas line-makers at predicting the outcome of NFL football games. The statistical model employed for inference is the Gaussian process, a powerful tool for supervised learning applications. With predictions for the margin of victory and associated confidence intervals from the Gaussian process model, the study proposes a simple framework which recommends a bet on a given game when it is deemed statistically favorable. The training dataset includes a wide variety of offensive and defensive NFL statistics from about 2000 games between 2000 and 2009. The study also explores the impact of including two novel, previously unstudied features: the temperature difference between the competing teams’ cities and a team’s computed strength according to J.P. Keener. The predictions for margin of victory result in an error just 2% higher than that of the Las Vegas line, and the model picks the game winner over 64% of the time. The proposed bet-recommendation scheme provides a win rate just under 51%, which falls short of the 52.4% needed to break even in the NFL gambling system.
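The 52.4% break-even figure can be reproduced with a short calculation. The sketch below assumes the common pricing in which a bettor risks 110 units to win 100; that pricing is an assumption made here for illustration and is not stated in the summary above. The function names are hypothetical.

```python
def break_even_win_rate(stake, profit):
    """Win rate at which expected profit is zero when risking `stake` to win `profit`."""
    return stake / (stake + profit)


def expected_profit(win_rate, stake, profit):
    """Average profit per bet at a given win rate."""
    return win_rate * profit - (1.0 - win_rate) * stake


# Assumed pricing: risk 110 to win 100.
rate = break_even_win_rate(110, 100)
print(f"{rate:.1%}")

# A 51% win rate, roughly the figure reported in the study, loses money on average.
print(round(expected_profit(0.51, 110, 100), 2))
```

Under this assumption, the gap between 51% and the break-even rate translates into an average loss of a few units per bet.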
The predictive power of ranking systems in association football – by Jan Lasek, Zoltan Szlavik and Sandjai Bhulai, Vrije Universiteit, Amsterdam, Netherlands
The authors provide an overview and comparison of the predictive capabilities of several methods for ranking association football teams. The main benchmark used is the official FIFA ranking for national teams. The ranking points of teams are turned into predictions, which are then evaluated based on their accuracy. This makes it possible to determine which ranking method is more accurate. The best performing algorithm is a version of the famous Elo rating system that originates from chess player ratings, but several other methods (and method versions) provide better predictive performance than the official ranking method. Being able to predict match outcomes better than the official method might have implications for, e.g., a team’s strategy to schedule friendly games.
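For readers unfamiliar with Elo ratings, the following Python sketch shows the textbook update rule from chess. It is not necessarily the variant evaluated in the paper; the K-factor of 20 and the starting ratings are assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of team A against team B (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return both teams' new ratings after a match with actual score `score_a` for A."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # What A gains, B loses, so the total rating in the pool is preserved.
    return rating_a + delta, rating_b - delta


# Two hypothetical teams with equal ratings; team A wins.
new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
print(new_a, new_b)
```

The expected score doubles as a prediction: the larger the rating gap, the closer it gets to 1 for the stronger team, which is how rating points can be turned into match forecasts.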
An introduction to football modelling at Smartodds – Oxford SIAM Conference 2011 by Robert Johnson
Football Result Prediction with Bayesian Network in Spanish League-Barcelona Team – the authors look at the performance of a Bayesian Network in the area of predicting the result of football matches involving Barcelona FC during 2008-2009.
A New Application of Linear Modeling in the Prediction of College Football Bowl Outcomes and the Development of Team Ratings – by Brady T. West and Madhur Lamsal. Building on the quantitative literature dedicated to the development of ratings for college and professional football teams, the paper presents a straightforward application of linear modeling in the development of a predictive model for the outcomes of college football bowl games, and identifies important team-level predictors of actual bowl outcomes in 2007-2008 using real Football Bowl Subdivision (FBS) data from the recently completed 2004-2006 college football seasons.
Analysis and Prediction of Football Statistics using Data Mining Techniques – by Anurag Gangal, Abhishek Talnikar, Aneesh Dalvi, Vidya Zope and Aadesh Kulkarni from VESIT, Mumbai, India.
Rating Systems for Fixed Odds Football Match Prediction – a white paper published by football-data.co.uk
Predicting Outcomes of Association Football Matches Based on Individual Players’ Performance – an MSc thesis in Computer Science at the Norwegian University of Science and Technology, by Johanne Birgitte Linde and Marius Løkketangen, 2014.
Using Twitter to predict football outcomes – by Stylianos Kampakis and Andreas Adamides, University College London
A Compound Approach for Football Result Prediction – presenting FRES (Football Result Expectation System), consisting of two major components: a rule-based reasoner and a Bayesian network component; written by Byungho Min, Chongyoun Choe, and R. I. (Bob) McKay, School of Computer Science and Engineering, Seoul National University, Seoul, Korea
Beating the bookie: A look at statistical models for prediction of football matches – an examination of statistical models for predicting match outcomes in the English Premier League – by Helge Langseth, Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
Predicting football results using Bayesian nets and other machine learning techniques – by A. Joseph, N.E. Fenton, M. Neil, Computer Science Department, Queen Mary, University of London, UK
Creating a Profitable Betting Strategy for Football by Using Statistical Modelling – by Niko Marttinen, M.Sc., September 2001, Department of Statistics, Trinity College Dublin, Ireland
Predicting outcome of soccer matches using machine learning – a term paper by Albina Yezus, Saint-Petersburg State University, 2014
Prediction and Retrospective Analysis of Soccer Matches in a League – by Håvard Rue and Øyvind Salvesen, Norwegian University of Science and Technology, Trondheim, Norway
Relational Learning for Football-Related Predictions – by Jan Van Haaren and Guy Van den Broeck, Department of Computer Science, Katholieke Universiteit Leuven, Belgium
Game ON! Predicting English Premier League Match Outcomes – by Aditya Srinivas Timmaraju, Aditya Palnitkar, Vikesh Khanna, Stanford University
The predictive power of ranking systems in association football – by Jan Lasek, Zoltán Szlávik and Sandjai Bhulai, VU University Amsterdam, 2013
Check out Google Scholar for more relevant scientific articles.
Popular Science
The Wisdom of Crowds by James Surowiecki, 2004, Paperback – The Wisdom of Crowds: Why the Many are Smarter Than the Few and How Collective Wisdom Shapes Business, Economics, Society and Nations is an excellent book to start with if you are new to the topic of crowd-based prediction methods. In this landmark work, New Yorker columnist James Surowiecki explores a seemingly counter-intuitive idea that has profound implications. Decisions taken by a large group, even if the individuals within the group aren’t smart, are always better than decisions made by small numbers of “experts”. This seemingly simple notion has endless and major ramifications for how businesses operate, how knowledge is advanced, how economies are (or should be) organised and how nation-states fare. With great erudition, Surowiecki ranges across the disciplines of psychology, economics, statistics and history to show just how this principle operates in the real world. Along the way, Surowiecki asks a number of intriguing questions about a subject few of us actually understand – economics. What are prices? How does money work? Why do we have corporations? Does advertising work? His answers, rendered in delightfully clear prose, demystify daunting prospects. As Surowiecki writes: ‘The hero of this book is, in a curious sense, an idea, a hero whose story ends up shedding dramatic new light on the landscapes of business, politics, and society’.
Superforecasting: The Art and Science of Prediction by Professor Philip Tetlock and Dan Gardner, 2016, Paperback – What if we could improve our ability to predict the future? Everything we do involves forecasts about how the future will unfold. Whether buying a new house or changing job, designing a new product or getting married, our decisions are governed by implicit predictions of how things are likely to turn out. The problem is, we’re not very good at it. In a landmark, twenty-year study, Wharton professor Philip Tetlock showed that the average expert was only slightly better at predicting the future than a layperson using random guesswork. Tetlock’s latest project – an unprecedented, government-funded forecasting tournament involving over a million individual predictions – has since shown that there are, however, some people with real, demonstrable foresight. These are ordinary people, from former ballroom dancers to retired computer programmers, who have an extraordinary ability to predict the future with a degree of accuracy 60 percent greater than average. They are super-forecasters. In Superforecasting: The Art and Science of Prediction, Prof. Tetlock and his co-author Dan Gardner offer a fascinating insight into what we can learn from this elite group. They show the methods used by these super-forecasters which enable them to outperform even professional intelligence analysts with access to classified data. And they offer practical advice on how we can all use these methods for our own benefit – whether in business, in international affairs, or in everyday life.
Extraordinary Popular Delusions and the Madness of Crowds by Charles Mackay, 2013 – First published in 1841, Extraordinary Popular Delusions and the Madness of Crowds is often cited as the best book ever written about market psychology. This Harriman House edition includes Charles Mackay’s account of the three infamous financial manias – John Law’s Mississippi Scheme, the South Sea Bubble, and Tulipomania. Between the three of them, these historic episodes confirm that greed and fear have always been the driving forces of financial markets, and, furthermore, that being sensible and clever is no defence against the mesmeric allure of a popular craze with the wind behind it. In writing the history of the great financial manias, Charles Mackay proved himself a master chronicler of social as well as financial history. Blessed with a cast of characters that covered all the vices, gifted a passage of events which was inevitably heading for disaster, and with the benefit of hindsight, he produced a record that is at once a riveting thriller and absorbing historical document. A century and a half later, it is as vibrant and lurid as the day it was written. For modern-day investors, still reeling from the dot-com crash, the moral of the popular manias scarcely needs spelling out. When the next stock market bubble comes along, as it surely will, you are advised to recall the plight of some of the unfortunates on these pages, and avoid getting dragged under the wheels of the careering bandwagon yourself. (Read more on Wikipedia)