Cricket Moneyball Two – assessing batting performance over a relatively short period of time.
During the course of an entire career the conventional statistical methods for determining batting prowess work reasonably well. We can, for instance, determine that with a career Test average (AVE) of 99.94, Don Bradman was a half-decent Test batsman.
The problems arise when we assess player performance over a relatively short period of time, when we do not have a large number of innings to sample. This can, for instance, become an issue when we are trying to determine current short-term form, or a player’s performance in a given tournament.
There are a variety of potential problems here, varying game conditions for instance (more on this in a later post), but chief amongst them is the batsman with a high number of not-out scores, which can distort his (or her) average. A high number of not-outs may be down to the batsman’s innate brilliance, blind luck, or position in the batting order; we cannot tell. Because AVE is calculated by dividing runs scored by times out (AVE = R/W), every not-out adds runs to the top of the fraction without adding a dismissal to the bottom, and the result can be an erroneously high AVE. The most frequently cited example of this ‘not-out bias’ is Lance Klusener who, in the 1999 World Cup, scored 281 runs in eight innings (across nine matches) while being out only twice. This gave Lance an average of 140.5, despite a high score of only 52! Clearly a nonsense.
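As a quick check of that arithmetic, here is a minimal Python sketch of the conventional average. The function name `batting_average` is my own invention, and returning `None` for a batsman who was never dismissed is just one way of handling the undefined case.

```python
def batting_average(runs, dismissals):
    """Conventional average: runs scored divided by times out (AVE = R/W)."""
    if dismissals == 0:
        return None  # never dismissed, so the average is undefined
    return runs / dismissals


# Klusener, 1999 World Cup: 281 runs, out twice
print(batting_average(281, 2))  # 140.5
```

Note that the number of innings never enters the calculation, which is exactly why a run of not-outs can inflate the figure.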
The first attempt to deal with this problem that I can find comes from ‘the two Alans,’ Alan Kimber and Alan Hansford (1), who drew on earlier work in survival analysis (Cox & Oakes 1984) and reliability analysis (Crowder, Kimber, Smith & Sweeting 1991) to produce a more rational indicator of batting performance.
I am reliably informed that Kimber & Hansford “argue against the geometric distribution and obtain probabilities for selected ranges of individual scores in test cricket using product-limit estimators…” (1)
No, I have no idea what that means either, so you will be relieved to know that others [Durbach (3) and Lemmer (4)] have since demonstrated that this system is almost as unreliable as AVE. So we can forget them and move on.
At this point our old friend H.H. Lemmer comes to our assistance again. In (4) and (5) he argues that his analysis shows that if a not-out batsman had been allowed to bat on, he could reasonably expect to score twice the runs that he actually scored. So logically, if we double the not-out scores and count those innings as wickets, we have a more accurate assessment, right? Well, not quite. Nothing is quite that simple in the wonderful world of cricket moneyball.
The formula derived by Lemmer from his insight is
e6 = (sumout + (2.2 - 0.01 x avno) x sumno)/n
where
n denotes the number of innings played
sumout denotes the sum of out scores
sumno denotes the sum of not-out scores
avno denotes the average of not-out scores
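To make the formula concrete, here is a rough Python sketch of e6. The function name, the `(runs, not_out)` pair layout and the five-innings series are all my own made-up illustration, not anything from Lemmer’s papers.

```python
def e6(innings):
    """Lemmer's e6 estimator.

    `innings` is a list of (runs, not_out) pairs, one per innings batted.
    """
    n = len(innings)
    if n == 0:
        raise ValueError("need at least one innings")
    not_outs = [runs for runs, not_out in innings if not_out]
    sumout = sum(runs for runs, not_out in innings if not not_out)
    sumno = sum(not_outs)
    # avno is the mean not-out score; it has no effect when there are no not-outs
    avno = sumno / len(not_outs) if not_outs else 0.0
    return (sumout + (2.2 - 0.01 * avno) * sumno) / n


# Hypothetical series (made-up scores; True marks a not-out innings)
series = [(30, False), (12, False), (45, True), (8, False), (25, True)]
print(round(e6(series), 2))  # 35.9
```

The weight applied to the not-out runs is 2.2 - 0.01 x avno rather than a flat 2, so it shrinks as the average not-out score grows. For this made-up series the conventional AVE would be 120/3 = 40.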
However, if you simply double each not-out score and call that innings an ‘out’, you end up with a very similar figure to e6.
To put that into Lemmer’s parlance, the formula for this simpler method is
e2 = (sumout + 2 x sumno)/n
as you would expect.
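A small Python sketch shows that e2 really is just ‘double the not-outs and count every innings as out’. The function names and the scores are my own made-up illustration.

```python
def e2(innings):
    """Lemmer's simpler estimator: (sumout + 2 * sumno) / n."""
    sumout = sum(runs for runs, not_out in innings if not not_out)
    sumno = sum(runs for runs, not_out in innings if not_out)
    return (sumout + 2 * sumno) / len(innings)


def doubled_average(innings):
    """Double every not-out score, then treat every innings as a dismissal."""
    adjusted = [2 * runs if not_out else runs for runs, not_out in innings]
    return sum(adjusted) / len(adjusted)


# Hypothetical series of (runs, not_out) pairs; True marks a not-out innings
series = [(30, False), (12, False), (45, True), (8, False), (25, True)]
print(e2(series), doubled_average(series))  # 38.0 38.0
```

The two functions agree because dividing by n is the same as dividing by the number of dismissals once every innings is counted as one.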
Lemmer himself calls this ‘a good estimator’ and that’s good enough for me. This is the formula that I use for day-in, day-out assessment of batting performance in single-day games.
Coming to a spreadsheet near you.
There is one caveat. Where there is a single very large not-out score, the difference between e2 and e6 can become very large (>10), in which case we can use the measure e26, which is found by:
e26 = (e2 + e6)/2
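One way to turn that caveat into a rule of thumb is sketched below in Python. The helper names, the `threshold` parameter and the scores are my own assumptions. The switch at a gap of 10 is simply my reading of the caveat above, not a rule taken from Lemmer’s papers.

```python
def estimates(innings):
    """Return (e2, e6, e26) for a list of (runs, not_out) pairs."""
    n = len(innings)
    sumout = sum(runs for runs, not_out in innings if not not_out)
    not_outs = [runs for runs, not_out in innings if not_out]
    sumno = sum(not_outs)
    avno = sumno / len(not_outs) if not_outs else 0.0
    e2 = (sumout + 2 * sumno) / n
    e6 = (sumout + (2.2 - 0.01 * avno) * sumno) / n
    return e2, e6, (e2 + e6) / 2


def working_estimate(innings, threshold=10):
    """Use e2 day to day, but switch to e26 when e2 and e6 disagree badly."""
    e2, e6, e26 = estimates(innings)
    return e26 if abs(e2 - e6) > threshold else e2


# Hypothetical series with one very large not-out score (made-up numbers)
series = [(5, False), (10, False), (150, True), (8, False), (2, False)]
print([round(x, 1) for x in estimates(series)])  # [65.0, 26.0, 45.5]
print(round(working_estimate(series), 1))        # 45.5
```

Here the single 150 not out pushes e2 and e6 39 runs apart, so the sketch falls back to e26.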
1) Kimber, A.C. and Hansford, A.R. (1993) A statistical analysis of batting in cricket. Journal of the Royal Statistical Society Series A 156, pp 443-455
2) Swartz, T.B. et al (2006) Optimal batting orders in one-day cricket. Computers and Operations Research 33, pp 1939-1950
3) Durbach, I. et al (2007) On a common perception of a random sequence in cricket. South African Statistical Journal
4) Lemmer, H.H. (2008) Measures of batting performance in a short series of cricket matches. South African Statistical Journal 42, pp 83-105
5) Lemmer, H.H. (2008) An analysis of players’ performance in the first cricket Twenty/20 World Cup series. South African Journal for Research in Sport, Physical Education and Recreation 30, pp 71-77
So, with the Lemmer method, what would Klusener’s 1999 WC average be?
Lance Klusener’s more realistic average for the 1999 World Cup is e26 = 62.21. Note that the quick e2 = (sumout + 2 x sumno)/n = 65.25 and e6 = (sumout + [2.2 - 0.01 x avno] x sumno)/n = 59.17. I recommend using e26 = (e2 + e6)/2 = 62.21 because e2 can sometimes be unrealistically large.
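For anyone who wants to check those figures, here is a short Python sketch. The aggregates are my own assumption and are not stated in the post: eight innings, six of them not out, with 40 runs in the two dismissed innings and 241 in the not-outs. That split is back-derived so that it matches the 281 runs, the two dismissals and the e2 and e6 values quoted above.

```python
# Assumed aggregates for Klusener at the 1999 World Cup (not given in the post)
n = 8             # innings batted
num_not_out = 6   # not-out innings, leaving two dismissals
sumout = 40       # runs scored in the two innings where he was out
sumno = 241       # runs scored in the six not-out innings

avno = sumno / num_not_out                    # average not-out score
ave = (sumout + sumno) / (n - num_not_out)    # conventional AVE = R/W
e2 = (sumout + 2 * sumno) / n
e6 = (sumout + (2.2 - 0.01 * avno) * sumno) / n
e26 = (e2 + e6) / 2

print(ave, e2, round(e6, 2), round(e26, 2))  # 140.5 65.25 59.17 62.21
```

Under these assumptions the midpoint of e2 and e6 comes out at about 62.21, less than half the conventional 140.5.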