Half as wild
Posted By: Daniel Murphy
Date: Thursday, 22 April 2010, at 2:37 a.m.
In Response To: Wild rollout results (bug?) (Nack Ballard)
Neil's long XG 3ply rollout result for
24/20* 20/15 eq: +0.145
falls within the 95% confidence intervals reported in Chase's shorter XG 3ply rollouts. They were
Conf: ± 0.009 (+0.127...+0.145)
Conf: ± 0.009 (+0.128...+0.146)
But Neil's long XG 3ply rollout result for
24/20* 13/8 eq: +0.158
is an outlier to Chase's results:
Conf: ± 0.010 (+0.166...+0.186)
Conf: ± 0.010 (+0.165...+0.185)
Conversely, of course, Chase's results for that play are the outliers to Neil's 3ply and 4ply rollouts (see below).
As a reader attempting to judge the validity of the various rollouts and their combination, I'm thinking it would be helpful if the summary data included both absolute equities and 95% confidence intervals.
For example, in the rollouts that Nack cites (other than Snowie's, since I don't know where that is) the absolute equities for the 24/20* 13/8 response were:
XG 3ply eq: +0.176 ± 0.010
XG 3ply eq: +0.175 ± 0.010
XG 3ply eq: +0.158 ± 0.003
XG 4ply eq: +0.152 ± 0.005
Gnubg 2ply eq: +0.137 ± 0.005
Above, the two high results are Chase's, already discussed. But what about the Gnubg rollout? It's as unusually low as Chase's are high, compared with Neil's. Is it right to simply combine and average the relative equities of various rollouts by different bots/plies, without regard for absolute equities? One (or more) of these rollouts is "wrong." If, for instance, we want to combine the relative equities of the XG 4ply and Gnubg 2ply rollouts, even though their absolute results are out of each other's 95% confidence intervals, is there a rationale for doing that, other than the old standby, that "bots make mistakes playing both sides, and the errors even out (we think) so the relative equities are legitimate, despite difference in absolute equities"?
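To make the disagreement concrete, here is a rough sketch that compares every pair of the results listed above. It assumes (my assumption, not something the bots report in this form) that each ± figure is a 95% half-width equal to 1.96 standard errors, that the rollouts are statistically independent, and that they all estimate the same quantity, which is questionable across different bots and plies. The labels and the function names are only for illustration.

```python
from itertools import combinations
from math import sqrt

Z95 = 1.96  # assumed: reported 95% half-width = 1.96 standard errors

# (label, absolute equity, 95% half-width) for 24/20* 13/8, as quoted above
ROLLOUTS = [
    ("XG 3ply (Chase, a)", 0.176, 0.010),
    ("XG 3ply (Chase, b)", 0.175, 0.010),
    ("XG 3ply (Neil)", 0.158, 0.003),
    ("XG 4ply (Neil)", 0.152, 0.005),
    ("Gnubg 2ply", 0.137, 0.005),
]


def intervals_overlap(eq_a, hw_a, eq_b, hw_b):
    """True if the two 95% intervals share at least one point."""
    return abs(eq_a - eq_b) <= hw_a + hw_b


def z_score(eq_a, hw_a, eq_b, hw_b):
    """Difference in equity, in standard errors of that difference.

    Assumes independent rollouts, so the variances simply add.
    """
    se_diff = sqrt((hw_a / Z95) ** 2 + (hw_b / Z95) ** 2)
    return (eq_a - eq_b) / se_diff


def compare_all(rollouts):
    """Return (label_a, label_b, overlap, z) for every pair of rollouts."""
    rows = []
    for (la, ea, ha), (lb, eb, hb) in combinations(rollouts, 2):
        rows.append((la, lb, intervals_overlap(ea, ha, eb, hb),
                     z_score(ea, ha, eb, hb)))
    return rows


if __name__ == "__main__":
    for la, lb, overlap, z in compare_all(ROLLOUTS):
        print(f"{la:18} vs {lb:18} overlap={overlap!s:5} z={z:+.1f}")
```

Under those assumptions, the XG 4ply and Gnubg 2ply results come out roughly four standard errors apart, which is hard to put down to dice luck alone. That suggests a systematic difference between the bots rather than sampling noise, but the sketch cannot say which one is "wrong."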
And another question: a rollout of two plays reports standard deviations or 95% confidence intervals for the absolute equity of each play. But what's the confidence interval of the relative equity? I'm thinking that we should have less confidence in that than the summary data might seem to suggest.
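As a rough sketch of the arithmetic behind that question: the variance of a difference is Var(A) + Var(B) - 2 Cov(A, B), so if the two plays' results were independent, the half-widths would combine as the square root of the sum of squares, and the interval on the difference would be wider than either play's own interval. The correlation `rho` below is a hypothetical parameter; I don't know what value, if any, applies to a given bot's rollout (plays rolled out with shared dice would presumably be positively correlated, which narrows the interval). The example figures pair the midpoint of Chase's first 20/15 interval with his first 13/8 result, and that pairing is my assumption.

```python
from math import sqrt


def diff_half_width(hw_a, hw_b, rho=0.0):
    """95% half-width of (equity A - equity B).

    hw_a, hw_b: 95% half-widths of the two absolute equities.
    rho: assumed correlation between the two estimates (hypothetical;
         0.0 means the plays were rolled out independently).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be between -1 and 1")
    # Half-widths scale like standard errors, so the variance rule applies.
    variance_like = hw_a ** 2 + hw_b ** 2 - 2.0 * rho * hw_a * hw_b
    return sqrt(max(0.0, variance_like))  # guard against rounding below zero


def relative_equity(eq_a, hw_a, eq_b, hw_b, rho=0.0):
    """Return (difference, 95% half-width of the difference)."""
    return eq_a - eq_b, diff_half_width(hw_a, hw_b, rho)


if __name__ == "__main__":
    # 24/20* 13/8: +0.176 ± 0.010; 24/20* 20/15: +0.136 ± 0.009 (assumed pairing)
    for rho in (0.0, 0.5, 0.8):
        diff, hw = relative_equity(0.176, 0.010, 0.136, 0.009, rho)
        print(f"rho={rho:.1f}: relative equity {diff:+.3f} ± {hw:.3f}")
```

With `rho=0.0` the half-width on the difference is about 0.013, larger than either 0.009 or 0.010. Only with a fairly strong positive correlation does it drop below the individual figures.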

