
BGonline.org Forums

Nine 5k results (report) - that's why people need to check SI

Posted By: Daniel Murphy
Date: Wednesday, 19 January 2011, at 9:10 p.m.

In Response To: Nine 5k results (report) - that's why people need to check SI (Neil Robins)

How common is it, really, that extending an XG (or other bot) rollout of two plays swings the confidence level from 95% or even 99.9% in favor of one play to the other? I've seen only a handful of examples. But it would be surprising if they never happened; as I see now Timothy has said with respect to Stick's rollouts, "these sorts of fluctuations are no surprise to anyone who works with statistical effects on a regular basis."

In another thread, I noticed Timothy's phrase "what we intuitively think" confidence intervals should be. I'm not sure what he meant by either the "we" or the "intuitively" (what did you mean, Timothy?), but to my mind there's not much that is intuitive about working with rollout reports, or with statistical probabilities generally.

This is true even of very simple probabilities. For example, if you tell people unschooled in statistics that the probability of rolling a six in one roll is 11/36 and ask them what the probability of rolling a six in two rolls is, some folks would "intuitively" think that the answer must be 22/36; the right answer is 1 - (25/36)², or about 52%. Or if you told them the chance of rolling a doublet is 1/6 and asked what the chance is of rolling a doublet in two rolls, some would immediately say "2/6" (it's 11/36, about 31%). That's not because they're dumb: if you then say "then the chance of rolling a doublet in 6 rolls must be 100%?" (I can remember having such conversations), they immediately see that something must be wrong with their intuitive thinking. To take another example, a newspaper report of an election-year poll might say that candidate X is favored over candidate Y 53% to 47% this week, where last week the numbers were 51% to 49%. Even if the article mentions that the 95% confidence interval is ±5%, which often it won't, somehow "we intuitively" think that the 2-point jump must reflect a meaningful gain in support for candidate X. But a shift that small, well inside the stated margin, need not mean any real change at all.
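To check the dice arithmetic, here is a minimal Python sketch (the function names are my own, purely for illustration). It enumerates the 36 equally likely rolls and uses the complement rule, P(at least once in n rolls) = 1 - (1 - p)^n, instead of adding the single-roll probabilities:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of one roll of two dice.
ROLLS = list(product(range(1, 7), repeat=2))


def p_single(event):
    """Exact probability that a single roll satisfies `event`."""
    hits = sum(1 for roll in ROLLS if event(roll))
    return Fraction(hits, len(ROLLS))


def p_at_least_once(event, n_rolls):
    """Probability that `event` happens at least once in n independent rolls,
    computed as 1 - P(it never happens)."""
    return 1 - (1 - p_single(event)) ** n_rolls


def has_six(roll):
    return 6 in roll


def is_doublet(roll):
    return roll[0] == roll[1]


if __name__ == "__main__":
    print(p_single(has_six))               # 11/36
    print(p_at_least_once(has_six, 2))     # 671/1296 (about 0.518, not 22/36)
    print(p_at_least_once(is_doublet, 2))  # 11/36 (about 0.306, not 2/6)
    print(p_at_least_once(is_doublet, 6))  # 31031/46656 (about 0.665, not 100%)
```

Adding probabilities double-counts the cases where the event happens on both rolls, which is why the naive sum can even exceed 100%.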

If anything, working with rollouts, and understanding how they calculate their numbers and what they do and don't mean, seems counterintuitive; and with that I'll drop the "I" word. But I wonder whether you think, for example, that when the 80-trial rollout of bar/23 13/7 reports equity -0.035 ±0.028, this means we can say there is a 95% probability that the true equity of the play lies between -0.063 and -0.007. It doesn't mean that, but perhaps many think it does?
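A quick simulation may help with the 95% question. This sketch assumes, purely for illustration, an invented per-trial outcome distribution, independent trials, and a plain normal-approximation interval (mean ± 1.96 standard errors); it is not claimed to match XG's own calculation. Under the usual frequentist reading, the "95%" describes the procedure: repeated over many rollouts, intervals built this way contain the true equity about 95% of the time.

```python
import math
import random
import statistics

# Invented per-trial outcomes (points) and probabilities, for illustration only.
OUTCOMES = [-2, -1, 1, 2]
WEIGHTS = [0.12, 0.38, 0.40, 0.10]
TRUE_EQUITY = sum(o * w for o, w in zip(OUTCOMES, WEIGHTS))  # about -0.02


def rollout(n_trials, rng):
    """One simulated rollout: n independent trial results."""
    return rng.choices(OUTCOMES, weights=WEIGHTS, k=n_trials)


def ci95(results):
    """Normal-approximation 95% interval: mean +/- 1.96 standard errors."""
    mean = statistics.fmean(results)
    half_width = 1.96 * statistics.stdev(results) / math.sqrt(len(results))
    return mean - half_width, mean + half_width


def coverage(n_trials, n_rollouts, seed=1):
    """Fraction of repeated rollouts whose interval contains the true equity."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rollouts):
        low, high = ci95(rollout(n_trials, rng))
        if low <= TRUE_EQUITY <= high:
            hits += 1
    return hits / n_rollouts


if __name__ == "__main__":
    print(f"true equity: {TRUE_EQUITY:+.3f}")
    print(f"share of 80-trial intervals containing it: {coverage(80, 10_000):.3f}")
```

Any single interval either contains the true value or it doesn't; the 95% is a property of the recipe across many rollouts, not a probability attached to one particular report.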

Another misconception I've seen often (and have probably been guilty of myself) is to present a rollout and say something like "the equity was X after 324 trials and still X after 648 trials, so X must be about right; I'm done with it!" But, as the 9 times 5k example shows, when the confidence intervals are not small there's no reason to think that consecutive segments should report identical equities, or, if they do, that getting the same result from the extended rollout means we can be more confident of it than its own confidence interval warrants. Thinking so is, again, wrongly disregarding the confidence intervals.
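The point about agreeing segments can also be checked with a toy simulation. This sketch uses the same kind of invented outcome distribution and assumes independent trials; `tol`, the threshold for calling two segments "in agreement", is a hypothetical choice. It rolls out two consecutive 324-trial segments many times and compares the error of the combined 648-trial result when the segments happen to agree with its error over all pairs:

```python
import math
import random
import statistics

# Invented per-trial outcomes (points) and probabilities, for illustration only.
OUTCOMES = [-2, -1, 1, 2]
WEIGHTS = [0.12, 0.38, 0.40, 0.10]
TRUE_EQUITY = sum(o * w for o, w in zip(OUTCOMES, WEIGHTS))  # about -0.02


def segment_mean(n_trials, rng):
    """Mean result of one rollout segment of n independent trials."""
    return statistics.fmean(rng.choices(OUTCOMES, weights=WEIGHTS, k=n_trials))


def rms(values):
    """Root-mean-square of a list of errors."""
    return math.sqrt(statistics.fmean([v * v for v in values]))


def agreement_experiment(seg_len=324, n_pairs=4000, tol=0.01, seed=2):
    """Run two consecutive segments n_pairs times.

    Returns (RMS error of the combined result over all pairs,
             RMS error of the combined result when |m1 - m2| <= tol,
             fraction of pairs that agreed)."""
    rng = random.Random(seed)
    all_errors, agree_errors = [], []
    for _ in range(n_pairs):
        m1 = segment_mean(seg_len, rng)
        m2 = segment_mean(seg_len, rng)
        error = (m1 + m2) / 2 - TRUE_EQUITY  # error of the 2 * seg_len result
        all_errors.append(error)
        if abs(m1 - m2) <= tol:
            agree_errors.append(error)
    return rms(all_errors), rms(agree_errors), len(agree_errors) / n_pairs


if __name__ == "__main__":
    overall, when_agreeing, frac = agreement_experiment()
    print(f"segments agreed within 0.01 in {frac:.1%} of pairs")
    print(f"RMS error of 648-trial result, all pairs:      {overall:.4f}")
    print(f"RMS error of 648-trial result, agreeing pairs: {when_agreeing:.4f}")
```

If agreement carried extra information, the second RMS figure would be clearly smaller than the first; in this toy model they should come out about the same, because the difference between two independent segments says essentially nothing about how far their average is from the truth.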

Anyway, XG calculates the CI the same way in every rollout, which means it followed the same procedure for all four results (2 rollouts times 2 plays) you posted, Neil. I don't think this selected data shows that there's anything wrong with the method. Even without Timothy's and Xavier's maths expertise, one could observe that the CIs in the short rollout may appear a little smaller than they should be. Simplistically, quadrupling the sample size halves the CI, so if the CI with 4480 trials is ±0.005, then we might expect the CI with 80 trials to be about ±0.037 (0.005 × √56), not ±0.028. But my understanding is that when sample sizes are not suitably large, we should not expect all results to appear "normal," even if there's nothing wrong with the calculation. That was part, I think, of Xavier's response: intelligent bot use means choosing sample sizes (rollout lengths) appropriate to the position, and 80 trials aren't enough for most* fifth-roll plays.
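The back-of-envelope rescaling is just the 1/√n rule. A tiny sketch, assuming the per-trial spread is the same in both rollouts (something small samples need not honor); the function name is hypothetical:

```python
import math


def rescale_half_width(half_width, n_from, n_to):
    """Rescale a CI half-width from n_from trials to n_to trials, assuming
    it shrinks as 1/sqrt(n), i.e. the per-trial spread stays the same."""
    return half_width * math.sqrt(n_from / n_to)


if __name__ == "__main__":
    # +/-0.005 at 4480 trials, rescaled to 80 trials (4480 / 80 = 56).
    print(f"{rescale_half_width(0.005, 4480, 80):.4f}")  # 0.0374
    # Quadrupling the trials halves the half-width.
    print(f"{rescale_half_width(0.028, 80, 320):.4f}")   # 0.0140
```

The gap between the rescaled ±0.037 and the reported ±0.028 is the kind of discrepancy that can arise when the short rollout's own estimate of the spread is itself noisy.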


*E.g., 31P-32Z-55A-66-D?


BGonline.org Forums is maintained by Stick with WebBBS 5.12.