BGonline.org Forums
Minimum # of Trials and Confidence
Posted By: Daniel Murphy In Response To: Minimum # of Trials and Confidence (Stick)
Date: Saturday, 16 October 2010, at 2:22 a.m.
Stick writes: That's right, 2 games and it reports 100% confidence, so I can be done, right? Of course not ... but that seems to me what Daniel suggested we could get away with in this thread.
Tim writes: Daniel certainly overstated the case for short rollouts. For an analogy where common sense should help us cut through the mathematical fog, suppose I have a coin and I want to estimate the probability that it comes up heads when I flip it. No one in their right mind would suggest that a single trial is enough ...
My goodness. I thought what I actually said was uncontroversial. Apparently not. Since what I did say was correct, let me restate it:
A sufficient number of trials depends on how close the decision is, the variance in the result, and one's interest in and time available for finding a particularly precise answer. There is no magic minimum number of trials (including 324) that is necessary or sufficient. Many fewer might be enough. Many more might not be enough. And results which are less than "100% certain" are still useful and reliable.
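The three factors above (how close the decision is, the variance, and how precise an answer one wants) map onto a standard sample-size estimate. The sketch below is a rough illustration only, assuming independent trials, a normal approximation, and a known per-trial standard deviation of the equity difference between two plays; `trials_needed` and the example numbers are hypothetical, not taken from Gnubg or XG.

```python
import math
from statistics import NormalDist


def trials_needed(equity_gap: float, sd_per_trial: float, confidence: float = 0.95) -> int:
    """Approximate trials needed before an observed gap of `equity_gap` sits
    `z` standard errors away from zero, where `z` is the one-sided critical
    value for `confidence`.

    Assumptions (hypothetical, for illustration): trials are independent,
    the mean gap is roughly normal, and `sd_per_trial` is the per-trial
    standard deviation of the *difference* between the two plays.
    """
    if equity_gap <= 0:
        raise ValueError("equity_gap must be positive")
    z = NormalDist().inv_cdf(confidence)  # one-sided critical value
    # Standard error after n trials is sd / sqrt(n); solve z * sd / sqrt(n) = gap for n.
    return math.ceil((z * sd_per_trial / equity_gap) ** 2)


# A clear-cut decision versus a close one, with the same (assumed) variance.
print(trials_needed(0.1, 0.5))
print(trials_needed(0.017, 0.5))
```

Halving the gap roughly quadruples the number of trials, and a smaller per-trial standard deviation (for example, from variance reduction) shrinks it quadratically. Nothing in the formula singles out a fixed number such as 324.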
I think that covers it. If you want to address exactly what I wrote before, that'd be fine, too. I stand by it. But would the two of you mind addressing what I actually did write, instead of presenting examples that I could not possibly have had in mind as refutations of "suggestions" that I didn't come close to making? Where did I come anywhere close to suggesting that you can stop a rollout as soon as some bot I don't use says it's "100% confident" of its answer? Or that 2(!) or 20 or 72 trials is a sufficient number of trials for any old 4th, 5th or 6th roll play? Or that one trial is enough to test a play decision as close as a presumably 50/50 coin toss? I surely don't want to be putting words in your mouths. Unfortunately, I get the impression that you both think my "errors" have been firmly refuted by Stick's one position and Tim's coin toss analogy.
If either of you would care to point to something I actually did write that "certainly overstated the case for short rollouts," I'd appreciate it.
Meanwhile, shall we look at some positions?
(1) Please tell me why "324 trials as a minimum is a must" for this position:
DMP (double match point). Blue to play 3-1.
Pip counts: White 167, Blue 167.
Position ID: 4HPwATDgc/ABMA Match ID: cIksAAAAAAAA
Assume that all I want to know is whether the second best play is a blunder.
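Question (1) asks something narrower than "which play is best": only whether the second-best play loses more than some blunder threshold. A minimal sketch of that check follows, assuming a normal approximation and a rollout that reports the mean equity gap and its standard error. The function name and the threshold value are hypothetical; the post does not give a number.

```python
from statistics import NormalDist


def classify_gap(mean_gap: float, std_error: float, threshold: float,
                 confidence: float = 0.95) -> str:
    """Decide whether the true equity loss of the second-best play is above
    or below `threshold` (a hypothetical blunder cutoff), or whether the
    rollout cannot tell yet.

    Each conclusion is a one-sided test at `confidence`, assuming the
    reported mean gap is roughly normal with standard error `std_error`.
    """
    z = NormalDist().inv_cdf(confidence)  # one-sided critical value
    lower = mean_gap - z * std_error
    upper = mean_gap + z * std_error
    if lower > threshold:
        return "blunder"
    if upper < threshold:
        return "not a blunder"
    return "undecided"


# Gap far above, far below, and right at an assumed 0.10 threshold.
print(classify_gap(0.30, 0.02, 0.10))
print(classify_gap(0.01, 0.02, 0.10))
print(classify_gap(0.10, 0.02, 0.10))
```

When the gap sits far from the threshold, a short rollout can settle the question even if its standard error is too large to pin down the exact equities of the two plays.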
(2) Please tell me how many trials are needed for this position:
Money game, Blue on roll.
Pip counts: White 9, Blue 9.
Position ID: dwAAgDsAAAAAAA Match ID: UQkAAAAAAAAA
Assume I want equities precise to 3 decimal places for ND and D/T and that the rollout is 2-ply, not truncated and uses variance reduction.
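For question (2), "precise to 3 decimal places" can be read, as an assumption, as a confidence-interval half-width of 0.0005. One way to get there without fixing the trial count in advance is to track the running mean and variance (Welford's algorithm) and stop once the interval is narrow enough. `rollout_until_precise`, the `sample` callback standing in for one rollout trial, and the `min_trials` floor are all hypothetical. The floor guards against stopping after a handful of trials whose sample variance happens to be tiny.

```python
import math
from statistics import NormalDist
from typing import Callable, Tuple


def rollout_until_precise(sample: Callable[[], float], half_width: float = 0.0005,
                          confidence: float = 0.95, min_trials: int = 36,
                          max_trials: int = 1_000_000) -> Tuple[float, float, int]:
    """Draw trials from `sample` (a hypothetical stand-in for one rollout
    trial) until the two-sided confidence interval of the mean is no wider
    than +/- `half_width`. Returns (mean, standard_error, trials).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    n, mean, m2 = 0, 0.0, 0.0
    while n < max_trials:
        x = sample()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # Welford's running sum of squared deviations
        if n >= min_trials:
            se = math.sqrt(m2 / (n - 1) / n)  # sample variance / n, then sqrt
            if z * se <= half_width:
                return mean, se, n
    # Hit max_trials without reaching the target precision.
    se = math.sqrt(m2 / (n - 1) / n) if n > 1 else float("inf")
    return mean, se, n
```

Because the per-trial variance depends on the rollout settings (ply, truncation, variance reduction), the number of trials this loop runs is an output, not an input.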
(3) And this one's interesting:
The score is: White 0, Blue 1 (match to 7 points). Blue to play 3-1.
Pip counts: White 150, Blue 156.
Position ID: cLfBATBw88gBKA Match ID: cInlAAAACAAA
This should look familiar. It's Stick's exemplary position. A random position from a random rollout session. Ok. But let me suggest that a position that no one would even dream of doing a "short" rollout for might not be the best illustration of the dangers of doing rollouts that are too short. By "no one" I mean anyone who looked at a Gnubg 2-ply evaluation and saw that the top two plays were separated by only 0.017. I'm curious: what is XG's evaluation?
And a final question about this position with regard to "100% confidence," apparently an XG thing. Chuck wrote that he thought Stick had had XG's reporting style in mind. Apparently, Stick and I did not have the same idea in mind when speaking of "100% confidence." Since I don't use XG, I'm glad Stick gave some examples. Anyway, the question is:
Why is XG reporting "100% confidence" after 2 trials and "99.7% confidence" after 72 trials? After 36 trials 2-ply (equivalent to XG 3-ply), Gnubg reports a JSD of about 1. After 72 trials 0-ply (XG 1-ply), Gnubg reports a JSD of about 0.04. Both JSDs correspond to very low degrees of confidence. So ... what are XG's "100%" and "99.7%" based on?
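On what a percentage like "99.7%" could be based on: one common convention, assumed here and not confirmed for XG, converts the gap between two plays, measured in standard errors (which is how this sketch reads Gnubg's JSD), into a one-sided normal probability that the leading play really is better. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def jsd_to_confidence(jsd: float) -> float:
    """One-sided probability that the play rated best is truly best,
    assuming the equity gap is normal and `jsd` = gap / standard error."""
    return _STD_NORMAL.cdf(jsd)


def jsd_for_confidence(confidence: float) -> float:
    """Inverse: the JSD needed to report `confidence` under the same model."""
    return _STD_NORMAL.inv_cdf(confidence)


for jsd in (0.04, 1.0):
    print(f"JSD {jsd}: {jsd_to_confidence(jsd):.1%}")
print(f"99.7% needs a JSD of about {jsd_for_confidence(0.997):.2f}")
```

Under this reading, a JSD of about 1 corresponds to roughly 84% and a JSD of 0.04 to barely more than a coin flip. The figure approaches but never reaches 100% for any finite JSD, so a displayed "100%" would have to come from rounding or from an estimated standard error near zero, something a sample of only 2 trials could produce by chance. Whether XG actually works this way is not something this sketch can settle.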
BGonline.org Forums is maintained by Stick with WebBBS 5.12.