BGonline.org Forums
Minimum # of Trials and Confidence
Posted By: Daniel Murphy In Response To: Minimum # of Trials and Confidence (eXtreme Gammon)
Date: Wednesday, 20 October 2010, at 5:55 p.m.
Fine comments and data, everyone. Thanks. It's nice to be on the same page.
Stick: Good point. For a one-trial rollout, Gnubg reports no JSDs and reports play SDs as 0.000. Gnubg begins reporting non-zero SDs on the second trial, and JSDs on the third trial. Since it doesn't seem unwise to be suspicious when a bot doesn't show its work, I gladly amend my minimum number of trials that is sometimes sufficient from "one trial" to "three trials."
Neil, yes, for an opening 3-1 one trial gives funny results. I hope you did not have the impression that I had said one trial was enough for a rollout of an opening 3-1? I did not say that. I asked a question: how many trials would be enough to see that anything but 8/5 6/5 is a blunder? My answer would not have been one but many fewer than 324. It turns out that 72 is plenty with seed 898907331 and most likely with any seed. Your rollout extended, opening 3-1, score 7-away all, 2-ply, variance reduction, seed 898907331 proceeds:
- after 17 trials 31P moves into first place and stays there
- after 28 trials 31P is ahead by >0.117 and >1 JSD
- after 44 trials 31P is ahead by >0.147 and >2 JSD
- after 54 trials 31P is ahead by >0.189 and >3.0 JSD
- after 66 trials 31P is ahead by >0.216 and >4.0 JSD
Tim: Re number of trials, see above. Also, good point about distinguishing between (preexisting) confidence in what the right play is and confidence that a bot's result is reliable. I could remark (but not to dispute your point) that we use our pre-rollout conceptions all the time, in deciding which positions to roll out, which plays to examine, which positions we think bots should play well, and which results we accept. There's also the point (made elsewhere by you and others, I believe) that there may be some question about how accurate various bots' estimates of SEs, JSDs and confidence intervals are.
But there's also the fact that we can easily identify classes of positions for which we find that we obtain from very short rollouts what prove to be very correct answers. More trials might be either no more precise, or more precise only to a needless Nth decimal place. That's because for such positions the "limited number of queries to the bot" causes the bot to intake and manipulate a limited number of datapoints which, although limited, are sufficient for finding precise answers.
As Neil did, you mentioned the 3-1 position. What about the one below it?
Blue on roll.
White 9
Blue 9 Position ID: dwAAgDsAAAAAAA Match ID: cAkAAAAAAAAA
Seed 898907335, 2-ply cubeful, VR = yes, Bearoff truncation = no.
With one trial, Gnubg reports D/T CF equity as 1.233 and ND CF equity as 0.794. These equities are accurate to three decimal places and will not change no matter how many more trials are done. So yes, there really are positions for which three trials (amended from "one trial") are enough.
Let's look at another:
Blue on roll.
White 4
Blue 14 Position ID: DwAAABMAAAAAAA Match ID: UQkAAAAAAAAA
Blue has at most two rolls if he doubles, and three if he does not. The D/ND decision is not close. The position is interesting because ND cubeful equity is exactly zero. How many trials do we need?
Trials   Cubeful Equity (No Double)   Reported Standard Deviation
     1   +2.000000                    0.000000
     2   +0.500000                    1.500000
     3   +0.666667                    0.881917
     4   +0.750000                    0.629153
     5   +0.800000                    0.489898
     6   +0.500000                    0.500000
     7   +0.571429                    0.428571
     8   +0.625000                    0.375000
     9   +0.666667                    0.333333
    18   +0.555556                    0.245512
    27   +0.259259                    0.210768
    36   +0.000000                    0.182574
    72   -0.013889                    0.130446
   108   -0.018519                    0.106862
   144   -0.034722                    0.092667
   180   -0.044444                    0.082940
   216   -0.055556                    0.075302
   252   -0.047619                    0.069489
   288   -0.048611                    0.064823
   324   -0.046296                    0.061226
   648   -0.043210                    0.043099
   972   -0.019547                    0.035248
  1295   -0.002317                    0.030560
  1296   -0.001543                    0.030546
  1297   -0.003084                    0.030562
  2591   +0.000386                    0.021600
  2592   +0.000000                    0.021595
  2593   +0.000771                    0.021601
  7775   -0.001158                    0.012467
  7776   -0.001029                    0.012466
  7777   -0.000772                    0.012468
 15552   +0.004630                    0.008815
 46656   +0.000000                    0.005089
 46657   +0.000043                    0.005089
Whoa! The bot seems to be having some trouble, although it happens to be correct to six decimal places when trials equal exactly 36 and 2592. What's going on? What's going on is that those results are with Variance Reduction off. What happens when VR is used?
Trials   Cubeful Equity (No Double)   Reported Standard Deviation
     1   +0.000057                    0.000000
     2   +0.000033                    0.000024
     3   +0.000030                    0.000014
     4   +0.000028                    0.000010
     5   +0.000027                    0.000008
     6   +0.000019                    0.000010
     7   +0.000018                    0.000009
     8   +0.000016                    0.000008
     9   +0.000012                    0.000008
    18   +0.000008                    0.000006
    27   +0.000001                    0.000005
    36   +0.000000                    0.000004
    72   +0.000001                    0.000003
   108   +0.000001                    0.000002
   144   +0.000000                    0.000002
   180   +0.000000                    0.000002
   216   +0.000000                    0.000002
   252   +0.000000                    0.000001
   288   +0.000000                    0.000001
   324   +0.000000                    0.000001
   648   +0.000000                    0.000001
   972   +0.000000                    0.000001
  1295   +0.000000                    0.000001
  1296   +0.000000                    0.000001
  1297   +0.000000                    0.000001
  2591   +0.000000                    0.000000
  2592   +0.000000                    0.000000
  2593   +0.000000                    0.000000
With VR on, with just one trial we already have a result precise to four decimal places. Two trials, the same, and with reported Standard Error 0.000024. Three trials, the same, and a JSD on ND vs. D/T. 36 trials, precision to five decimal places. 144 trials, precision to six decimal places. Somewhere between 1297 and 2592 trials, Standard Error equals 0.000000.
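For anyone who wants to check the arithmetic: the "Reported Standard Deviation" column in the VR-off table is consistent with a standard error of the mean, that is, the sample standard deviation of the per-trial results divided by the square root of the number of trials. The Python sketch below is illustrative only and is not Gnubg code. The per-trial outcomes are an assumption: the bot does not print them, so they are inferred here from the running averages in the first nine rows. The function names running_stats and trials_for_target_se are made up for this example.

```python
import math

# Per-trial ND results inferred from the running averages in the first nine
# rows of the VR-off table (an assumption; the bot does not print them).
OUTCOMES = [2, -1, 1, 1, 1, -1, 1, 1, 1]


def running_stats(outcomes):
    """Return a list of (trials, mean, standard error) after each trial."""
    rows = []
    total = 0.0
    total_sq = 0.0
    for n, x in enumerate(outcomes, start=1):
        total += x
        total_sq += x * x
        mean = total / n
        if n < 2:
            se = 0.0  # no spread can be estimated from a single trial
        else:
            # Sample variance (n - 1 in the denominator), then SE of the mean.
            variance = (total_sq - n * mean * mean) / (n - 1)
            se = math.sqrt(max(variance, 0.0) / n)
        rows.append((n, mean, se))
    return rows


def trials_for_target_se(se_now, trials_now, target_se):
    """Estimate trials needed for target_se, assuming SE ~ 1/sqrt(trials)."""
    return math.ceil(trials_now * (se_now / target_se) ** 2)


if __name__ == "__main__":
    for n, mean, se in running_stats(OUTCOMES):
        print(f"{n:2d}  {mean:+.6f}  {se:.6f}")
```

Under the same 1/sqrt(trials) assumption, halving a standard error costs four times as many trials. That matches the VR-off table, where the figure falls from 0.182574 at 36 trials to 0.092667 at 144, and it shows why VR's tiny per-trial spread is worth so much more than extra trials in this position.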
No one in particular: Maybe someone somewhere is saying "I didn't mean simple positions like those!" So let's put all 30 checkers on the board:
Blue on roll.
White 79
Blue 79 Position ID: rW2rAAC47w8AAA Match ID: cAkAAAAAAAAA
Trials   ND CF equity   ND CF SE   D/T CF equity   D/T CF SE   JSD
     1   +0.459         0.000      +0.459          0.000       --
     2   +0.416         0.043      +0.351          0.108       --
     3   +0.373         0.049      +0.242          0.125       0.9
     4   +0.392         0.040      +0.261          0.091       1.3
     5   +0.397         0.031      +0.276          0.072       1.5
     6   +0.408         0.028      +0.282          0.059       1.9
     8   +0.413         0.021      +0.274          0.045       2.7
    10   +0.410         0.017      +0.280          0.036       3.2
    12   +0.411         0.016      +0.278          0.030       3.8
    14   +0.414         0.014      +0.285          0.026       4.3
    16   +0.399         0.018      +0.274          0.024       4.0
    18   +0.393         0.013      +0.274          0.022       4.2
    20   +0.399         0.015      +0.278          0.020       4.6
    22   +0.392         0.015      +0.273          0.019       4.7
    24   +0.392         0.014      +0.275          0.017       5.0
    26   +0.392         0.014      +0.275          0.014       5.0
    28   +0.393         0.012      +0.276          0.015       5.8
    30   +0.393         0.011      +0.279          0.014       6.1
    32   +0.393         0.010      +0.282          0.013       6.3
    34   +0.393         0.010      +0.280          0.013       6.7
    36   +0.389         0.010      +0.278          0.013       6.4
So ... doubling is a 0.111 blunder. 95% confidence intervals on ND and D/T are about +-0.023; on win % about 0.0007. That's where I stop. Is this rollout not worth looking at because it's "not long enough"?
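The confidence-interval and JSD figures can be approximated from the reported standard errors. The sketch below is a rough check, not a description of how Gnubg computes these numbers. It assumes a normal approximation (z = 1.96 for 95%) and treats the two equity estimates as independent so that their variances add. The names ci_half_width and approx_jsd are hypothetical.

```python
import math


def ci_half_width(se, z=1.96):
    """Half-width of a normal-approximation confidence interval (z=1.96 ~ 95%)."""
    return z * se


def approx_jsd(equity_a, se_a, equity_b, se_b):
    """Equity gap divided by the joint standard deviation.

    Assumes the two estimates are independent, so their variances add.
    """
    joint_sd = math.sqrt(se_a ** 2 + se_b ** 2)
    if joint_sd == 0.0:
        # With no reported spread the ratio is undefined; return 0 for a tie.
        return math.inf if equity_a != equity_b else 0.0
    return abs(equity_a - equity_b) / joint_sd


if __name__ == "__main__":
    # Values from the 36-trial row of the table above.
    nd, nd_se, dt, dt_se = 0.389, 0.010, 0.278, 0.013
    print(f"ND  95% CI: +-{ci_half_width(nd_se):.3f}")
    print(f"D/T 95% CI: +-{ci_half_width(dt_se):.3f}")
    print(f"approx JSD: {approx_jsd(nd, nd_se, dt, dt_se):.1f}")
```

For the 36-trial row this gives half-widths of about 0.020 for ND and 0.025 for D/T, which bracket the "about +-0.023" quoted above. The approximate JSD comes out somewhat higher than the reported 6.4. That is not surprising, since the table's inputs are rounded to three places and the bot may compute the joint figure differently.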
Xavier: Good point about XG's speed. Stick said much the same earlier -- 324 is the new 108, so to speak. But still, sometimes 108 is more than you need. Sometimes 1296 trials is too few.
Bot rollouts report confidence intervals, standard errors, joint standard deviations. They're telling us what their opinions are worth, and suggesting when we've heard enough.
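One way to read "suggesting when we've heard enough" is as a stopping rule on the JSD. The sketch below is a hindsight check only, written under that assumption. It takes the (trials, JSD) pairs from the 30-checker table above and finds the first trial count from which the reported JSD never again drops below a chosen threshold. The function name first_stable_crossing and the thresholds are illustrative, not anything the bots themselves use.

```python
# (trials, JSD) pairs copied from the 30-checker table; None = not reported.
JSD_BY_TRIALS = [
    (1, None), (2, None), (3, 0.9), (4, 1.3), (5, 1.5), (6, 1.9), (8, 2.7),
    (10, 3.2), (12, 3.8), (14, 4.3), (16, 4.0), (18, 4.2), (20, 4.6),
    (22, 4.7), (24, 5.0), (26, 5.0), (28, 5.8), (30, 6.1), (32, 6.3),
    (34, 6.7), (36, 6.4),
]


def first_stable_crossing(rows, threshold):
    """Smallest trial count from which every later reported JSD is >= threshold.

    Returns None if the final rows do not stay at or above the threshold.
    """
    answer = None
    for trials, jsd in rows:
        if jsd is not None and jsd >= threshold:
            if answer is None:
                answer = trials  # start of a run at or above the threshold
        else:
            answer = None  # run broken (or JSD not reported); start over
    return answer


if __name__ == "__main__":
    for threshold in (3.0, 4.0, 6.0):
        print(threshold, first_stable_crossing(JSD_BY_TRIALS, threshold))
```

A live rollout cannot see the later rows, so a real stopping rule would have to act on the current JSD alone. The check above only shows how early the answer was already settled in this rollout.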
BGonline.org Forums is maintained by Stick with WebBBS 5.12.