BGonline.org Forums

Minimum # of Trials and Confidence

Posted By: Daniel Murphy
Date: Wednesday, 20 October 2010, at 5:55 p.m.

In Response To: Minimum # of Trials and Confidence (eXtreme Gammon)

Fine comments and data, everyone. Thanks. It's nice to be on the same page.

Stick: Good point. For a one-trial rollout, Gnubg reports no JSDs and reports play SDs as 0.000. Gnubg begins reporting non-zero SDs on the second trial, and JSDs on the third trial. Since it doesn't seem unwise to be suspicious when a bot doesn't show its work, I gladly amend my minimum number of trials that is sometimes sufficient from "one trial" to "three trials."

Neil, yes, for an opening 3-1 one trial gives funny results. I hope you did not have the impression that I had said one trial was enough for a rollout of an opening 3-1? I did not say that. I asked a question: how many trials would be enough to see that anything but 8/5 6/5 is a blunder? My answer would not have been one but many fewer than 324. It turns out that 72 is plenty with seed 898907331 and most likely with any seed. Your rollout extended, opening 3-1, score 7-away all, 2-ply, variance reduction, seed 898907331 proceeds:

  • after 17 trials 31P moves into first place and stays there
  • after 28 trials 31P is ahead by >0.117 and >1 JSD
  • after 44 trials 31P is ahead by >0.147 and >2 JSD
  • after 54 trials 31P is ahead by >0.189 and >3.0 JSD
  • after 66 trials 31P is ahead by >0.216 and >4.0 JSD
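A rough way to read those JSD figures is to convert them into a one-sided confidence that the leading play really is better. The sketch below assumes the equity difference between two plays is approximately normally distributed, which is an assumption and not something the rollout itself guarantees; the function name `jsd_to_confidence` is hypothetical.

```python
import math

def jsd_to_confidence(jsd: float) -> float:
    """One-sided probability that the leading play is truly better,
    ASSUMING the equity difference is normally distributed."""
    # Standard normal CDF evaluated at `jsd`
    return 0.5 * (1.0 + math.erf(jsd / math.sqrt(2.0)))

# The JSD milestones from the list above
for jsd in (1.0, 2.0, 3.0, 4.0):
    print(f"{jsd:.0f} JSD -> {jsd_to_confidence(jsd):.4%}")
```

Under that assumption, 1 JSD is only modest evidence, while 3 JSD or more leaves very little room for the trailing play to be better.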

Tim: Re number of trials, see above. Also, good point about distinguishing between (preexisting) confidence in what the right play is and confidence that a bot's result is reliable. I could remark (but not to dispute your point) that we use our pre-rollout conceptions all the time, in deciding which positions to roll out, which plays to examine, which positions we think bots should play well, and which results we accept. There's also the point (made elsewhere by you and others, I believe) that there may be some question about how accurate various bots' estimates of SEs, JSDs and confidence intervals are.

But there's also the fact that we can easily identify classes of positions for which we find that we obtain from very short rollouts what prove to be very correct answers. More trials might be either no more precise, or more precise only to a needless Nth decimal place. That's because for such positions the "limited number of queries to the bot" causes the bot to intake and manipulate a limited number of datapoints which, although limited, are sufficient for finding precise answers.

As Neil did, you mentioned the 3-1 position. What about the one below it?

Blue on roll.

White (pip count: 9)

3X 3X  '  '  '  '  '  '  '  '  '  '

3O 3O  '  '  '  '  '  '  '  '  '  '

Blue (pip count: 9)

Position ID: dwAAgDsAAAAAAA Match ID: cAkAAAAAAAAA

Seed 898907335, 2-ply cubeful, VR = yes, Bearoff truncation = no.

With one trial, Gnubg reports D/T CF equity as 1.233 and ND CF equity as 0.794. These equities are accurate to three decimal places and will not change no matter how many more trials are done. So yes, there really are positions for which three trials are enough.

Let's look at another:

Blue on roll.

White (pip count: 4)

4X  '  '  '  '  '  '  '  '  '  '  '

 '  '  ' 2O  ' 1O  '  '  '  '  '  '

Blue (pip count: 14)

Position ID: DwAAABMAAAAAAA Match ID: UQkAAAAAAAAA

Blue has at most two rolls if he doubles, and three if he does not. The D/ND decision is not close. The position is interesting because ND cubeful equity is exactly zero. How many trials do we need?

Trials   Cubeful Equity (No Double)   Reported Standard Deviation
     1   +2.000000                    0.000000
     2   +0.500000                    1.500000
     3   +0.666667                    0.881917
     4   +0.750000                    0.629153
     5   +0.800000                    0.489898
     6   +0.500000                    0.500000
     7   +0.571429                    0.428571
     8   +0.625000                    0.375000
     9   +0.666667                    0.333333
    18   +0.555556                    0.245512
    27   +0.259259                    0.210768
    36   +0.000000                    0.182574
    72   -0.013889                    0.130446
   108   -0.018519                    0.106862
   144   -0.034722                    0.092667
   180   -0.044444                    0.082940
   216   -0.055556                    0.075302
   252   -0.047619                    0.069489
   288   -0.048611                    0.064823
   324   -0.046296                    0.061226
   648   -0.043210                    0.043099
   972   -0.019547                    0.035248
  1295   -0.002317                    0.030560
  1296   -0.001543                    0.030546
  1297   -0.003084                    0.030562
  2591   +0.000386                    0.021600
  2592   +0.000000                    0.021595
  2593   +0.000771                    0.021601
  7775   -0.001158                    0.012467
  7776   -0.001029                    0.012466
  7777   -0.000772                    0.012468
 15552   +0.004630                    0.008815
 46656   +0.000000                    0.005089
 46657   +0.000043                    0.005089

Whoa! The bot seems to be having some trouble, although it happens to be correct to six decimal places when trials equal exactly 36 and 2592. What's going on? What's going on is that those results are with Variance Reduction off. What happens when VR is used?

Trials   Cubeful Equity (No Double)   Reported Standard Deviation
     1   +0.000057                    0.000000
     2   +0.000033                    0.000024
     3   +0.000030                    0.000014
     4   +0.000028                    0.000010
     5   +0.000027                    0.000008
     6   +0.000019                    0.000010
     7   +0.000018                    0.000009
     8   +0.000016                    0.000008
     9   +0.000012                    0.000008
    18   +0.000008                    0.000006
    27   +0.000001                    0.000005
    36   +0.000000                    0.000004
    72   +0.000001                    0.000003
   108   +0.000001                    0.000002
   144   +0.000000                    0.000002
   180   +0.000000                    0.000002
   216   +0.000000                    0.000002
   252   +0.000000                    0.000001
   288   +0.000000                    0.000001
   324   +0.000000                    0.000001
   648   +0.000000                    0.000001
   972   +0.000000                    0.000001
  1295   +0.000000                    0.000001
  1296   +0.000000                    0.000001
  1297   +0.000000                    0.000001
  2591   +0.000000                    0.000000
  2592   +0.000000                    0.000000
  2593   +0.000000                    0.000000

With VR on, with just one trial we already have a result precise to four decimal places. Two trials, the same, and with reported Standard Error 0.000024. Three trials, the same, and a JSD on ND vs. D/T. 36 trials, precision to five decimal places. 144 trials, precision to six decimal places. Somewhere between 1297 and 2591 trials, the reported Standard Error rounds to 0.000000.
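The first nine rows of the VR-off table can be reproduced by hand, which shows that the reported figure behaves like the standard error of the mean of the per-trial results. In the sketch below, the per-trial outcomes are inferred from the running averages in that table (they are not printed by the rollout), and `running_stats` is a hypothetical helper name.

```python
import math

# Per-trial cubeful results for trials 1-9, INFERRED from the running
# averages in the VR-off table; the rollout does not print them.
outcomes = [2, -1, 1, 1, 1, -1, 1, 1, 1]

def running_stats(results):
    """Return (n, mean, standard error of the mean) after each trial."""
    rows = []
    for n in range(1, len(results) + 1):
        sample = results[:n]
        mean = sum(sample) / n
        if n > 1:
            # Sample variance (n - 1 denominator), then SE = sqrt(var / n)
            var = sum((x - mean) ** 2 for x in sample) / (n - 1)
            se = math.sqrt(var / n)
        else:
            se = 0.0  # a single trial has no spread to estimate
        rows.append((n, mean, se))
    return rows

for n, mean, se in running_stats(outcomes):
    print(f"{n:2d} {mean:+.6f} {se:.6f}")
```

With raw outcomes as coarse as +2, +1 and -1, the standard error can only shrink roughly as one over the square root of the number of trials, which is why the VR-off column converges so slowly.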

No one in particular: Maybe someone somewhere is saying "I didn't mean simple positions like those!" So let's put all 30 checkers on the board:

Blue on roll.

White (pip count: 79)

1X 2X 1X 2X 2X 2X 2X 1X 1X 1X  '  '

 '  '  ' 3O 5O 7O  '  '  '  '  '  '

Blue (pip count: 79)

Position ID: rW2rAAC47w8AAA Match ID: cAkAAAAAAAAA

Trials  ND CF equity  ND CF SE  D/T CF equity  D/T CF SE  JSD
     1  +0.459        0.000     +0.459         0.000      --
     2  +0.416        0.043     +0.351         0.108      --
     3  +0.373        0.049     +0.242         0.125      0.9
     4  +0.392        0.040     +0.261         0.091      1.3
     5  +0.397        0.031     +0.276         0.072      1.5
     6  +0.408        0.028     +0.282         0.059      1.9
     8  +0.413        0.021     +0.274         0.045      2.7
    10  +0.410        0.017     +0.280         0.036      3.2
    12  +0.411        0.016     +0.278         0.030      3.8
    14  +0.414        0.014     +0.285         0.026      4.3
    16  +0.399        0.018     +0.274         0.024      4.0
    18  +0.393        0.013     +0.274         0.022      4.2
    20  +0.399        0.015     +0.278         0.020      4.6
    22  +0.392        0.015     +0.273         0.019      4.7
    24  +0.392        0.014     +0.275         0.017      5.0
    26  +0.392        0.014     +0.275         0.014      5.0
    28  +0.393        0.012     +0.276         0.015      5.8
    30  +0.393        0.011     +0.279         0.014      6.1
    32  +0.393        0.010     +0.282         0.013      6.3
    34  +0.393        0.010     +0.280         0.013      6.7
    36  +0.389        0.010     +0.278         0.013      6.4

So ... doubling is a 0.111 blunder. 95% confidence intervals on ND and D/T are about ±0.023; on win % about 0.0007. That's where I stop. Is this rollout not worth looking at because it's "not long enough"?
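The arithmetic behind those figures can be checked from the 36-trial row of the table above. This is a minimal sketch that assumes normally distributed rollout error, so a two-sided 95% interval is 1.96 standard errors either side; the helper name `half_width` is hypothetical.

```python
# 36-trial figures taken from the table above
nd_equity, nd_se = 0.389, 0.010   # No Double, cubeful
dt_equity, dt_se = 0.278, 0.013   # Double/Take, cubeful

Z95 = 1.96  # two-sided 95% normal quantile (ASSUMES normal error)

def half_width(se: float, z: float = Z95) -> float:
    """Half-width of the confidence interval around a rollout equity."""
    return z * se

cost_of_doubling = nd_equity - dt_equity
print(f"cost of doubling: {cost_of_doubling:.3f}")
print(f"ND  95% CI: +/-{half_width(nd_se):.4f}")
print(f"D/T 95% CI: +/-{half_width(dt_se):.4f}")
```

The two half-widths bracket the "about ±0.023" quoted above, and either is several times smaller than the 0.111 gap between the plays.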

Xavier: Good point about XG's speed. Stick said much the same earlier -- 324 is the new 108, so to speak. But still, sometimes 108 is more than you need. Sometimes 1296 trials is too few.

Bot rollouts report confidence intervals, standard errors, joint standard deviations. They're telling us what their opinions are worth, and suggesting when we've heard enough.
