BGonline.org Forums
Rollouts of Robertie's 501 problems: Pos 129 Snowie Eval
Posted By: Timothy Chow In Response To: Rollouts of Robertie's 501 problems: Pos 129 Snowie Eval (Chase)
Date: Thursday, 25 March 2010, at 8:42 p.m.
If there were an easy, crisp way to quantify exactly how confident we should be about the confidence intervals, then the bots would probably already be doing that. So unfortunately there isn't a very clean answer to your question.
Common sense does suggest a few things, though. From experience, most people seem to regard 3-ply (= GNU 2-ply) as about the minimum level of lookahead they are comfortable with. In the same spirit, a 1296-game rollout with quasi-random dice is about the minimum length I feel comfortable with.
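For anyone wondering where 1296 comes from: it is 36 x 36, the number of ordered combinations of the first two dice rolls. The sketch below assumes that "quasi-random dice" means the rollout visits each of those combinations exactly once before falling back on ordinary random dice; the function name is hypothetical and this is not any bot's actual code.

```python
from itertools import product

def first_two_roll_strata():
    """Enumerate every ordered combination of the first two dice rolls.

    Assumption: a quasi-random (stratified) rollout plays one game per
    combination, so the opening rolls contribute no sampling luck.
    """
    one_roll = list(product(range(1, 7), repeat=2))  # 36 ordered rolls
    return list(product(one_roll, repeat=2))         # 36 * 36 pairs of rolls

strata = first_two_roll_strata()
print(len(strata))  # 1296
```

Under that assumption, a shorter rollout cannot cover every combination of the first two rolls, which is one reason to treat 1296 as a natural floor.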
The doubling cube can do funny things to the variance. The reported confidence level for a DMP (double match point) game should be accurate if no variance reduction is used, and should still be pretty good even with variance reduction. However, if there's even a relatively small chance that the cube could get high, then the bot may tend to underestimate the uncertainty of its estimates. Nack Ballard has said that in his experience, truncated rollouts of opening positions are often more robust than full rollouts. This might seem surprising to people who don't trust the bot's evaluation at the truncation point. That is a legitimate concern, but the point is that if you run the game out to the end, there's an increased chance of further cube turns, which drive the variance up and also tend to cause estimates of the variance to undershoot.
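Here is a toy illustration of how rare high-cube games can make the estimated variance undershoot. Everything in it is an assumption made up for the illustration: games are worth +/-1 except for a 0.02% chance of a +/-64 result, and no real bot or position is being modelled. Most simulated 1296-game rollouts never see a big game at all, so their sample variance comes out well below the true variance.

```python
import random
from statistics import variance

def simulate_rollout(n_games, p_big, big, rng):
    """Toy results: +/-1 normally, +/-big with small probability p_big."""
    results = []
    for _ in range(n_games):
        size = big if rng.random() < p_big else 1
        sign = 1 if rng.random() < 0.5 else -1
        results.append(sign * size)
    return results

def true_variance(p_big, big):
    # The mean is 0 by symmetry, so the variance is just E[x^2].
    return (1 - p_big) * 1 + p_big * big ** 2

def undershoot_fraction(n_rollouts=400, n_games=1296,
                        p_big=0.0002, big=64, seed=1):
    """Fraction of simulated rollouts whose sample variance is too low."""
    rng = random.Random(seed)
    target = true_variance(p_big, big)
    low = 0
    for _ in range(n_rollouts):
        if variance(simulate_rollout(n_games, p_big, big, rng)) < target:
            low += 1
    return low / n_rollouts

print(undershoot_fraction())
```

With these made-up numbers, roughly three rollouts in four contain no big game, and each of those reports a variance near 1 when the true figure is about 1.82. The confidence levels computed from such a rollout would be correspondingly too optimistic.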
A final point I want to make is that the confidence level numbers apply only to the bare statement that "Play A is better than Play B." In practice, though, we are often interested in how much better Play A is than Play B. Even if a rollout puts Play A ahead of Play B by 0.035 and reports "99% confidence," that doesn't mean that it's 99% confident that Play A is 0.035 better than Play B; it only means that it's 99% confident that Play A and Play B aren't equally good. If you want some degree of confidence that Play A is not only better than Play B, but between 0.030 and 0.040 better than Play B, then you must roll the thing out much longer.
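To put a rough number on "much longer", here is a back-of-the-envelope sketch. It assumes the reported confidence comes from a normal approximation on the difference, that the 0.035 lead has only just reached the 99% level, and that the standard error shrinks like 1/sqrt(number of games). The function name is hypothetical and the calculation is not taken from any bot.

```python
from statistics import NormalDist

def extra_games_factor(diff, confidence, half_width):
    """How many times longer the rollout must be so that the interval
    around diff has the requested half-width at the same confidence."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    se_now = diff / z           # standard error implied by the current result
    se_needed = half_width / z  # standard error the narrow interval requires
    return (se_now / se_needed) ** 2  # games scale as 1 / se^2

factor = extra_games_factor(0.035, 0.99, 0.005)
print(round(factor))  # 49
```

Under these assumptions the current rollout only pins the difference down to somewhere between about 0 and 0.070, and getting it into the 0.030 to 0.040 range takes about 49 times as many games. The z-score cancels out of the ratio, so the factor is the same whether the bot's confidence figure is one-sided or two-sided.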