Rollouts of Robertie's 501 problems: Pos 129 Snowie Eval


BGonline.org Forums


Posted By: Timothy Chow
Date: Thursday, 25 March 2010, at 8:42 p.m.

In Response To: Rollouts of Robertie's 501 problems: Pos 129 Snowie Eval (Chase)

If there were an easy, crisp way to quantify exactly how confident we should be about the confidence intervals, then the bots would probably already be doing that. So unfortunately there isn't a very clean answer to your question.
Common sense does suggest a few things, though. From experience, most people seem to regard 3-ply (= GNU 2-ply) as about the minimum level of lookahead they are comfortable with. Based on this, a 1296-game rollout with quasi-random dice is about the minimum length I feel comfortable with.
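To make the 1296 figure concrete: 1296 = 36 x 36, so a rollout of that length can cover every ordered pair of first and second rolls exactly once. The sketch below shows one way such stratified ("quasi-random") opening dice could be generated. It is only an illustration; the function name and the shuffling step are my own assumptions, and I'm not claiming any particular bot does it exactly this way.

```python
import itertools
import random

def stratified_opening_rolls(seed=0):
    """Return 1296 (first_roll, second_roll) pairs, each used exactly once.

    Hypothetical helper: a roll is an ordered pair of dice (d1, d2), so there
    are 36 possible rolls and 36 * 36 = 1296 ordered pairs of two rolls.
    """
    rng = random.Random(seed)
    rolls = list(itertools.product(range(1, 7), repeat=2))  # 36 ordered rolls
    pairs = list(itertools.product(rolls, repeat=2))        # 1296 pairs
    rng.shuffle(pairs)  # randomize the order in which games are played
    return pairs

opening = stratified_opening_rolls()
print(len(opening))   # number of games in one full stratified block
print(opening[0])     # first and second roll for the first game
```

Rolls after the second would still be ordinary random dice in this sketch; the stratification only removes the luck of the first two rolls from the estimate.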
The doubling cube can do funny things to the variance. The reported confidence level for a double match point (DMP) game should be accurate if no variance reduction is used, and should still be pretty good even with variance reduction. However, if there's even a relatively small chance that the cube could get high, then the bot may tend to underestimate the uncertainty of its estimates. Nack Ballard has said that in his experience, truncated rollouts of opening positions are often more robust than full rollouts. This might seem surprising to people who don't trust the bot's evaluation at the truncation point. While this is a legitimate concern, the point is that if you run the game out to the end then there's an increased chance of further cube turns, which drive the variance up and also tend to make the estimated variance come out too low.
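Here is a toy simulation of why rare high-cube games can make the reported uncertainty too small. It is not a backgammon model: the 0.1% chance of a 16-cube, the even win/loss split, and all function names are assumptions picked purely for illustration. Each trial plays 1296 "games" and compares the sample standard deviation with the true one.

```python
import random
import statistics

def simulate_game(rng, p_high_cube=0.001, high_cube=16):
    """One toy game: win or lose 1 point, or rarely `high_cube` points."""
    cube = high_cube if rng.random() < p_high_cube else 1
    sign = 1 if rng.random() < 0.5 else -1
    return sign * cube

def true_std(p_high_cube=0.001, high_cube=16):
    """Exact standard deviation of the toy game (mean is 0 by symmetry)."""
    return ((1 - p_high_cube) + p_high_cube * high_cube ** 2) ** 0.5

def fraction_undershooting(trials=400, games=1296, seed=1):
    """Fraction of simulated rollouts whose sample std is below the true std."""
    rng = random.Random(seed)
    target = true_std()
    low = 0
    for _ in range(trials):
        results = [simulate_game(rng) for _ in range(games)]
        if statistics.stdev(results) < target:
            low += 1
    return low / trials

print(fraction_undershooting())
```

In this toy setup more than half of the simulated rollouts report a standard deviation below the true value, because most rollouts contain at most one of the rare big-cube games. The sample variance is still right on average; the problem is that a typical individual rollout comes in low and an occasional one comes in far too high.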
A final point I want to make is that the confidence level numbers apply only to the bare statement that "Play A is better than Play B." In practice, though, we are often interested in how much better Play A is than Play B. Even if a rollout puts Play A ahead of Play B by 0.035 and reports "99% confidence," that doesn't mean that it's 99% confident that Play A is 0.035 better than Play B; it only means that it's 99% confident that Play A is better than Play B at all. If you want some degree of confidence that Play A is not only better than Play B, but between 0.030 and 0.040 better than Play B, then you must roll the thing out much longer.
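A rough back-of-the-envelope version of "much longer" can be worked out as follows. The sketch assumes the difference between the plays is approximately normally distributed, that the reported "99% confidence" is a one-sided figure, and that we want a 95% interval of plus or minus 0.005 around the difference; all three are my assumptions for illustration, and the function names are made up.

```python
from statistics import NormalDist

def implied_standard_error(diff, confidence):
    """Standard error implied by a one-sided confidence that diff > 0."""
    z = NormalDist().inv_cdf(confidence)
    return diff / z

def games_multiplier(diff, confidence, half_width, interval_level=0.95):
    """How many times more games are needed for a +/- half_width interval.

    Uses the fact that the standard error shrinks like 1 / sqrt(games).
    """
    se_now = implied_standard_error(diff, confidence)
    z_interval = NormalDist().inv_cdf(0.5 + interval_level / 2)
    se_needed = half_width / z_interval
    return (se_now / se_needed) ** 2

print(implied_standard_error(0.035, 0.99))   # about 0.015
print(games_multiplier(0.035, 0.99, 0.005))  # roughly 35
```

Under these assumptions the standard error of the 0.035 difference is about 0.015, so pinning the difference down to between 0.030 and 0.040 would take on the order of 35 times as many games.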



