BGonline.org Forums

Nine 5k results (report) - that why people need to check SI

Posted By: Nack Ballard
Date: Monday, 17 January 2011, at 10:59 p.m.

In Response To: Nine 5k results (report) - that why people need to check SI (eXtreme Gammon)

Thank you for your two posts, Xavier.

I never see the 99% or 100% confidence-type estimates (unless I'm missing something in the typical output), so I just take people's word for it, but mainly I'm trying to instill a sense of respect for variance (which obviously you have, but many others as yet do not). It seems to me that too often people roll out reasonable early game candidates 5k or 10k and then prematurely dump the ones that didn't get lucky enough to make their too-tight cut.

There's also the matter of having one's intuition thrown off by looking at 1k results where the margins might be wildly different from reality; determining the best play isn't everything. I don't want to learn that a margin is .07 when it's really .01. (If one must skimp, then truncated results are much more reliable on balance and take a fraction of the time to run.)

I agree with your numbers except I'd like to review this statement:

Let's remember that the CI given are at 95%, this means 1 time out of 20 it will be outside the interval (which was not the case for the 20K but is for the 25K)

The rollout segments were [D S9] <25 and [S D7] <20. The reported CIs of the plays in the first were .007, giving a CI of .007 * sqrt(2) = .010 for the margin, and the CI for the second margin would presumably be .007 * sqrt(5/4) * sqrt(2) = .011. As the combined result was an average of [D S2] <46 (which, in the absence of further data, we're assuming is fair value), the first segment was "off" by .007 and the second was off by .009. That is, both margins (the 25k as well as the 20k) are within the intervals. I did this a bit hastily, so please feel free to check my math.
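For anyone who wants to check that arithmetic, here is a minimal Python sketch. It assumes the two plays' errors are independent (hence the sqrt(2)), that the CI shrinks with the square root of the trial count, and that [D S9] reads as "D ahead by .009". The function name margin_ci and the signed-margin convention (D minus S) are made up for illustration and are not output from any bot.

```python
import math

def margin_ci(play_ci, trials, ref_trials):
    """CI half-width for the margin between two plays.

    play_ci    -- reported per-play CI at ref_trials trials
    trials     -- number of trials in the segment of interest
    ref_trials -- number of trials at which play_ci was reported
    Assumes independent errors for the two plays (factor sqrt(2)) and
    a CI that scales as 1/sqrt(trials).
    """
    scaled = play_ci * math.sqrt(ref_trials / trials)
    return scaled * math.sqrt(2)

PLAY_CI_25K = 0.007  # per-play CI reported for the 25k segment

ci_25k = margin_ci(PLAY_CI_25K, 25_000, 25_000)
ci_20k = margin_ci(PLAY_CI_25K, 20_000, 25_000)

# Signed margins (D minus S), taken from the bracket summaries.
combined = 0.002   # [D S2], assumed to be fair value
seg_25k = 0.009    # [D S9]
seg_20k = -0.007   # [S D7]

off_25k = abs(seg_25k - combined)
off_20k = abs(seg_20k - combined)

print(f"25k: margin CI {ci_25k:.3f}, off by {off_25k:.3f}, inside: {off_25k < ci_25k}")
print(f"20k: margin CI {ci_20k:.3f}, off by {off_20k:.3f}, inside: {off_20k < ci_20k}")
```

Under those assumptions both segments land inside their own margin intervals, which matches the hand calculation.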

OTOH, we can also say that [D S9] <25 is an x% event and the [S D7] <20 is a dependent (not independent) y% event. This isn't just a matter of "Oh, here's a 9%-er, and over here I see an 11%-er; no big deal." (I'm just making up numbers for x and y now to save brain cells.) There's a parlay involved. The odds of both rollouts of that position occurring (the only two anyone has done) and of one of them being off in the opposite direction are on the order of (9%*11%)/2 = 0.5% = 1/200 (or whatever).
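The parlay arithmetic is easy to replay with other guesses for x and y. The sketch below simply multiplies the two placeholder probabilities and halves the product for the "opposite direction" condition, as in the paragraph above. The 9% and 11% are the made-up figures from the post, not measured values, and the function name parlay_opposite is hypothetical.

```python
def parlay_opposite(x, y):
    """Rough chance that two segments are each 'off' by their given
    tail probabilities AND lean in opposite directions.

    x, y -- placeholder tail probabilities (fractions, not percent)
    The division by 2 assumes opposite and same directions are
    equally likely; this is back-of-envelope, not an exact model.
    """
    return x * y / 2

# Placeholder numbers from the post: a "9%-er" and an "11%-er".
p = parlay_opposite(0.09, 0.11)
print(f"parlay chance: {p:.4%}, roughly 1 in {round(1 / p)}")
```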

[It could just as likely happen that both segments are off in the same direction and you could then flip a coin whether we end up with a result of either [D S10] <46 or [S D6] <46. Indeed, it is conceivable though extremely unlikely that either the 20k or 25k produced the proper result, and the other was incredibly far off.]

Granted, the example stuck out or we wouldn't have noticed and discussed it. However, we haven't yet seen a very large number of XG rollouts posted and then extended. Moreover, when people see movement, the full impact of what has happened often doesn't fully register, mentally. That is the reason that Stick kept track (of the other rollout of the other position) and posted the progressive results and is the reason that I chimed in and expounded on his point. (As you demonstrated, the spread of margins for those nine segments is nothing special, and that's great: it reinforces the point that one 5k result margin differing from another 5k result by .033 for that same position is not particularly anomalous. Maybe upon reading and digesting our posts some people who pay more attention to the depth of the ply than to the number of trials will gain some perspective.)

Also, in my many Snowie and GnuBG observations, extension movements like these seem to occur with significantly greater frequency than they should, as if somewhere along the way the CIs were misprogrammed by a factor of sqrt(2) (or more). I haven't paid nearly as much attention to XG because I don't have as much data, I haven't assigned many extensions or accidental duplicates (allowing me to compare what happens), I can't open files, I often receive/see only the nacbrac summaries, etc. However, if the confidence intervals for XG roughly agree with those of GnuBG and Snowie, then it still raises the question...
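To put a rough number on what "misprogrammed by a factor of sqrt(2)" would look like in practice, here is a small sketch under a normal-distribution assumption. If the true standard error is some factor times the one behind the reported interval, a nominal 95% interval misses more often than 1 time in 20. The function name miss_rate and the 1.96 multiplier are assumptions of the sketch, and nothing here says any bot actually has this error.

```python
import math

Z_95 = 1.96  # two-sided 95% multiplier for a normal distribution

def miss_rate(understatement):
    """Chance the true value falls outside a nominal 95% CI when the
    true standard error is `understatement` times the reported one.
    Uses the normal tail: P(|Z| > z) = erfc(z / sqrt(2)).
    """
    z = Z_95 / understatement
    return math.erfc(z / math.sqrt(2))

for factor in (1.0, math.sqrt(2), 2.0):
    print(f"CI too tight by {factor:.3f}x -> outside {miss_rate(factor):.1%} of the time")
```

With no understatement the miss rate is the advertised 5%; with a sqrt(2) understatement it comes out at roughly one time in six, which would be noticeable over a modest number of extended rollouts.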

I've got way too many projects going to compile positions, percentages, etc. (or even to involve myself in these discussions more than occasionally) to support my assertion here. So, it would seem that you're largely stuck with my anecdotal evidence (including a pair of independent same-position non-corrupted GnuBG rollouts that I posted here a year(?) ago that was somewhat jaw-dropping) and your own resourcefulness for now.

Best wishes,

Nack


BGonline.org Forums is maintained by Stick with WebBBS 5.12.