
BGonline.org Forums

Theory

Posted By: Maik Stiebler
Date: Sunday, 14 April 2013, at 1:07 p.m.

In Response To: Dear Mr. Depreli (Christian Sorensen)

Your criticism makes assumptions about the relation between a bot's evaluation and the rollouts that are derived from it. Here is a story that might surprise you.

Suppose two bots, Bot A and Bot B, play a long series of games. Then we try to compare their(*0) relative skill by two procedures not entirely unlike Depreli's:

First, we look at the set of B's decisions from the series(*1): whenever(*2) one of those disagrees with what A would have played in B's shoes, Bot A gets to roll out the decision. Bot A adds the difference in unnormalized equity(*3) to its skill score if its choice comes out ahead of B's in the rollout, and subtracts it if Bot B's choice comes out ahead of A's.

Second, we do the same thing mirrored, i.e. Bot B gets to roll out all of A's original decisions where there is disagreement.

Now if we compare the results of both methods, we might intuitively expect the first method to be systematically better for Bot A than the second, because Bot A is the one doing the rollouts there. However, mathematical reasoning(*4) tells us that this intuition is wrong: both methods converge(*5) to the same mean skill score per game in the long run, and the limit value is equal to the mean actual score between Bot A and Bot B.
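To make the claim concrete, here is a minimal sketch on a toy game. Everything in it is an assumption made for illustration: the game (a pile of stones, the player on roll throws a three-sided die d and takes either 1 or d stones, whoever takes the last stone wins one point), the two policies `policy_a` and `policy_b`, and the pile size `START`. It is not backgammon and not Depreli's setup. "Rollouts" are computed exactly by recursion instead of being sampled, and the side that moves first alternates between games.

```python
from fractions import Fraction
from functools import lru_cache

DICE = (1, 2, 3)   # fair three-sided die (assumption)
START = 12         # stones at the start of each game (assumption)

def policy_a(n, d):
    """Hypothetical Bot A: always take as many stones as the roll allows."""
    return min(d, n)

def policy_b(n, d):
    """Hypothetical Bot B: take one stone unless the roll can end the game."""
    return n if d >= n else 1

POLICY = {"A": policy_a, "B": policy_b}
OTHER = {"A": "B", "B": "A"}

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

@lru_cache(maxsize=None)
def rollout(bot, n):
    """Exact equity of the player on roll with n stones left,
    when both sides play like `bot`."""
    if n == 0:
        return Fraction(-1)  # the opponent just took the last stone
    return mean(-rollout(bot, n - POLICY[bot](n, d)) for d in DICE)

@lru_cache(maxsize=None)
def actual(n, mover):
    """Expected result for Bot A in real A-versus-B play, `mover` on roll."""
    if n == 0:
        return Fraction(-1) if mover == "A" else Fraction(1)
    return mean(actual(n - POLICY[mover](n, d), OTHER[mover]) for d in DICE)

@lru_cache(maxsize=None)
def skill(judge, n, mover):
    """Expected skill score credited to Bot A from here to the end of the
    game, when `judge` rolls out every decision of the other bot."""
    if n == 0:
        return Fraction(0)
    terms = []
    for d in DICE:
        played = POLICY[mover](n, d)
        term = skill(judge, n - played, OTHER[mover])
        if mover != judge:
            ka, kb = policy_a(n, d), policy_b(n, d)
            # Mover's equity after leaving m stones is -rollout(judge, m).
            # Credit A with (A's choice) minus (B's choice); zero if they agree.
            term += -rollout(judge, n - ka) + rollout(judge, n - kb)
        terms.append(term)
    return mean(terms)

def per_game(f):
    """Average over who moves first, as in a long alternating series."""
    return (f("A") + f("B")) / 2

actual_mean = per_game(lambda first: actual(START, first))
method_1 = per_game(lambda first: skill("A", START, first))  # A rolls out B's decisions
method_2 = per_game(lambda first: skill("B", START, first))  # B rolls out A's decisions
print(actual_mean, method_1, method_2)
```

The three printed values coincide even though the two bots generally disagree about how good individual positions are. Swapping in other policies or another `START` should not change that, as long as the rollouts stay exact (or at least unbiased).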

As noted below, there are some differences between my theoretical experimental setup and Depreli's actual setup. Thus I cannot conclude that Depreli's test is methodologically sound. However, it is not intuitively clear to me what the consequences of those differences are. The point of this posting is to invite you to be skeptical about your intuition.

*0: First difference between my procedure and Depreli's: I only try to compare the skills of the two bots that played the original series of games from which the test positions are extracted.

*1: There is no distinction between A's and B's original decisions in Depreli's setup.

*2: I mean each and every disagreement. I think Depreli might have some criteria for weeding out "uninteresting" disagreements.

*3: Depreli uses normalized equity for scoring.

*4: I've never bothered to write down a formal proof. If you are really interested, I refer to Douglas Zare's introduction to unbiased measures of skill, my postings in the sub-thread starting here, especially my final reply to Timothy, and Morgan Hugh Kan's thesis (from the University of Alberta Poker AI research group).

*5: Convergence is a problematic term in uncapped backgammon money play. Let's say the series of games is money play capped at some finite cube value.
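
For readers who want the idea behind *4 without following the references, here is a rough sketch of the argument, not a formal proof. The notation is introduced here for illustration only: $U_A(s)$ is the equity of position $s$ from Bot A's point of view when Bot A plays both sides, i.e. what an exact (or unbiased) rollout by A returns; $R$ is the actual result of the game for A; $s_0$ is the starting position; $s_t^{B}$ is the position after B's actual move at B's $t$-th decision, and $s_t^{A}$ the position after the move A would have made there.

```latex
% U_A does not change when A moves (A plays what its own rollout assumes),
% and changes by a zero-mean "luck" term L_k at every dice roll.
% It can only jump at B's decisions, so the sum telescopes:
\[
  R - U_A(s_0)
  \;=\; \sum_{k} L_k
  \;+\; \sum_{t} \bigl( U_A(s_t^{B}) - U_A(s_t^{A}) \bigr),
  \qquad \mathbb{E}[L_k] = 0 .
\]
% Taking expectations, and assuming the first move alternates so that
% E[U_A(s_0)] = 0:
\[
  \mathbb{E}[R]
  \;=\; \mathbb{E}\Bigl[\, \sum_{t} \bigl( U_A(s_t^{B}) - U_A(s_t^{A}) \bigr) \Bigr].
\]
```

The right-hand side is the expected skill score per game of the first method. Repeating the argument with $U_B$ in place of $U_A$, where the jumps now occur at A's decisions, gives the second method, so both have the same expectation as the actual result.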


BGonline.org Forums is maintained by Stick with WebBBS 5.12.