Using Noisy Evaluations to Measure Performance
Chris Yep
Friday, 1 April 2016, at 12:40 p.m.
In Response To: Using Noisy Evaluations to Measure Performance (Timothy Chow)
The two most popular bots today, XG and GNU, have similar error rate systems. To be fair, though, it's true that Snowie's error rate formula is significantly different.
For the error rate to mean anything, it has to be averaged over a very large sample. And if the sample is large enough, then I strongly suspect that every reasonable measure converges to the same limit anyway, up to a constant scale factor.
If that's true, I agree that there is limited value in having a new PR method, though convergence speed is still important for various purposes, including the Backgammon Masters Awarding Body (BMAB), etc. Also, people often draw conclusions from very small samples (even a single match), e.g. "I haven't looked at the match yet, but so-and-so played at a 2.0 PR. He must have played well!"
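To make the sample-size point concrete, here is a rough simulation sketch. The error model is purely an assumption for illustration (80% of decisions error-free, the rest losing an exponentially distributed amount); it is not taken from any bot's data, and the function names are made up for this example. It compares how much an average error per decision varies over a match-sized sample versus a very large one.

```python
import random
from statistics import mean, stdev

def simulated_errors(n, rng):
    # Hypothetical model (an assumption, not real data): 20% of decisions
    # contain an error whose size is exponentially distributed; the rest are perfect.
    return [rng.expovariate(20.0) if rng.random() < 0.2 else 0.0 for _ in range(n)]

def spread_of_average(n_decisions, n_trials, rng):
    # Standard deviation of the average error per decision across repeated samples.
    averages = [mean(simulated_errors(n_decisions, rng)) for _ in range(n_trials)]
    return stdev(averages)

rng = random.Random(0)  # fixed seed so the run is repeatable
small = spread_of_average(50, 200, rng)     # roughly one short match
large = spread_of_average(5000, 200, rng)   # a very large sample
print(f"spread over 50 decisions:   {small:.5f}")
print(f"spread over 5000 decisions: {large:.5f}")
```

Under this toy model the spread shrinks roughly with the square root of the number of decisions, so a single-match figure is much noisier than a long-run one. A method that converges faster would narrow that gap.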
In any case, while I choose C so that the average PR of a large pool of representative players is unchanged (compared to XG's current PR method), I don't think my proposed PR method will converge to the same limit as XG's current PR method for any given player. For example, under XG's current PR method, players who seek out complex positions (e.g. positions that are equally difficult for both players to play) are "punished" with higher PRs. I suspect that my proposed PR method will partially or wholly account for this. What do you think?
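As a sketch of what choosing C this way could look like, assuming C is a plain multiplicative scale factor: the numbers below are invented, and `scale_constant`, `current_prs` and `raw_scores` are hypothetical names, not part of XG or of the proposed method itself.

```python
from statistics import mean

def scale_constant(current_prs, raw_scores):
    # Return C such that mean(C * raw_scores) equals mean(current_prs).
    # Assumes C is a multiplicative constant (an assumption for this sketch).
    raw_mean = mean(raw_scores)
    if raw_mean == 0:
        raise ValueError("raw scores must not average to zero")
    return mean(current_prs) / raw_mean

# Invented pool of four players: PR under the current method, and an
# unscaled score under some alternative method.
current_prs = [3.0, 5.0, 7.0, 9.0]
raw_scores = [1.0, 3.0, 3.5, 4.5]

C = scale_constant(current_prs, raw_scores)
rescaled = [C * r for r in raw_scores]

print("C =", C)
print("pool average, current method: ", mean(current_prs))
print("pool average, rescaled method:", mean(rescaled))
print("per-player change:", [new - old for new, old in zip(rescaled, current_prs)])
```

The pool average is the same under both methods by construction, but individual players can still move up or down, which is the kind of per-player difference described above.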

