BGonline.org Forums

Error rate competitions

Posted By: Timothy Chow
Date: Friday, 20 November 2009, at 6:42 p.m.

I'm in the camp that likes error rates as a tool for self-study but thinks they're a little silly for external comparisons, especially if money is involved. However, for those who are irresistibly attracted to error-rate competitions, here's a suggestion for how to have such a competition that addresses some of the problems with the naive method. (I posted a somewhat different suggestion here some time ago, but I like my current idea better.)

Immediately after a game, each player separately scores every play of the game (by both players). That is, the player pretends he or she is a bot, and writes down the equity lost on each play: zero for most plays, if the two players are world-class, but maybe 0.010 here and 0.030 there, etc. Then the players' scores are compared to those of an actual bot. The total discrepancy between a player's scores and the bot's scores is defined to be the player's error rate.
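
To make the scoring concrete, here is a minimal sketch in Python. It assumes that "total discrepancy" means the sum of absolute differences between the player's predicted equity losses and the bot's, taken play by play; the post doesn't pin down the exact measure, so treat that as one reasonable reading. The function name and the sample numbers are made up for illustration.

```python
from typing import Sequence


def discrepancy_error_rate(player_scores: Sequence[float],
                           bot_scores: Sequence[float]) -> float:
    """Sum of absolute differences between predicted and bot equity losses.

    Assumption: "total discrepancy" is read as a sum of absolute
    per-play differences. Both sequences cover every play of the game
    (by both players), in the same order.
    """
    if len(player_scores) != len(bot_scores):
        raise ValueError("both score lists must cover the same plays")
    return sum(abs(p - b) for p, b in zip(player_scores, bot_scores))


# Hypothetical four-play excerpt: equity lost on each play.
bot = [0.000, 0.010, 0.000, 0.030]      # what the bot actually reports
player = [0.000, 0.000, 0.000, 0.030]   # what the player wrote down

result = discrepancy_error_rate(player, bot)
print(f"error rate: {result:.3f}")
```

In this excerpt the player missed the 0.010 error on the second play and called everything else correctly, so the only discrepancy comes from that one play.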

One advantage of this scheme is that it frees a player to make deliberately unsound plays. Stick can safely open a game with 52$ because he knows exactly how much the bot will ding his play, so it won't cost him anything in the error-rate competition. Another advantage is that no assumption is made that the bot is right. All that the competition is testing is how well you can simulate a bot. And that is really what a conventional "error-rate competition" is testing anyway; we can't test anybody's true error rate because we don't know what the truth is. As I learned from a recent thread, the bots often have no clue about what is going on in a deep backgame. Under my proposed scheme, a player is not punished for being on the defending side and playing the game better than the bot would. Both players have to score the plays and compare them to the bot's predictions. A final advantage is that if I had a momentary brain freeze and made a bone-headed play, and woke up to reality the instant I punched the clock, I can partially redeem myself by dinging my own bone-headedness after the game.
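
The point about deliberately unsound plays can be checked with a little arithmetic. The sketch below compares the conventional score (the sum of the bot's dings on a player's own plays) with the prediction-based score, again assuming the discrepancy is a sum of absolute differences. The 0.040 ding for the unsound opening is a made-up number, not a real bot evaluation, and only one player's own plays are shown to keep it short.

```python
def naive_error(bot_scores):
    """Conventional scoring: total equity the bot says the player lost."""
    return sum(bot_scores)


def prediction_error(predicted, bot_scores):
    """Proposed scoring: how far the player's predictions are from the bot's.

    Assumption: discrepancy is the sum of absolute per-play differences.
    """
    return sum(abs(p - b) for p, b in zip(predicted, bot_scores))


# Hypothetical: an unsound opening the bot dings by 0.040, then two clean plays.
bot_scores = [0.040, 0.000, 0.000]
# The player knows exactly how the bot will score the opening.
predicted = [0.040, 0.000, 0.000]

naive = naive_error(bot_scores)
proposed = prediction_error(predicted, bot_scores)
print(f"conventional: {naive:.3f}, proposed: {proposed:.3f}")
```

Under the conventional method the opening costs the full ding; under the proposed method it costs nothing, because the prediction matches the bot exactly.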

There is a question as to which bot to use. My proposal would be that the player is free to choose whatever bot he or she likes. That is, if I'm most familiar with Snowie, then I will score the game the way I think Snowie would, and my answers will be compared with Snowie's. If my opponent prefers GNU, that's fine; my opponent's scores will be compared with GNU's. A purist might complain that allowing each player to pick a different bot does not produce a completely objective comparison between the two players. But unless (for example) Snowie is intrinsically easier to predict than GNU is (and I doubt that this is the case), I don't think this will be a serious problem.

