BGonline.org Forums

The right way to measure which player played better in a match

Posted By: Timothy Chow
Date: Thursday, 2 June 2011, at 1:52 a.m.

In Response To: The right way to measure which player played better in a match (Timothy Chow)

This thread has diverged in many different directions and I don't think I can respond to every issue that has been raised, nor do I think that anyone wants me to. But let me try to make a few clarifying comments.

First, I will certainly agree that EMG is useful for many purposes, especially for training one to recognize one's worst mistakes. Even for this purpose, EMG has its limitations, as Jeremy Bagai has eloquently explained, but it's certainly a valuable tool. In a recent article on Gammon Village, Doug Zare has explained how to use EMG and MWC in conjunction to figure out which match scores you need to work on studying most.

What I'm concerned with, however, is stated in the subject line, namely the right way to measure which player played better in a particular match. To me it seems clear that we want a measure that converges to the true advantage that one player has over another in the long run. Of course, this criterion by itself doesn't narrow down the candidates too much. "Who won" has this property, but we all know that since there is so much luck in backgammon, this is a very noisy measure of who played better, and I don't like it much when there are more attractive candidates available.
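
To put a rough number on how noisy "who won" is, here is a minimal sketch in Python. The 55% figure is a hypothetical true match-winning chance for the stronger player, not something taken from real data, and the function names are made up for illustration. It treats matches as independent trials and uses the standard binomial standard error.

```python
import math

def win_rate_standard_error(p, n):
    """Standard error of the observed win rate after n independent matches."""
    return math.sqrt(p * (1.0 - p) / n)

def matches_for_two_sigma(p):
    """Number of matches at which the true edge (p - 0.5) equals two standard errors."""
    edge = p - 0.5
    return 4.0 * p * (1.0 - p) / (edge * edge)

p_true = 0.55  # hypothetical true winning chance of the stronger player

for n in (1, 25, 100, 400):
    print(n, round(win_rate_standard_error(p_true, n), 3))

print("matches for a two-sigma edge:", round(matches_for_two_sigma(p_true)))
```

Under these assumptions it takes a few hundred matches before the win/loss record alone separates the two players with any confidence, which is why a single result says so little about who played better.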

Maik mentioned the luck-adjusted result, and I guess I should have at least mentioned this in my original post. It's unbiased in the long run, and has the advantage of not requiring the bot to be perfect. Still, it's pretty noisy.

Difference in MWC errors is my candidate. The accuracy of this measure does depend on the bot being perfect, so this is a disadvantage. (An easy way to see this is to imagine the bot playing God; the bot, analyzing the match, will conclude that it is playing perfectly and that God is making occasional errors.) But the benefit is that we get a lot less noise than the luck-adjusted result.

The last candidate, which seems to be the one in common use, is to compare PRs measured in EMG. I don't like this as much as comparing MWC errors for the purpose of measuring which player played better in a particular match. Why not? Well, first of all, note that comparing EMG errors also assumes that the bot is perfect, so that point is a wash. The decisive factor in my mind is that there's no mathematical reason that the difference in EMG errors should converge to the win/loss record in the long run, even assuming that the bot's analysis is perfect. This is illustrated by the somewhat artificial example, already raised by others, of A vs. B where A always blunders away 1% MWC early on and B always blunders away 1% MWC at the last minute. In the long run A and B should have a balanced win/loss record, but the EMG measure will consistently ding A for being the worse player.
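
The A vs. B example can be sketched numerically. The Python below is only an illustration: it assumes the usual linear normalization in which winning the current game at the current cube value counts as +1 and losing it counts as -1, so an MWC loss is rescaled by half the MWC swing of the game. The function name and the score-dependent MWC figures are invented for the example and are not taken from any match equity table.

```python
def mwc_loss_to_emg_error(mwc_loss, mwc_if_win, mwc_if_lose):
    """Rescale an MWC loss to EMG, assuming the linear win = +1, lose = -1 normalization."""
    return 2.0 * mwc_loss / (mwc_if_win - mwc_if_lose)

# Hypothetical early-match position: one game moves MWC only from 45% to 55%.
a_emg = mwc_loss_to_emg_error(0.01, mwc_if_win=0.55, mwc_if_lose=0.45)

# Last game of the match (double match point): the game moves MWC from 0% to 100%.
b_emg = mwc_loss_to_emg_error(0.01, mwc_if_win=1.0, mwc_if_lose=0.0)

print("A gives up 1% MWC early, EMG error:", round(a_emg, 3))
print("B gives up 1% MWC late,  EMG error:", round(b_emg, 3))
```

With these made-up numbers both players give up exactly the same 1% MWC, yet A's error is about ten times larger than B's when expressed in EMG.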

Now, of course, in real life, someone who makes blunders which register 1% EMG early in the match will likely also blunder later on, so you could argue that B is the brilliant player and A is the donkey. However, now it seems to me that we're introducing subtle assumptions about what human beings are typically like, rather than focusing strictly on the evidence provided by the match itself. I would say that in that match where both A and B blundered away 1% MWC but at different times, neither player outplayed the other. Note that if A consistently makes a whopper early and B consistently makes a smaller error later on, then the MWC metric will pick up that pattern; likewise, if A turns out to be a total donkey in general, then the MWC metric will pick that up too. The EMG metric, by contrast, won't pick up on patterns such as "choking under the pressure of its being the last game," which, if they exist, can be very important indicators of competitive ability. MWC always works, while EMG only works under extra assumptions about how human beings typically play.

To summarize, if you show me just one match and you ask me who outplayed whom in this particular match and by how much, I would use MWC as the metric. If on the other hand you ask me which player is more likely to be the stronger player in general, then I would look at PRs measured in EMG.
