BGonline.org Forums
Statistical significance
Posted By: Timothy Chow In Response To: BOT COMPARISON 350 / 500 Games (eXtreme Gammon)
Date: Monday, 15 March 2010, at 9:38 p.m.
"So I think the breakdown is significant and interesting: it shows all levels of XG are too optimistic when it comes to taking."
Although this is a plausible conclusion, I would be wary of attaching a numerical statistical significance to this statement, because your assumption that "every error is equal" is doubtful.
As I understand it, the positions were chosen by running the games through the bots and flagging the positions on which the bots did not all agree. This procedure could very well lead to a biased sample set. For example, suppose that all the bots are too optimistic about doubling. Then the selection procedure would miss every position in which all the bots wrongly doubled, because the bots agree on those positions. The flagged positions would therefore underrepresent exactly the kind of error the bots share.
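To make the selection effect concrete, here is a toy simulation. Everything in it is an assumption made up for illustration: the `simulate` function, the "margin" model of a position, the three bots, and the bias and noise values are not taken from the actual bot comparison. Each hypothetical bot sees the true margin plus a shared optimistic bias plus its own noise, and only positions where the bots disagree get flagged.

```python
import random


def simulate(n_positions=10_000, n_bots=3, shared_bias=0.2, noise=0.1, seed=0):
    """Toy model of 'flag only the positions where the bots disagree'.

    All parameters are hypothetical. A position has a true margin in
    [-1, 1]; doubling is correct when the margin is positive. Every bot
    adds the same optimistic bias plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    total_wrong_doubles = 0      # wrong doubles by bot 0 over all positions
    flagged_wrong_doubles = 0    # ... that land in the flagged sample
    unanimous_wrong_doubles = 0  # ... that every bot shares (never flagged)

    for _ in range(n_positions):
        margin = rng.uniform(-1.0, 1.0)
        should_double = margin > 0
        decisions = [
            margin + shared_bias + rng.gauss(0.0, noise) > 0
            for _ in range(n_bots)
        ]
        flagged = len(set(decisions)) > 1  # bots disagree on this position

        if decisions[0] and not should_double:
            total_wrong_doubles += 1
            if flagged:
                flagged_wrong_doubles += 1
            else:
                # Not flagged and bot 0 doubled, so every bot doubled.
                unanimous_wrong_doubles += 1

    return {
        "total": total_wrong_doubles,
        "flagged": flagged_wrong_doubles,
        "unanimous": unanimous_wrong_doubles,
    }


result = simulate()
print(result)
```

In this sketch the "unanimous" count is the number of wrong doubles that the flagging procedure never sees. How large that count is depends entirely on the made-up bias and noise values, so the numbers themselves mean nothing; the point is only that a shared bias can hide from a disagreement-based sample.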
The magnitude of the errors also plays a role, and it's not clear how to weight this. Suppose that out of 100 erroneous taking decisions, 70 are wrong passes and 30 are wrong takes, but the wrong takes are larger in magnitude on average than the wrong passes. Is the bot too pessimistic or too optimistic about taking in this scenario? It might be "statistically significant" to assert that wrong passes and wrong takes are not equally frequent, but this still wouldn't tell us whether the bot is too optimistic or too pessimistic.
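A small calculation shows how the two views can point in opposite directions. The 70/30 split is the one from the scenario above; the average error sizes (0.02 and 0.06 per decision) are invented purely for illustration, and the helper `two_sided_binomial_p` is a hypothetical name for a plain exact binomial test against a 50/50 split.

```python
from math import comb


def two_sided_binomial_p(successes, trials):
    """Exact two-sided p-value for H0: both error types are equally likely.

    Under p = 0.5 the distribution is symmetric, so the two-sided value
    is twice the upper tail, capped at 1.
    """
    k = max(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


wrong_passes = 70
wrong_takes = 30

# Hypothetical average equity lost per error (not from any real data).
avg_pass_cost = 0.02
avg_take_cost = 0.06

p_value = two_sided_binomial_p(wrong_passes, wrong_passes + wrong_takes)
total_pass_cost = wrong_passes * avg_pass_cost
total_take_cost = wrong_takes * avg_take_cost

print(f"p-value for unequal counts: {p_value:.2g}")
print(f"total cost of wrong passes: {total_pass_cost:.2f}")
print(f"total cost of wrong takes:  {total_take_cost:.2f}")
```

With these made-up magnitudes, the counts say the bot passes too often while the total equity lost says the wrong takes hurt more. The significance test only addresses the counts, and it treats every error as equal.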