BGonline.org Forums
Using Noisy Evaluations to Measure Performance
Posted By: Chris Yep
In Response To: Using Noisy Evaluations to Measure Performance (Bob Koca)
Date: Friday, 1 April 2016, at 3:55 a.m.
Is there also an issue with dependence? For example, suppose there are two ways to race and two ways to stay and shuffle checkers inside. A human might rate one type of play incorrectly, and then both plays of that type would be rated too high. The bot, though, might have one too high and one too low.
Yes, I agree with this. That's one weakness of an (Evaluation + Noise) method of modeling human play.
The effect is even more pronounced if there are a lot of similar moves. For example, suppose we use N = 0.04 (noise follows a normal distribution with a mean of 0 and a standard deviation of 0.04). As an extreme example, suppose there are 200 legal plays: 100 nearly identical ways of racing and 100 nearly identical ways of maintaining contact. Suppose that the bot (with no noise) thinks that racing is 0.04 better than maintaining contact. In an (Evaluation + 0.04 Noise) method, each play gets its own independent noise, so almost all of the "noisy players" will choose a racing play (which implies that this position is easy). But a human's errors on nearly identical plays are likely to be nearly identical too, so there's actually a pretty good chance that a human will think that all 100 non-racing plays are better than all 100 racing plays. So there's a pretty good chance that a human will choose a non-racing play.
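To make the 200-play example concrete, here is a rough simulation sketch in Python. It is only an illustration under assumptions of my own: the function name simulate and its parameters are made up for this post, the "bot" model gives every play its own independent noise draw, and the "human" model is simplified to one shared noise draw per play type (all racing plays move together, all contact plays move together). Real human errors would be somewhere in between.

```python
import random


def simulate(trials=5000, n_per_group=100, edge=0.04, noise_sd=0.04, seed=0):
    """Estimate how often a non-racing play is chosen under two noise models.

    Returns (independent_rate, shared_rate): the fraction of trials in which
    the best-looking play is a contact (non-racing) play.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    indep_contact = 0
    shared_contact = 0
    for _ in range(trials):
        # Independent noise: each of the 200 plays gets its own draw.
        # Racing plays start with a true evaluation that is `edge` higher.
        best_race = edge + max(rng.gauss(0.0, noise_sd) for _ in range(n_per_group))
        best_contact = max(rng.gauss(0.0, noise_sd) for _ in range(n_per_group))
        if best_contact > best_race:
            indep_contact += 1

        # Shared noise (simplified human model): one draw per play type,
        # so all plays of a type are misjudged by the same amount.
        race_eval = edge + rng.gauss(0.0, noise_sd)
        contact_eval = rng.gauss(0.0, noise_sd)
        if contact_eval > race_eval:
            shared_contact += 1
    return indep_contact / trials, shared_contact / trials


if __name__ == "__main__":
    indep, shared = simulate()
    print(f"independent noise: non-racing chosen in {indep:.1%} of trials")
    print(f"shared noise:      non-racing chosen in {shared:.1%} of trials")
```

Under the shared-noise assumption the answer can also be worked out directly: the difference of the two noise draws has a standard deviation of 0.04 * sqrt(2), so contact is preferred with probability 1 - Phi(1/sqrt(2)), which is about 24%. The independent-noise rate comes out much lower, which is the gap described above.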