BGonline.org Forums
Some stats
Posted By: Achim In Response To: Some stats (Frank Berger)
Date: Thursday, 5 July 2007, at 9:22 p.m.
Don't we have to state a hypothesis first, e.g. "X beats Y 51% of the time", and then check whether that hypothesis can be falsified?
I'm getting a bit cautious now, since my first numbers were totally wrong (see also this posting):
If you compare all results of two bots against the three other bots, you get two normal distributions and two 95% confidence intervals. In plain words, one bot is confidently better than the other (in the 4-bot shootout) only when the difference in their results exceeds 2 (joint?!) standard deviations.
If you take the joint standard deviation Sj = sqrt[Sg*Sg + Sb*Sb] = sqrt(2)*27.3 = 38.6 (because the standard errors are pretty much the same, and treating the two totals as independent) and the overall results of gnubg (1623) and bgblitz (1519), you get 104/38.6 = 2.69. This leads to a ~99.6% (one-sided) confidence that gnubg is better than bgblitz in the 4-bot shootout.
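To make this arithmetic easy to re-check, here is a minimal Python sketch. It assumes each bot's total is approximately normal with a standard error of 27.3, and it reports two versions of the joint standard deviation: the root-sum-of-squares sqrt[Sg*Sg + Sb*Sb], which holds if the two totals are independent, and the plain sum Sg + Sb, which is the largest value the standard deviation of a difference can take. The function and variable names are made up for illustration, and the confidence is one-sided.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sided_confidence(diff, joint_sd):
    """Return (z, confidence) for a lead of `diff` and a joint standard deviation."""
    z = diff / joint_sd
    return z, normal_cdf(z)

# Figures from the post: overall results and (assumed equal) standard errors.
gnubg_total, bgblitz_total = 1623, 1519
s_g = s_b = 27.3
diff = gnubg_total - bgblitz_total

# Root-sum-of-squares: valid if the two totals are independent.
sj_rss = math.hypot(s_g, s_b)
# Plain sum: an upper bound on the standard deviation of a difference.
sj_sum = s_g + s_b

z_rss, conf_rss = one_sided_confidence(diff, sj_rss)
z_sum, conf_sum = one_sided_confidence(diff, sj_sum)

print(f"root-sum-of-squares: Sj = {sj_rss:.1f}, z = {z_rss:.2f}, confidence = {conf_rss:.1%}")
print(f"plain sum:           Sj = {sj_sum:.1f}, z = {z_sum:.2f}, confidence = {conf_sum:.1%}")
```

The normal CDF is built from math.erf so the sketch needs nothing beyond the standard library.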
So this conclusion holds only when comparing the overall results of gnubg and bgb, each against the other three bots. It says nothing about whether gnubg or bgb is the better bot in direct competition. BGB somehow suffers from its "bad" result against jellyfish. If you take out the jellyfish results, you get no confidence at all, as I wrote in the above-mentioned posting.
You also get no 95% confidence if you compare only a single set of 1000 matches between, e.g., gnubg and bgblitz (519-481, std.err=15.79). Here the 95% confidence interval for gnubg is 519 +/- 2*15.79, i.e. [487;551]. Because the lower bound is below 500, there is no confidence that gnubg is better than bgb (these numbers are taken from a posting on the gnubg list written by Joseph Heled). And vice versa if you take bgb's result.
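The head-to-head check can be sketched the same way. This is only an illustration: the helper names are hypothetical, the standard error of 15.79 is taken as given from the quoted figures rather than recomputed, and "2 standard errors" is used as the usual rough stand-in for a 95% interval.

```python
def interval(wins, std_err, k=2.0):
    """Return (low, high) = wins minus/plus k standard errors."""
    return wins - k * std_err, wins + k * std_err

def excludes_even(wins, std_err, even=500, k=2.0):
    """True if the even score (500 of 1000) lies outside the interval."""
    low, high = interval(wins, std_err, k)
    return not (low <= even <= high)

# 1000 matches, gnubg 519 vs. bgblitz 481, std.err = 15.79 (from the post).
gnubg_low, gnubg_high = interval(519, 15.79)
bgb_low, bgb_high = interval(481, 15.79)

print(f"gnubg: [{gnubg_low:.2f}; {gnubg_high:.2f}]  excludes 500: {excludes_even(519, 15.79)}")
print(f"bgb:   [{bgb_low:.2f}; {bgb_high:.2f}]  excludes 500: {excludes_even(481, 15.79)}")
```

Both intervals contain 500, so neither bot's result in this set of matches is distinguishable from an even match-up.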
My statistics lessons were also 20 years ago, and I have to admit that I learned a lot today while confusing and disturbing the other readers here and at GammonU with my wrong numbers. I also admit that I wouldn't bet a month's salary that the conclusions above are correct ;-).
Ciao
Achim