BGonline.org Forums
Please also check for GNU... my quick truncated RO setting
Posted By: Michael Depreli In Response To: Please also check for GNU... my quick truncated RO setting (neilkaz)
Date: Wednesday, 16 September 2009, at 7:19 a.m.
Neil:
I'm not sure that I understand you here. Biased? Why more biased than GNU 2, 3 and 4 ply if you're basing the final results on longish GNU 2-ply W/C rollouts?
Approximately how many positions do you expect to be comparing the bots on?
i.e. you had 626 last time; how many this time (approx.)?
Automated?? Are you saying that you have a program that did all those evals for the initial Depreli 626?
I am close to finishing my first 100 of the 626, testing 4 ply and 1296 trunc 0/0 @6, and I am doing them manually.
Don't you have to do yours manually for each bot under test and each position?
OK let me explain:
What I think you're asking is that, once I've (manually) selected the positions where bots disagree, I do a @6 R/O on them too. Sure, I can do that, and in fact I can automate it using the Cmark feature, so no problems there.
However, the bias comes into it because I haven't done a @6 R/O on EVERY SINGLE move/cube action for the WHOLE series, not just the ones where the other bots' plies don't all agree. Just as XGR makes mistakes that NO OTHER bot or ply makes, I'm sure GNU @6 will make some too, so those errors would need to be included to make it a fair comparison. I have no way of automating this, i.e. having GNU use @6 R/O to analyse every reasonable candidate play and cube action in the whole series.
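A toy sketch of why screening only on other bots' disagreements undercounts a new bot's errors. All position IDs, bot labels and plays below are hypothetical, and "best" simply stands in for whatever a rollout would say; this only illustrates the selection effect, not the actual workflow.

```python
from typing import Dict, Iterable, List, Set


def screened_positions(choices: Dict[str, Dict[str, str]], screeners: List[str]) -> Set[str]:
    """Positions where at least two of the screening bots choose different plays."""
    return {pos for pos, picks in choices.items()
            if len({picks[bot] for bot in screeners}) > 1}


def error_count(choices: Dict[str, Dict[str, str]], best: Dict[str, str],
                bot: str, positions: Iterable[str]) -> int:
    """Number of the given positions where `bot` differs from the rollout's best play."""
    return sum(1 for pos in positions if choices[pos][bot] != best[pos])


# Hypothetical toy data: each bot's chosen play per position.
choices = {
    "p1": {"XGR": "A", "GNU 4ply": "A", "GNU @6": "B"},  # only GNU @6 goes wrong
    "p2": {"XGR": "A", "GNU 4ply": "B", "GNU @6": "A"},  # the screeners disagree
    "p3": {"XGR": "C", "GNU 4ply": "C", "GNU @6": "C"},  # everyone agrees
}
best = {"p1": "A", "p2": "A", "p3": "C"}  # stand-in for rollout results

screen = screened_positions(choices, ["XGR", "GNU 4ply"])
print(sorted(screen))                                 # ['p2']
print(error_count(choices, best, "GNU @6", screen))   # 0: p1 is never looked at
print(error_count(choices, best, "GNU @6", choices))  # 1: the unique error shows up
```

Checking GNU @6 only on the screened set reports zero errors, while checking it on every position finds its one unique mistake.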
This is also the problem with the error rates you are posting on BOL for the first 200 positions you have tested: you don't have access to the original 500-game series.
This is how it works:
I've analysed all 500 games with XGR (Huge), XG 4ply, XG 3ply, GNU 4ply, GNU 3ply, GNU 2ply, S4 3ply and BGB 3ply. The analysis is set to highlight ALL errors > 0. This part is all automated, obviously.
I then open 3 instances of XG, 3 of GNU and one each of S4 and BGB, manually check any position where any bot disagrees, and reference it in my spreadsheet to be rolled out.
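One way to picture the screening step: compare each bot's top choice per position and queue a rollout, with the union of the bots' choices as candidates, whenever they are not unanimous. This is a minimal sketch assuming the top choices have already been collected into a dictionary; the position IDs and plays are hypothetical, and in practice the checking is done by hand across the bot instances.

```python
from typing import Dict, List, Tuple


def build_rollout_queue(top_choices: Dict[str, Dict[str, str]]) -> List[Tuple[str, List[str]]]:
    """Return (position_id, candidate plays) for every position where the bots disagree."""
    queue = []
    for pos_id, picks in top_choices.items():
        candidates = sorted(set(picks.values()))  # every play some bot prefers
        if len(candidates) > 1:                   # at least one bot disagrees
            queue.append((pos_id, candidates))
    return queue


# Hypothetical example: game 1, moves 5 and 9.
top_choices = {
    "g1m5": {"XGR": "13/7 8/7", "GNU 2ply": "13/7 8/7", "S4 3ply": "24/18 13/12"},
    "g1m9": {"XGR": "No double", "GNU 2ply": "No double", "S4 3ply": "No double"},
}
print(build_rollout_queue(top_choices))  # [('g1m5', ['13/7 8/7', '24/18 13/12'])]
```

Cube actions are treated as just another string here, which keeps checker plays and cube decisions in one queue.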
Running through 10 games at a time takes approx. 2 hours' work. So far I've done 20 games and identified ~190 positions for rollout, so, as you can see, we could end up with nearly 5000 positions to roll out. The only ones I'm not rolling out are trivial non-contact plays where the equities are within 0.001.
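The "nearly 5000" figure is a straight extrapolation from the first 20 games. A quick sketch of that arithmetic, assuming the ~190 positions per 20 games rate and the ~2 hours per 10 games pace both hold across the whole 500-game series:

```python
def project_workload(games_done: int, positions_found: int, total_games: int,
                     games_per_session: int = 10, hours_per_session: float = 2.0):
    """Linearly extrapolate rollout positions and manual checking hours to the full series."""
    positions_per_game = positions_found / games_done
    projected_positions = positions_per_game * total_games
    projected_hours = total_games / games_per_session * hours_per_session
    return projected_positions, projected_hours


print(project_workload(20, 190, 500))  # (4750.0, 100.0)
```

So roughly 4750 positions to roll out and about 100 hours of manual checking, both of which will shift if later games turn out more or less complex than the first 20.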
Rollouts are automated using the Cmark feature.
I'm hoping to make this project a far superior benchmark to the previous one by rolling them all out, with help from others, to statistically significant levels, so that even very small errors will be included.
Last time, because I knew the rollout parameters were not that strong, I only published positions where errors were > 0.039, so many got discarded.
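To contrast the old fixed cutoff with a "statistically significant" criterion, here is a hedged sketch. It assumes each rollout reports an equity and a standard error, treats the two rollouts as independent (a simplifying assumption), and uses a normal approximation with z = 1.96; the equity figures are made up for illustration.

```python
import math


def old_rule(error: float, threshold: float = 0.039) -> bool:
    """Previous benchmark: keep only errors larger than a fixed cutoff."""
    return error > threshold


def is_significant(eq_best: float, se_best: float,
                   eq_other: float, se_other: float, z: float = 1.96) -> bool:
    """Treat the equity gap as a real error if it exceeds z standard errors of the difference.

    Assumes the two rollouts are independent, so their variances add.
    """
    diff = eq_best - eq_other
    se_diff = math.sqrt(se_best ** 2 + se_other ** 2)
    return diff > z * se_diff


# Hypothetical rollout results: a 0.015 error measured with SE 0.003 on each play.
print(old_rule(0.120 - 0.105))                    # False: discarded under the old cutoff
print(is_significant(0.120, 0.003, 0.105, 0.003))  # True: small but statistically real
```

Tighter rollouts (smaller standard errors) are what let small errors like this one count, which is the point of rolling everything out further.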
Hopefully that clarifies things.
BGonline.org Forums is maintained by Stick with WebBBS 5.12.