Rockwell/Kazaross MET available for XG users
Posted By: neilkaz In Response To: Rockwell/Kazaross MET available for XG users (Phil Simborg)
Date: Saturday, 27 February 2010, at 4:46 p.m.
These questions will be addressed in my forthcoming article for GV.
But in short .. to answer..
1) Every score in a 15 pt match was rolled out 38880 times using GNU 2 ply Supremo. That is 19440 (15 x 1296) for each side on roll, of course with GNU set to roll out as initial position (ie can't start the game w/doubles). This will certainly suffice for us Americans, who almost never play anything longer than 15 pts, but for those interested in longer matches I extrapolated to 25 pts using a calculated MET as a template (it was a very close match to the RO'ed MET in 12 to 15 pt matches at close scores, and was a little low for the trailer's chances to come back at lopsided scores). Therefore, in the extrapolation I used very slight adjustments to improve the trailer's chances to come back from big deficits in long matches. To check the extrapolation, we looked very carefully at cubeless TP's for 2 and 4 cubes, and all follow the numerical trends. There are far fewer issues (I didn't find any) when crossing over from scores that were RO'ed to scores that were extrapolated with R/K than with g11.
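The trial counts in 1) can be checked with a couple of lines. This is only a sketch of the arithmetic; the constant and function names are made up for illustration and are not GNU settings.

```python
# Rollout bookkeeping for one match score (illustrative names, not GNU options).
# With doubles excluded from the opening roll there are 15 distinct opening rolls.
NON_DOUBLE_OPENING_ROLLS = 15
GAMES_PER_OPENING_ROLL = 1296  # the "1296" in "15 x 1296"; numerically 36 * 36


def trials_per_side() -> int:
    """Games rolled out with one particular side on roll."""
    return NON_DOUBLE_OPENING_ROLLS * GAMES_PER_OPENING_ROLL


def trials_per_score() -> int:
    """Games rolled out per score, counting both sides on roll."""
    return 2 * trials_per_side()


print(trials_per_side())   # 19440
print(trials_per_score())  # 38880
```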
2) Yeah, I am convinced, when high level bot plays high level bot. I sent the .xls with the MET to Xavier and he checked it vs his RO'ed XG MET (1 ply, and close to g11, which is 0-ply GNU) and quickly said that he loved the new MET and it was stronger by 0.4 Elo. That may seem to be an insignificant amount, but using XG's Elo calculator it means that R/K wins 50.04% of all 11 pt matches vs XG's MET. That may also seem insignificant, but consider that, by my recollection, when someone ran lots of matches with a program to compare g11 to the Woolsey MET, g11 ended up winning 50.05% of the matches. So in real life the gain from switching MET is very small. However, more and more the top players play like the top bots, and almost every serious student of BG studies match play using XG or GNU and does RO's with them, so why not use a MET that best reflects their play at high levels? Now we have a MET that is based on how the top bots play in RO's, so our RO results should be a bit more accurate, IMHO.
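For anyone who wants to see how 0.4 Elo turns into 50.04% in 11 pt matches, here is a rough sketch. It assumes XG's Elo calculator uses the usual FIBS-style rating formula, which I have not confirmed; the function name is made up.

```python
import math


def match_win_probability(elo_diff: float, match_length: int) -> float:
    """Chance that the higher-rated side wins a match of the given length.

    Assumes the FIBS-style formula P = 1 / (1 + 10 ** (-D * sqrt(N) / 2000)),
    where D is the rating difference and N is the match length in points.
    """
    exponent = -elo_diff * math.sqrt(match_length) / 2000.0
    return 1.0 / (1.0 + 10.0 ** exponent)


p = match_win_probability(0.4, 11)
print(f"{p:.2%}")  # 50.04%
```

Under that formula the edge grows with the square root of the match length, so the same 0.4 Elo is worth a bit less in shorter matches.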
3) The key figure for any MET is -2-1 Crawford. The difference between R/K and g11, XG or calculated tables is that our RO's showed 32.31% chances for the trailer at that score vs 31.85% for g11. This may seem to be an unbelievable difference between GNU 0 ply and 2 ply Supremo, but I have reverified this a couple of times, as have others. I also RO'ed -2-1C using XG 3 ply, over 3000 trials for each of the 30 opening plays according to score, and when I tabulated the data I got 32.32%. Additionally, a quick look at Stick's RO's here for GS and GG will also convince one that the resulting ME for -2-1C is clearly over 32%. When can a new MET come out and replace ours? Well..someone could redo it using XG 3 ply (assuming XG can be set to not roll doubles starting a RO from the opening position), but unless he did several times more trials, I doubt it would be much more accurate.
3a) My guess is that in a few years someone will train a very high level bot to play better at GG and GS than our current bots do. The gains will be slight, and perhaps balance out, but if someone does a very long RO using a very high strength (like XGR+) and shows that -2-1C is clearly different from 32.3%, it may make sense for them to attempt the many-month-long project to RO a new MET.
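To see why GG and GS play matters so much for the -2-1C figure in 3) and 3a): in the Crawford game there is no cube, a gammon wins the match for the trailer outright, and a single win only gets him to -1-1, which is 50% for equal players. A minimal sketch follows; the win and gammon rates are hypothetical inputs chosen for illustration, NOT rollout data, and the function name is made up.

```python
def trailer_me_2away_1away_crawford(win: float, gammon: float) -> float:
    """Trailer's match equity at -2-1 Crawford.

    win    : trailer's chance to win the Crawford game (gammons included)
    gammon : trailer's chance to win a gammon or backgammon
    Assumes the resulting -1-1 score (DMP) is worth exactly 50%.
    """
    if not 0.0 <= gammon <= win <= 1.0:
        raise ValueError("need 0 <= gammon <= win <= 1")
    single_win = win - gammon
    return gammon + 0.5 * single_win


# Hypothetical rates, picked only to show the arithmetic.
me = trailer_me_2away_1away_crawford(win=0.50, gammon=0.1462)
print(f"{me:.2%}")  # 32.31%
```

Since each gammon is worth twice a single win here, a small shift in gammon rate from better GG or GS play moves the -2-1C number directly.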
But for now and the foreseeable future, I will be using R/K, since I think it best reflects how both the top bots and top players play, and it will make bot evals and RO's slightly more likely to return the correct result for match analysis.
.. neilkaz ..
BGonline.org Forums is maintained by Stick with WebBBS 5.12.