
BGonline.org Forums

Meyer Dice Tube - Randomness Tested

Posted By: Nick Kravitz
Date: Wednesday, 26 September 2012, at 1:42 p.m.

In Response To: Meyer Dice Tube VIDEO, Randomness & Cheating (Brett Meyer)

Brett

I am not sure if you are still watching this thread, but I did not see anyone respond to your request for a randomness test of your results. I am a quantitative analyst (quant) and a decent backgammon player (at least I like to think so), and I give my opinion on your request for a randomness analysis below:

There are a few statistical methods to test for randomness. I have used Pearson's chi-squared test, which is probably the best-known and most thoroughly tested method for doing so (see http://en.wikipedia.org/wiki/Pearson's_chi-squared_test). Before I give the results, here are some general comments on statistical testing for people without the requisite background. (You can also find this in any text on statistical testing, or here: http://en.wikipedia.org/wiki/Statistical_hypothesis_testing.)
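To make the mechanics concrete, here is a minimal sketch of the calculation for a single die (Python with numpy/scipy is simply my tool of choice, and the count vector is a made-up example, not data from the tube). We compare the observed count of each face against the 500/6 expected under fairness, and the p-value comes from a chi-squared distribution with 6 - 1 = 5 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical face counts from 500 rolls of one die (illustrative only).
observed = np.array([78, 91, 85, 80, 88, 78])
n = observed.sum()                      # 500 rolls
expected = np.full(6, n / 6)            # fair die: 500/6 expected per face

# Pearson's chi-squared statistic: sum of (O - E)^2 / E over the 6 faces.
statistic = ((observed - expected) ** 2 / expected).sum()

# p-value from the chi-squared distribution with 6 - 1 = 5 degrees of freedom.
p_value = stats.chi2.sf(statistic, df=5)

# scipy's built-in version gives the same answer.
stat2, p2 = stats.chisquare(observed)

print(f"by hand: chi2 = {statistic:.3f}, p = {p_value:.3f}")
print(f"scipy:   chi2 = {stat2:.3f}, p = {p2:.3f}")
```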

In the same way that we cannot poll an entire voting population to predict election results, we cannot roll dice an infinite number of times to conclude definitively whether they are fair. As such, statistical tests are not fool-proof and are set up to acknowledge the possibility of an incorrect conclusion. For example, if we were testing the randomness of a single die within the dice tube by rolling it 500 times, we might get the same number on all tosses and conclude the dice tube was loaded. However, it is still possible (but extremely unlikely) that the tube was indeed fair and we simply happened to be unlucky. In the event we roll 500 of the same number, we can only state there is strong evidence (but not proof) that we were rolling with an unfair (non-random) dice tube. Likewise, our dice tube could be loaded to roll more 1's than it should - for example, 1's with probability 50% and 2, 3, 4, 5, and 6 each with probability 10%. However, when testing this loaded dice tube, we might roll approximately equal frequencies of each number by chance, in which case we would incorrectly conclude the tube was fair when in fact it was not. (Side note: Dewey vs. Truman is a famous example of a statistical poll failing: http://en.wikipedia.org/wiki/Dewey_Defeats_Truman)
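To illustrate the second kind of mistake, here is a small sketch (again my own Python, with the loading probabilities taken from the example above) that rolls a die loaded to show 1's half the time and runs the same chi-squared test. With 500 rolls the test will almost always flag the bias, but once in a while a loaded die can still produce innocent-looking counts and slip through.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A die loaded as in the example above: 1 with probability 50%,
# each of 2-6 with probability 10%.
faces = [1, 2, 3, 4, 5, 6]
probs = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

rolls = rng.choice(faces, size=500, p=probs)
counts = np.bincount(rolls, minlength=7)[1:]   # counts of faces 1..6

stat, p = stats.chisquare(counts)              # expected: 500/6 per face
print(counts, f"p-value = {p:.2g}")
# A tiny p-value says "reject fairness". If a loaded die happened to produce
# a large p-value, we would wrongly accept it as fair (a type II error, or
# false negative) - the mirror image of the false positive described above.
```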

By convention, statistical tests are most commonly set up to reject the null hypothesis, when it is in fact true, with probability 5%, although this value is arbitrary and purely conventional. We simply need some threshold at which to start being suspicious of non-randomness. Sometimes a truly random process produces results that look non-random; when this happens, it is called a type I error, or false positive, and it is equivalent to the first case above (concluding the dice tube is non-random when in fact it is fair).

The output of running the test on a set of generated or observed frequencies is a "p-value", which can be interpreted as the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true (in our case, that the underlying process is indeed random): http://en.wikipedia.org/wiki/P-value. By construction, if we ran many experiments of throwing dice from a truly random process, we would expect the p-values to be uniformly distributed between 0 and 1, and each experiment would have a 5% chance of returning a false positive. (Equivalently, the p-value would fall between 0 and 0.05 with 5% probability.)

Before I ran the test on the numbers from the Meyer website, I first ran it on my own numbers, which I generated from a computer program that I am certain produces unbiased random numbers. I generated 1000 experiments of 500 die rolls each. In most of the trials I got face counts that looked random enough (for example, 94, 86, 89, 79, 77, 75), which returned a p-value of 0.642. However, around 5% of the time (or around 50 times out of 1000) the numbers looked non-random enough to trip a false positive (for example, 59, 84, 73, 98, 88, 98), which returned a p-value of 0.016. It is actually a good thing to get some false positives; this indicates the test is working as expected. If all 1000 experiments had produced p-values above 0.05, I would be suspicious that the underlying random process was not working correctly.
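For anyone who wants to reproduce this kind of check, here is a sketch of the simulation (my own Python; the generator and seed are just illustrative, not the exact program I used). It runs 1000 experiments of 500 fair rolls, collects the chi-squared p-values, counts how many dip below 0.05, and confirms that the two example count vectors above give p-values of roughly 0.64 and 0.016.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# 1000 experiments of 500 fair die rolls each.
p_values = []
for _ in range(1000):
    rolls = rng.integers(1, 7, size=500)           # faces 1..6, equal odds
    counts = np.bincount(rolls, minlength=7)[1:]   # counts of faces 1..6
    p_values.append(stats.chisquare(counts).pvalue)
p_values = np.array(p_values)

# Under a truly random process, roughly 5% of experiments should
# trip the 0.05 threshold purely by chance.
print("false positives:", (p_values < 0.05).sum(), "out of 1000")

# The two example count vectors quoted above.
print(stats.chisquare([94, 86, 89, 79, 77, 75]).pvalue)   # ~0.64
print(stats.chisquare([59, 84, 73, 98, 88, 98]).pvalue)   # ~0.016
```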

Next, I applied the test to the numbers on the Meyer website, which provided results for a total of 12 experiments, one for each starting number of each die. If the rolls were truly random, we would expect the results to look similar to those of the process described above that we know to be random, i.e., p-values approximately uniform between 0 and 1 (in particular, about half the p-values above 0.5 and half below, with maybe one observation close to or below the 5% threshold of non-random suspicion).

The p-values I calculated ranged from 0.55 (least random, Blue 5) to 0.997 (most random, Red 1). These results look too good to be true. In fact, if we rolled dice from a process we know in advance to be purely random (for example, rolling a precision die, or having a computer generate random numbers for us), the probability that we would get results at least this good by pure chance would be 0.0000143 (equivalently, about 1 in 70,000). To put this into backgammon perspective, there would be a better chance of your first 3 rolls all coming out double sixes (a mere 1 in 47,000).
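I will not walk through the exact 1-in-70,000 calculation here, but the following back-of-the-envelope sketch is a simplified illustration of that kind of reasoning rather than a derivation of the exact figure above: under a truly random process the 12 p-values behave like 12 independent uniform draws, so the chance that every one of them lands at 0.55 or higher is 0.45^12, a similarly tiny number. The double-six comparison is exact.

```python
# Back-of-the-envelope checks (illustrative only; not necessarily the exact
# calculation behind the 1-in-70,000 figure quoted above).

# If 12 p-values come from a truly random process, they are independent
# uniform(0, 1) draws. Probability that all 12 are at least 0.55:
p_all_high = 0.45 ** 12
print(f"P(all 12 p-values >= 0.55) = {p_all_high:.2e}")   # ~7e-5

# Backgammon comparison: three opening rolls that are all double sixes.
p_three_double_sixes = (1 / 36) ** 3
print(f"1 in {1 / p_three_double_sixes:,.0f}")            # 1 in 46,656 (~47,000)
```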

I do not know how the experiment was run. Although there is nothing to indicate that the results were somehow doctored (or that the most random results were "selected" from a larger set of experiments), because the results look suspect I would recommend having them re-sampled independently by someone without an interest in the outcome of the test.

Nick
