technology Mersenne Twister Random Number Generator & Diehard Randomness Test from seed Samuel Antão Technical University of Lisbon/INESC-ID, Lisbon, Portugal 14th December, 2009 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 1/35 Outline technology from seed • Mersenne Twister Overview • Mersenne Twister Details • Bitstream Test (Diehard) • Other Diehard results • Conclusions Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 2/35 Mersenne Twister Overview • Proposed in 1998 by M. Matsumoto and T. Nishimura • Ultra long period of 219937-1 • Based on GF(2) fast arithmetic • Standard random number generation in several applications • – Maple – Matlab – Python scripting language technology from seed Faced some criticism by computer science field (George Marsaglia): – Not very elegant – Marsaglia proposed some RNG not so complex and with equivalent randomness properties M. Matsumoto and Takuji Nishimura. “Mersenne Twister – A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator ”. ACM Transactions on Modeling and Computer Simulation, Vol. 8, No. 1, January 1998, pp 3-30 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 3/35 Mersenne Twister Overview technology from seed • History – MT based on General Feedback Shift Register (GFSR) approach (Lewis and Payne, 1973) – Example for the polynomial x5+x2+1 and delay 6: Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 4/35 Mersenne Twister Overview technology from seed • History (cont.) – The GFSR quality depends on judicious chosen seeds – To overcome this problem in 1992 Matsumoto and Kurita proposed the Twisted GFSR (TGFSR) – In the TGFSR one of the feedback operands is multiplied by a matrix – In 1994, Tezuka, discovered significant correlation in the 2 least significant bits of a TGFSR sequence – In 1998, Matsumoto and Nishimura, overcame that problem by setting one of the feedback operands as a concatenation of high/low part of other two – The 1998 proposal sets the generator period to be of the form 2nw-r-1, thus a Mersenne prime: Mersenne Twister Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 5/35 Mersenne Twister Details technology from seed • Lack of definition of “good randomness” • The authors selected two randomness metrics that, together, are believed to be respected only by good generators – K-distribution – Number of terms in the recurrence characteristic polynomial Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 6/35 Mersenne Twister Details • technology from seed K-distribution – Considering a a pseudorandom sequence xi of w-bit integers of period P it is possible to obtain a kv-bit vector as: (truncv( xi ), truncv( xi+1 ), … , truncv( xi+k-1 )) (0 ≤ i < P) – truncv( y ) is the most significant v bits of y – A sequence is said to be k-distributed to v-bit accuracy if all the 2kv possible combinations occur the same number of times in one period (except for zero) – k(v) is the maximum k such that the sequence is k-distributed to v-bit accuracy – Pseudorandom generator period bound: 2k(v)v-1 ≤ P – We are interested in as large as possible k correspondent to as large as possible v Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 7/35 Mersenne Twister Details technology from seed • Recurrence characteristic polynomial number of terms – Each step of the pseudorandom number generation (PRNG) performs a linear recurrence. This recurrence has a specific characteristic polynomial – Usually the characteristic polynomial is a trinomial (GFSR) – A PRNG based on polynomials with few terms shows poor randomness – A PRNG based on polynomials that can be generated from trinomials also shows poor randomness – We are interested in implementing a linear recurrence with a characteristic polynomial with a large amount of terms Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 8/35 Mersenne Twister Details technology from seed • MT recurrence step – General formula xk+n = xk+m xor (xuk|xlk+1)A, (k=0,1,2,…) – Matrix A chosen to be easy to operate 1 0 A 0 0 aw1 aw2 0 0 1 a0 if x0 0 shiftright( x ) xA shiftright( x ) xor a if x0 1 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 9/35 Mersenne Twister Details • technology from seed MT recurrence step – Data set (state) transformation – r bits removed from the state to solve the problems regarding low k-distribution for small v-bit accuracy (v=2) reported in previous TGFSR versions – Period given by 2p-1=2nw-r-1 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 10/35 Mersenne Twister Details • technology from seed Tampering matrix T – After observing a sufficient large amount of random data, it becomes possible to predict nest random values – The recurrence does not suit cryptographic applications – Solution: after each step the state is multiplied by an invertible matrix T – It does not completely overcome the problem (hash function required for cryptographic applications) – Matrix T operations (z=xT): • y = x xor (x >> u) • y = y xor ((y<<s) and b) • y = y xor ((y<<t) and c) • z = y xor (y >> l) Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 11/35 Mersenne Twister Details technology from seed • Matrix T does not change the period of the recurrence • Parameters for the recurrence and matrix T obtained after extensive search • The theoretical period bound 2k(v)v – 1 ≤ 2nw-r – 1 cannot be reached • To attain the maximum period 2p – 1 = 2nw-r – 1 the recurrence characteristic polynomial c(t) must be primitive: t 2 t mod c(t ) 2p t t mod c(t ) • A special method to test for primitivity had to be used to compute the generator parameters in useful time Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 12/35 Mersenne Twister Details technology from seed • MT parameters search K-distribution bounds MT results Previous TGFSR Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 13/35 Mersenne Twister Details technology from seed • MT summary – Period 219937-1 – 623-distributed to 32-bit accuracy – Characteristic polynomial with several terms (~100) – Constrained data set of 624 word data – Fast generation (from 1.5 to 2 times faster than AES encryption) Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 14/35 Bitstream Test technology from seed • The random sequence were obtained from a C library available online by Geoff Kuenning: – http://www.cs.hmc.edu/~geoff/mtwist.html – 132 million random 32-bit integers per second (3.6 GHz Pentium Xeon) • 3000 files of 11,468,800 bytes were obtained, to support 3000 independent Diehard tests • The MT random sequences passed all the Diehard tests (Win32 version) Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 15/35 Bitstream Test • technology from seed Bitstream test details – The random sequence is used to obtain 221overlapped 20-bit words b1b2b3 ...b20 b2b3b4 ...b21 b221 b221 1b221 2 ...b221 19 – Considering a repository that contains all the possible words of 20-bit (220), some of these words may not appear in the 221 obtained from the random sequence – For a truly random sequence, the number of missing words obtained from that sequence should approximately follow a normal distribution of µ=141,909 and σ=428 – Each Diehard test run performs 20 bitstream tests. The total of bitstream tests performed are 20x3000=60000. Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 16/35 Bitstream Test technology from seed • Bitstream test ~ N(141909, 428) (class width = 30) 1800 Obtained Expected 1600 1400 Distribution 1200 1000 800 600 400 200 0 1.4 1.405 1.41 1.415 1.42 Classes 1.425 1.43 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 1.435 1.44 x 10 5 17/35 Other Diehard results technology from seed • Birthday Spacings ~ χ2(6) (class width = 0.1) 450 Obtained Expected 400 350 Distribution 300 250 200 150 100 50 0 0 5 15 10 20 25 Classes Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 18/35 Other Diehard results technology from seed • Overlapping 5-Permutation ~ χ2(99) (class width = 1) 150 Obtained Expected Distribution 100 50 0 20 40 60 80 100 120 Classes 140 160 180 200 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 19/35 Other Diehard results technology from seed • Binary Rank for 31x31 Matrices ~ χ2(3) (class width = 0.1) 80 Obtained Expected 70 Distribution 60 50 40 30 20 10 0 0 5 10 15 Classes Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 20/35 Other Diehard results technology from seed • Binary Rank for 32x32 Matrices ~ χ2(6) (class width = 0.1) 100 Obtained Expected 90 80 Distribution 70 60 50 40 30 20 10 0 0 5 10 15 Classes Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 21/35 Other Diehard results technology from seed • Binary Rank for 6x8 Matrices ~ Exp(2) (class width = 0.01) 400 Obtained Expected 350 Distribution 300 250 200 150 100 50 0 0.5 1 1.5 Classes 2 2.5 3 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 22/35 Other Diehard results from seed Overlapping-Pairs-Sparse-Occupancy ~ N(141909,290) (class width = 30) 3000 Obtained Expected 2500 2000 Distribution • technology 1500 1000 500 0 1.405 1.41 1.415 1.42 Classes 1.425 1.43 1.435 x 10 5 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 23/35 Other Diehard results from seed Overlapping-Quadruples-Sparse-Occupancy ~ N(141909,295) (class width = 30) 3500 Obtained Expected 3000 2500 Distribution • technology 2000 1500 1000 500 0 1.405 1.41 1.415 1.42 Classes 1.425 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 1.43 1.435 x 10 5 24/35 Other Diehard results technology from seed • DNA ~ N(141909,339) (class width = 30) 3500 Obtained Expected 3000 Distribution 2500 2000 1500 1000 500 0 1.405 1.41 1.415 1.42 Classes 1.425 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 1.43 1.435 x 10 5 25/35 Other Diehard results technology from seed • Count the 1’s ~ χ2(2500) (class width = 7) 300 Obtained Expected 250 Distribution 200 150 100 50 0 2200 2300 2400 2500 Classes 2600 2700 2800 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 26/35 Other Diehard results technology from seed • Count the 1’s for specific bytes ~ χ2(2500) (class width = 5) 2500 Obtained Expected Distribution 2000 1500 1000 500 0 2200 2300 2400 2500 Classes 2600 2700 2800 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 27/35 Other Diehard results technology from seed • Parking Lot ~ N(3523,21.9) (class width = 1) 600 Obtained Expected 500 Distribution 400 300 200 100 0 3400 3450 3500 3550 3600 3650 Classes Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 28/35 Other Diehard results technology from seed • Minimum distance ~ Exp(0.995) (class width = 0.01) 700 Obtained Expected 600 Distribution 500 400 300 200 100 0 0 0.5 1 1.5 2 Classes 2.5 3 3.5 4 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 29/35 Other Diehard results technology from seed • 3D Spheres ~ Exp(30) (class width = 1) 2000 Obtained Expected 1800 1600 Distribution 1400 1200 1000 800 600 400 200 0 0 20 40 60 80 100 Classes Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 30/35 Other Diehard results technology from seed • Squeeze ~ χ2(42) (class width = 0.7) 120 Obtained Expected 100 Distribution 80 60 40 20 0 10 20 30 40 50 Classes 60 70 80 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 31/35 Other Diehard results from seed Craps - Wins ~ N(200000p,sqrt(200000p(1-p))) (class width = 15), p=244/495 100 Obtained Expected 90 80 70 Distribution • technology 60 50 40 30 20 10 0 9.75 9.8 9.85 Classes 9.9 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 9.95 x 10 4 32/35 Other Diehard results technology from seed • Craps - Throws ~ χ2(20) (class width = 0.7) 160 Obtained Expected 140 Distribution 120 100 80 60 40 20 0 0 5 10 15 20 Classes 25 30 35 40 Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 33/35 Other Diehard results technology from seed • The remaining tests only provide the final p-values, thus it was no possible to plot the results – Runs test – Overlapping sums test • Although, both provide uniform p-values, confirming the good randomness of the generator Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 34/35 Conclusions technology from seed • MT is a widely used pseudorandom number generator • MT has extra large period • MT is computationally fast • MT passes the Diehard statistical tests (and others as well) • Statistical tests are a good platform to test for randomness • The bitstream test, is one of statistical tests to be employed Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa Segurança em Redes Móveis 2009 35/35