technology
Mersenne Twister
Random Number Generator
&
Diehard Randomness Test
from seed
Samuel Antão
Technical University of Lisbon/INESC-ID, Lisbon, Portugal
14th December, 2009
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
1/35
Outline
technology
from seed
• Mersenne Twister Overview
• Mersenne Twister Details
• Bitstream Test (Diehard)
• Other Diehard results
• Conclusions
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
2/35
Mersenne Twister
Overview
•
Proposed in 1998 by M. Matsumoto and T. Nishimura
•
Ultra long period of 219937-1
•
Based on GF(2) fast arithmetic
•
Standard random number generation in several applications
•
–
Maple
–
Matlab
–
Python scripting language
technology
from seed
Faced some criticism by computer science field (George Marsaglia):
–
Not very elegant
–
Marsaglia proposed some RNG not so complex and with equivalent randomness properties
M. Matsumoto and Takuji Nishimura. “Mersenne Twister – A 623-Dimensionally Equidistributed
Uniform Pseudo-Random Number Generator ”. ACM Transactions on Modeling and Computer
Simulation, Vol. 8, No. 1, January 1998, pp 3-30
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
3/35
Mersenne Twister
Overview
technology
from seed
• History
– MT based on General Feedback Shift Register (GFSR) approach (Lewis
and Payne, 1973)
– Example for the polynomial x5+x2+1 and delay 6:
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
4/35
Mersenne Twister
Overview
technology
from seed
• History (cont.)
– The GFSR quality depends on judicious chosen seeds
– To overcome this problem in 1992 Matsumoto and Kurita proposed the Twisted
GFSR (TGFSR)
– In the TGFSR one of the feedback operands is multiplied by a matrix
– In 1994, Tezuka, discovered significant correlation in the 2 least significant bits of a
TGFSR sequence
– In 1998, Matsumoto and Nishimura, overcame that problem by setting one of the
feedback operands as a concatenation of high/low part of other two
– The 1998 proposal sets the generator period to be of the form 2nw-r-1, thus a
Mersenne prime: Mersenne Twister
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
5/35
Mersenne Twister
Details
technology
from seed
• Lack of definition of “good randomness”
• The authors selected two randomness metrics that, together, are
believed to be respected only by good generators
– K-distribution
– Number of terms in the recurrence characteristic polynomial
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
6/35
Mersenne Twister
Details
•
technology
from seed
K-distribution
– Considering a a pseudorandom sequence xi of w-bit integers of period P it is
possible to obtain a kv-bit vector as:
(truncv( xi ), truncv( xi+1 ), … , truncv( xi+k-1 )) (0 ≤ i < P)
– truncv( y ) is the most significant v bits of y
– A sequence is said to be k-distributed to v-bit accuracy if all the 2kv possible
combinations occur the same number of times in one period (except for zero)
– k(v) is the maximum k such that the sequence is k-distributed to v-bit accuracy
– Pseudorandom generator period bound: 2k(v)v-1 ≤ P
– We are interested in as large as possible k correspondent to as large as possible v
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
7/35
Mersenne Twister
Details
technology
from seed
• Recurrence characteristic polynomial number of terms
– Each step of the pseudorandom number generation (PRNG) performs a
linear recurrence. This recurrence has a specific characteristic polynomial
– Usually the characteristic polynomial is a trinomial (GFSR)
– A PRNG based on polynomials with few terms shows poor randomness
– A PRNG based on polynomials that can be generated from trinomials also
shows poor randomness
– We are interested in implementing a linear recurrence with a characteristic
polynomial with a large amount of terms
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
8/35
Mersenne Twister
Details
technology
from seed
• MT recurrence step
– General formula
xk+n = xk+m xor (xuk|xlk+1)A, (k=0,1,2,…)
– Matrix A chosen to be easy to operate
1
 0
 


A
0
 0

aw1 aw2
0
 0

 1

 a0 

if x0  0
 shiftright( x )
xA  
shiftright( x ) xor a if x0  1
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
9/35
Mersenne Twister
Details
•
technology
from seed
MT recurrence step
– Data set (state) transformation
– r bits removed from the state to solve the problems regarding low k-distribution for
small v-bit accuracy (v=2) reported in previous TGFSR versions
– Period given by 2p-1=2nw-r-1
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
10/35
Mersenne Twister
Details
•
technology
from seed
Tampering matrix T
– After observing a sufficient large amount of random data, it becomes possible to
predict nest random values
– The recurrence does not suit cryptographic applications
– Solution: after each step the state is multiplied by an invertible matrix T
– It does not completely overcome the problem (hash function required for
cryptographic applications)
– Matrix T operations (z=xT):
• y = x xor (x >> u)
• y = y xor ((y<<s) and b)
• y = y xor ((y<<t) and c)
• z = y xor (y >> l)
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
11/35
Mersenne Twister
Details
technology
from seed
• Matrix T does not change the period of the recurrence
• Parameters for the recurrence and matrix T obtained after extensive
search
• The theoretical period bound 2k(v)v – 1 ≤ 2nw-r – 1 cannot be reached
• To attain the maximum period 2p – 1 = 2nw-r – 1 the recurrence
characteristic polynomial c(t) must be primitive:
 t 2  t mod c(t )
 2p
t  t mod c(t )
• A special method to test for primitivity had to be used to compute the
generator parameters in useful time
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
12/35
Mersenne Twister
Details
technology
from seed
• MT parameters search
K-distribution bounds
MT results
Previous TGFSR
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
13/35
Mersenne Twister
Details
technology
from seed
• MT summary
– Period 219937-1
– 623-distributed to 32-bit accuracy
– Characteristic polynomial with several terms (~100)
– Constrained data set of 624 word data
– Fast generation (from 1.5 to 2 times faster than AES encryption)
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
14/35
Bitstream Test
technology
from seed
• The random sequence were obtained from a C library available online
by Geoff Kuenning:
– http://www.cs.hmc.edu/~geoff/mtwist.html
– 132 million random 32-bit integers per second (3.6 GHz Pentium Xeon)
• 3000 files of 11,468,800 bytes were obtained, to support 3000
independent Diehard tests
• The MT random sequences passed all the Diehard tests (Win32
version)
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
15/35
Bitstream Test
•
technology
from seed
Bitstream test details
– The random sequence is used to obtain 221overlapped 20-bit words
b1b2b3 ...b20
b2b3b4 ...b21

b221 b221 1b221  2 ...b221 19
– Considering a repository that contains all the possible words of 20-bit (220), some of
these words may not appear in the 221 obtained from the random sequence
– For a truly random sequence, the number of missing words obtained from that
sequence should approximately follow a normal distribution of µ=141,909 and
σ=428
– Each Diehard test run performs 20 bitstream tests. The total of bitstream tests
performed are 20x3000=60000.
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
16/35
Bitstream Test
technology
from seed
• Bitstream test ~ N(141909, 428) (class width = 30)
1800
Obtained
Expected
1600
1400
Distribution
1200
1000
800
600
400
200
0
1.4
1.405
1.41
1.415
1.42
Classes
1.425
1.43
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
1.435
1.44
x 10
5
17/35
Other Diehard results
technology
from seed
• Birthday Spacings ~ χ2(6) (class width = 0.1)
450
Obtained
Expected
400
350
Distribution
300
250
200
150
100
50
0
0
5
15
10
20
25
Classes
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
18/35
Other Diehard results
technology
from seed
• Overlapping 5-Permutation ~ χ2(99) (class width = 1)
150
Obtained
Expected
Distribution
100
50
0
20
40
60
80
100
120
Classes
140
160
180
200
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
19/35
Other Diehard results
technology
from seed
• Binary Rank for 31x31 Matrices ~ χ2(3) (class width = 0.1)
80
Obtained
Expected
70
Distribution
60
50
40
30
20
10
0
0
5
10
15
Classes
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
20/35
Other Diehard results
technology
from seed
• Binary Rank for 32x32 Matrices ~ χ2(6) (class width = 0.1)
100
Obtained
Expected
90
80
Distribution
70
60
50
40
30
20
10
0
0
5
10
15
Classes
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
21/35
Other Diehard results
technology
from seed
• Binary Rank for 6x8 Matrices ~ Exp(2) (class width = 0.01)
400
Obtained
Expected
350
Distribution
300
250
200
150
100
50
0
0.5
1
1.5
Classes
2
2.5
3
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
22/35
Other Diehard results
from seed
Overlapping-Pairs-Sparse-Occupancy ~ N(141909,290) (class width = 30)
3000
Obtained
Expected
2500
2000
Distribution
•
technology
1500
1000
500
0
1.405
1.41
1.415
1.42
Classes
1.425
1.43
1.435
x 10
5
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
23/35
Other Diehard results
from seed
Overlapping-Quadruples-Sparse-Occupancy ~ N(141909,295) (class width = 30)
3500
Obtained
Expected
3000
2500
Distribution
•
technology
2000
1500
1000
500
0
1.405
1.41
1.415
1.42
Classes
1.425
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
1.43
1.435
x 10
5
24/35
Other Diehard results
technology
from seed
• DNA ~ N(141909,339) (class width = 30)
3500
Obtained
Expected
3000
Distribution
2500
2000
1500
1000
500
0
1.405
1.41
1.415
1.42
Classes
1.425
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
1.43
1.435
x 10
5
25/35
Other Diehard results
technology
from seed
• Count the 1’s ~ χ2(2500) (class width = 7)
300
Obtained
Expected
250
Distribution
200
150
100
50
0
2200
2300
2400
2500
Classes
2600
2700
2800
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
26/35
Other Diehard results
technology
from seed
• Count the 1’s for specific bytes ~ χ2(2500) (class width = 5)
2500
Obtained
Expected
Distribution
2000
1500
1000
500
0
2200
2300
2400
2500
Classes
2600
2700
2800
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
27/35
Other Diehard results
technology
from seed
• Parking Lot ~ N(3523,21.9) (class width = 1)
600
Obtained
Expected
500
Distribution
400
300
200
100
0
3400
3450
3500
3550
3600
3650
Classes
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
28/35
Other Diehard results
technology
from seed
• Minimum distance ~ Exp(0.995) (class width = 0.01)
700
Obtained
Expected
600
Distribution
500
400
300
200
100
0
0
0.5
1
1.5
2
Classes
2.5
3
3.5
4
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
29/35
Other Diehard results
technology
from seed
• 3D Spheres ~ Exp(30) (class width = 1)
2000
Obtained
Expected
1800
1600
Distribution
1400
1200
1000
800
600
400
200
0
0
20
40
60
80
100
Classes
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
30/35
Other Diehard results
technology
from seed
• Squeeze ~ χ2(42) (class width = 0.7)
120
Obtained
Expected
100
Distribution
80
60
40
20
0
10
20
30
40
50
Classes
60
70
80
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
31/35
Other Diehard results
from seed
Craps - Wins ~ N(200000p,sqrt(200000p(1-p))) (class width = 15), p=244/495
100
Obtained
Expected
90
80
70
Distribution
•
technology
60
50
40
30
20
10
0
9.75
9.8
9.85
Classes
9.9
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
9.95
x 10
4
32/35
Other Diehard results
technology
from seed
• Craps - Throws ~ χ2(20) (class width = 0.7)
160
Obtained
Expected
140
Distribution
120
100
80
60
40
20
0
0
5
10
15
20
Classes
25
30
35
40
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
33/35
Other Diehard results
technology
from seed
• The remaining tests only provide the final p-values, thus it was no
possible to plot the results
– Runs test
– Overlapping sums test
• Although, both provide uniform p-values, confirming the good
randomness of the generator
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
34/35
Conclusions
technology
from seed
• MT is a widely used pseudorandom number generator
• MT has extra large period
• MT is computationally fast
• MT passes the Diehard statistical tests (and others as well)
• Statistical tests are a good platform to test for randomness
• The bitstream test, is one of statistical tests to be employed
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
Segurança em Redes Móveis 2009
35/35
Download

technology