TinyMT Pseudo Random Number Generator for Erlang
Kenji Rikitake
ACCMS/IIMC, Kyoto University
14-SEP-2012
Kenji Rikitake / Erlang Workshop 2012

Contents
SIMD-oriented Fast Mersenne Twister (SFMT) implementation issues
Tiny Mersenne Twister (TinyMT) on pure Erlang and with NIFs
  Implementation issues
  Performance evaluation
Conclusions and future works

Disclaimer: non-goals
This presentation is NOT about cryptographically secure RNGs
  Use crypto module functions (i.e., OpenSSL code) for secure random number generation in Erlang whenever available
In this presentation, PRNGs are:
  Of statistically uniform distribution
  Of predictable output with the same set of generation parameters and internal state

Why a new PRNG for Erlang?
Speed
  A faster PRNG is needed for simulations
Length of period
  Long enough to ensure statistical randomness
Size of internal state
  Memory footprint should be small for speed
Concurrent/parallel generation
  Generating mathematically proven independent PRN streams should be possible

SFMT on Erlang (2011 Workshop)
SFMT PRNG characteristics
Much longer period
  Typical length: 2^19937-1 ~= 10^6002 (internal state: 156 128-bit words = 2496 bytes)
  OTP random module: ~7x10^12 (internal state: three 15-bit ints)
BSD licensed (commercial use is OK)
Implementation available in many languages
  The original Mersenne Twister (MT19937), from which SFMT is derived, is now the default PRNG for Python and R

Our implementation: sfmt-erlang
We evaluated various period lengths
  We concluded that 2^19937-1 is optimal
SFMT NIF: x3~4 faster than the random module
  Compared the time for each 32-bit generation
  SFMT generates as many random numbers as the internal table holds at once
  Generation time is proportional to the table length
PropEr now has a build option for sfmt-erlang (thanks!)

SFMT implementation issues
Internal state size is large
  The state table should be placed in the Erlang BEAM shared heap
Slow without NIFs, but...
  Need for pure (non-NIF) working code exists
  NIFs may introduce instability
Generation of independent streams cannot be mathematically guaranteed
  Per-request creation of SFMT generation parameters is practically too slow

So what's new about TinyMT?
Iterative generation of 32/64-bit output
  Unlike SFMT, which generates numbers in batches
Period is shorter (2^127-1)
  It is still long enough for most simulation use
Small memory footprint (28 bytes)
  127 bits for the internal state (held in four 32-bit words)
  3 x 32-bit integers for the generation parameters
Independent stream generation capability
  ~2^58 sets of the generation parameters

Our implementation: tinymt-erlang
Running fast enough in pure Erlang
  TinyMT has a more complex algorithm than the random module, but it uses no floating-point calculation, so it could be fast enough to compete
Readability first, optimization second
  No tricky micro-optimization
Full compatibility with the random module
  Drop-in replacement for existing code

Our assumptions on TinyMT
Smaller state size = faster speed
  TinyMT: 28 bytes, SFMT: 2496 bytes
Simpler algorithm = faster in pure Erlang
  TinyMT: 2 functions, 106 lines of 'S' file
  SFMT: 5 functions, 503 lines of 'S' file
  ('S' = R15B01 BEAM assembly source file)
Easier generation of independent streams from the same algorithm
  It is possible on SFMT too, but it needs a lot of precomputation and the parameters cannot be changed as needed

Design details of tinymt-erlang
32-bit output implementation only
No floating-point calculation
No Erlang case statement
Note: Erlang integers are BIGNUMs
  Bitmasking with the band operator is needed to implement C-like integer operations
State in a single record #intstate32{}
  Internal state and the generation parameters can be modified separately
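
The bitmasking note can be illustrated with a minimal sketch. Erlang integers grow without bound, so the result of a shift, addition, or negation has to be masked with band to behave like a C uint32_t. The module name, the helper names, and the record field names below are assumptions made for illustration; they are not taken from tinymt-erlang.

```erlang
-module(uint32_sketch).
-export([shl32/2, add32/2, neg32/1, new/0, set_params/4]).

-define(MASK32, 16#FFFFFFFF).

%% Hypothetical layout: four state words plus three generation parameters.
%% The field names are assumed for illustration only.
-record(intstate32, {status0 = 0, status1 = 0, status2 = 0, status3 = 0,
                     mat1 = 0, mat2 = 0, tmat = 0}).

%% Left shift truncated to 32 bits, like (uint32_t)(x << n) in C.
shl32(X, N) -> (X bsl N) band ?MASK32.

%% Addition modulo 2^32, like uint32_t addition in C.
add32(X, Y) -> (X + Y) band ?MASK32.

%% Two's complement negation within 32 bits.
neg32(X) -> (-X) band ?MASK32.

%% An all-zero record, only used here to demonstrate set_params/4.
new() -> #intstate32{}.

%% Replace only the generation parameters; the state words stay untouched.
set_params(S, Mat1, Mat2, Tmat) ->
    S#intstate32{mat1 = Mat1, mat2 = Mat2, tmat = Tmat}.
```

Without the mask, `16#80000001 bsl 1` would silently become a 33-bit value instead of wrapping around as it does in C.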

Design tips for [1,N] integer RNGs
Equivalent to [0, N-1] integer RNGs
Ensuring equal probabilities
  Multiplying [0, 1) floating-point numbers to generate integers may cause errors unless N = 2^n (order of error: N x 2^(-32))
A right way (implemented in tinymt-erlang)
  A) Generate a 32-bit integer random number R
  B) Compute Q, the largest multiple of N not exceeding 2^32 (Q rem N =:= 0, 0 =< (2^32 - Q) =< (N - 1))
  C) If R >= Q, try the generation at A) again; else compute the result as (R rem N) + 1
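
Steps A) to C) can be sketched as follows. This is a minimal illustration, not the tinymt-erlang source: the module and function names are assumed, and the standard rand module stands in for the 32-bit generator. A value is accepted only when it falls in [0, Q-1], so each of the N residues is hit by exactly Q/N values of R.

```erlang
-module(uniform_sketch).
-export([limit/1, uniform/1, uniform/2]).

-define(TWO32, 16#100000000).

%% Step B: Q is the largest multiple of N not exceeding 2^32.
limit(N) when is_integer(N), N >= 1, N =< ?TWO32 ->
    ?TWO32 - (?TWO32 rem N).

%% Draw from [1, N], using rand as a stand-in source of 32-bit integers.
uniform(N) ->
    uniform(N, fun() -> rand:uniform(?TWO32) - 1 end).

%% Gen32 is any zero-arity fun returning an integer in [0, 2^32 - 1].
uniform(N, Gen32) when is_function(Gen32, 0) ->
    accept(Gen32(), N, limit(N), Gen32).

%% Step C: accept R only when it is below Q; otherwise draw again (step A).
accept(R, N, Q, _Gen32) when R < Q -> (R rem N) + 1;
accept(_R, N, Q, Gen32) -> accept(Gen32(), N, Q, Gen32).
```

For N = 6, Q is 2^32 - 4, so only the top four values of R are rejected and a retry is very rare.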

Our test environments
Erlang/OTP R15B01 on:
  Intel Xeon E5-2670 x 8 (16 cores), 2.6GHz clock, RedHat Enterprise Linux 6, x86_64
    KU ACCMS Supercomputer Cluster B batch node
  Intel Core i5-2410M (4 cores), 2.3GHz clock, FreeBSD/amd64 9.0-STABLE (a notebook PC)
  Intel Atom N270 (2 cores), 1.6GHz clock, FreeBSD/i386 8.3-RELEASE (a netbook PC)

Pure Erlang wall clock test results
TinyMT execution speed against the random module (wall clock):
  x86_64/amd64: x0.93~x1.21 (about the same speed)
  x86/i386: x0.31~x0.34 (much slower)
Speed gain of the HiPE o3 option compared to the non-HiPE version (wall clock):
  x86_64/amd64: x2.4~x3.6 (much faster)
  x86/i386: x1.25 (TinyMT is still slower (x0.4))
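
One way to take a wall clock measurement of this kind is sketched below. It is not the harness used for the figures above; the module and function names are assumed, and any zero-arity fun can be timed. timer:tc/1 reports the elapsed time in microseconds.

```erlang
-module(bench_sketch).
-export([time_calls/2]).

%% Return the elapsed wall clock time, in microseconds, of Count calls to Fun.
time_calls(Fun, Count) when is_function(Fun, 0), is_integer(Count), Count > 0 ->
    {Micros, ok} = timer:tc(fun() -> repeat(Fun, Count) end),
    Micros.

%% Call Fun Count times and discard each result.
repeat(_Fun, 0) -> ok;
repeat(Fun, Count) ->
    _ = Fun(),
    repeat(Fun, Count - 1).
```

For example, `bench_sketch:time_calls(fun() -> rand:uniform() end, 1000000)` times one million float generations. Dividing two such measurements gives a speed ratio in the "x1.21" style used above.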

Pure Erlang fprof test results
By accumulated time measured by fprof:
  TinyMT takes x2~x6 the execution time of the random module
  For uniform_s/1 (float RNG): TinyMT has to do the integer-to-float conversion, while the random module generates the float result first
  For uniform_s/2 ([1, N] integer RNG): it is faster than uniform_s/1 on TinyMT, but still runs at x0.5 the speed of the random module (slower)
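
The integer-to-float conversion mentioned for uniform_s/1 can be sketched as a scaling by 2^-32. The source does not state which conversion tinymt-erlang uses, so the scaling factor and the names below are assumptions for illustration.

```erlang
-module(float_sketch).
-export([to_unit_float/1]).

%% Map a 32-bit unsigned integer V to a float in [0.0, 1.0).
%% 2^-32 is exactly representable as a double, so the product is exact.
to_unit_float(V) when is_integer(V), V >= 0, V =< 16#FFFFFFFF ->
    V * (1.0 / 4294967296.0).
```

The multiplication forces an integer-to-float conversion on every call, which is the extra step that the float-first random module does not pay.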

Observations from pure Erlang tests
Overhead of calling functions is significant
  TinyMT needs two function calls to generate a result; aggregating them into a NIF will be effective
Integer operation overhead
  32-bit unsigned integers are BIGNUMs on x86
  Small integers in a word: max 28 bits
  Operations will be much more efficient in C than in Erlang
NIFs for core functions will be effective
  Only for the functions called many times

NIF fprof test results
uniform_s/1: x3 faster than non-NIF
  The same speed as the random module
uniform_s/2: x7 faster than non-NIF
  x3 the speed of the random module
SFMT is x1.52 faster than TinyMT
  1 million calls/second seems to be the upper limit for KU ACCMS nodes

Observations from NIF tests
Our assumptions proven false:
  Internal state size is irrelevant to speed
    This may differ in a memory-constrained system
  A simpler algorithm does not necessarily mean a faster one
Batch generation of multiple numbers is essential for gaining speed
  Overheads of calling functions and of BEAM memory allocation are presumably significant

Even more NIF tests (latest work)
After the paper submission, batch list generation NIFs were added
  Speed gain: x6~7 on wall clock, x13 on fprof compared to the non-batch NIFs
  C compiler inline optimization (as in sfmt-erlang) makes the code a further x1.4 faster
sfmt-erlang batch generation is still x3~4 faster than tinymt-erlang in fprof
More optimization needed
  On memory allocation with enif_*() functions
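
The shape of a batch list generation call can be sketched in pure Erlang: one call returns a whole list together with the updated state, so the per-call overhead is paid once per batch rather than once per number. This only illustrates the calling convention; it is not the NIF code, the names are assumed, and rand:uniform_s/2 stands in for the TinyMT generator.

```erlang
-module(batch_sketch).
-export([gen_list/3]).

%% Return {List, NewState}, where List holds Count integers in [1, N].
gen_list(Count, N, State) when is_integer(Count), Count >= 0 ->
    gen_list(Count, N, State, []).

%% Thread the generator state through each step and collect the results.
gen_list(0, _N, State, Acc) ->
    {lists:reverse(Acc), State};
gen_list(Count, N, State0, Acc) ->
    {X, State1} = rand:uniform_s(N, State0),
    gen_list(Count - 1, N, State1, [X | Acc]).
```

Starting from the same state, for example `rand:seed_s(exsplus, {1, 2, 3})`, the call always returns the same list, which matches the "predictable output" property stated in the disclaimer.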

NIFs and BEAM scheduling issue
Which is better for avoiding scheduler hiccups caused by NIFs occupying the CPU time?
* On sfmt-erlang and tinymt-erlang batch NIFs
  High workload of batch processing (list generation)
  Low workload of copying the results
* On tinymt-erlang per-request NIFs and pure Erlang code
  Continuous not-so-high workload of per-request NIF processing and instructions executed by BEAM

Related works
TinyMT key pre-computation
  Took 32 days to generate 2^28 sets of 32/64-bit TinyMT generation parameters
  18~19 keys/sec for each CPU of the KU ACCMS cluster
SFMT and TinyMT seed jumping
  Fast computation of multiple state transitions
  Useful for multiple independent generation
Wichmann-Hill 2006
  Successor of the algorithm of the random module
  Output independence of the proposed seeding for parallel generation is not as firmly proven as in TinyMT
  Licensing issues exist (non-open license)
  Michael Truog built the BIGNUM version

Conclusion
TinyMT is a viable candidate for replacing the Erlang/OTP stock non-secure PRNG
The pure Erlang code is fast enough, especially with HiPE compilation
With the core functions converted to NIFs, the speed is the same as the stock random module
  When using batch generation NIFs, the speed is x7 faster, though sfmt-erlang is still x3~4 faster than tinymt-erlang

Future works
Exploring more parallelism
  Use case proof of multiple independent stream generation is essential
  Distribution schemes for key generation parameters are essential, e.g., through message queues
More optimization needed
  More performance improvement is possible
  Memory allocation strategy of NIFs
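
A distribution scheme through message passing might look like the sketch below: one process owns a list of precomputed parameter sets and hands each set out at most once, so no two workers share a stream. Everything here is hypothetical (module name, message format, and the idea of a single owner process); the slides leave the actual scheme as future work.

```erlang
-module(param_server_sketch).
-export([start/1, take/1]).

%% Start a process that owns a list of precomputed parameter sets,
%% e.g. {Mat1, Mat2, Tmat} tuples.
start(ParamSets) when is_list(ParamSets) ->
    spawn(fun() -> loop(ParamSets) end).

%% Ask the server for one unused parameter set.
take(Server) ->
    Ref = make_ref(),
    Server ! {take, self(), Ref},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        {error, timeout}
    end.

%% Hand out each parameter set once; report exhaustion afterwards.
loop([]) ->
    receive
        {take, From, Ref} ->
            From ! {Ref, {error, exhausted}},
            loop([])
    end;
loop([P | Rest]) ->
    receive
        {take, From, Ref} ->
            From ! {Ref, {ok, P}},
            loop(Rest)
    end.
```

Each worker would call `take/1` once at startup and build its own generator state from the returned parameters.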

Questions?
