Study of CLHEP Random Number Engines For GLEAM
Introduction
This study is based on the notion that, since GLEAM (the current version of the GLAST simulation package) is built on the Gaudi framework, the GLEAM Gaudi algorithms that require random numbers should obtain them via the Gaudi random number service. Further, following a suggestion by one of the authors of Gaudi (Markus Frank) to use a seeding scheme similar to that used by the LHCb group, this study tests the suitability of re-seeding on a per-event basis. Doing so would greatly simplify the regeneration of particular events if and when such is required, but would not be acceptable if it destroyed the "randomness" of the random number streams used in the simulation, in the sense of producing a skewed sampling of the available parameter space.
To perform the tests a cmt package, RandTest, was created. RandTest uses the CLHEP engines (specified in a Gaudi job options file) to produce streams of random long integers from a set of Gaudi algorithms and writes them to a binary file. Binary output was required for use with George Marsaglia's Diehard test suite (used by the Zoom project at Fermilab to test the CLHEP random number generators), which tests the randomness of a generated stream of numbers. RandTest contains perl and ROOT scripts for generating the tests and summaries. It also contains a copy of the Diehard source code, slightly modified to make automating the tests easier.
The Tests
The tests generated for this study were designed to determine whether the proposed scheme meets the following requirements for a random number generator:
- the ability to control the setting of seeds (not necessarily independently) for the Gaudi algorithms in GLEAM that require random numbers, currently: FluxSvc, G4Generator, and the Digi algorithms.
- given the seeding scheme, generation of pseudo-random number streams for each algorithm that uniformly sample parameter space with good coverage (which is what Diehard attempts to test).
- regeneration of an event from any stage of processing should be as easy as possible (e.g. by setting the event and run numbers in a Gaudi job options file).
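The last requirement amounts to a small edit of the job options. A minimal sketch of such a fragment, assuming a hypothetical initialization algorithm named RandTestInitAlg with RunNumber and EventNumber properties (the actual algorithm and property names in RandTest may differ):

```text
// Hypothetical Gaudi job options fragment; the algorithm and property
// names here are illustrative, not the actual RandTest identifiers.
RandTestInitAlg.RunNumber   = 12;
RandTestInitAlg.EventNumber = 345;
```

With a per-event seeding scheme, these two numbers alone would determine every seed used for the event, which is what makes regeneration cheap.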
Consequently the RandTest package contains a set of Gaudi algorithms: an initialization algorithm that sets the run and event numbers in the transient data store at the beginning of the event loop, and four nearly identical algorithms which first reset the seed of the random number generator according to the following prescriptions:

    Seed1 = 100000 * ((evt->run()+1) % 20000) + ((evt->event()+1) % 100000);
    Seed2 = 200000 * ((evt->run()+1) % 20000) + ((evt->event()+1) % 100000);
    Seed3 = 300000 * ((evt->run()+1) % 20000) + ((evt->event()+1) % 100000);
    Seed4 = 400000 * ((evt->run()+1) % 20000) + ((evt->event()+1) % 100000);

and then, each time through the event loop, generate and output four long integers, obtained by converting doubles produced with the shoot() method (for the CLHEP distribution Rndm::Flat(0.0,1.0)). This was done for each of the CLHEP engines tested. Using this scheme each algorithm can generate 2 x 10^9 distinct seeds (20,000 runs with 100,000 events each) before having to worry about a repeated seed. Four streams of 3 million random numbers (one per algorithm) were generated this way; 3 million is the minimum number required for running the Diehard tests.
For comparison, two other streams were generated for each engine tested: one stream of 3 million long integers generated from a single seed, and one stream of 3 million long integers concatenated from the output of 20 runs re-seeded on a per-run basis.
After running the Diehard test suite on each of the binary files produced in the manner described above, perl and ROOT scripts were run to summarize the results. Diehard performs eighteen tests and returns a set of p-values for each test. The interpretation of the set of p-values can be a bit arcane (see the Diehard and Zoom links above for discussion of these issues), but the upshot (as I understood it) is that any p-value that is 0 or 1 to at least six decimal places indicates a failure of the test which produced it, and that for a good generator the overall set of p-values should be distributed more or less uniformly.
Results
Based on this analysis the only CLHEP engines that seem even remotely suitable, in terms of the number of Diehard tests passed in combination with the seed-per-event scheme, were the MTwistEngine and the TripleRand engine. (I vote for MTwistEngine just on the strength of its name: how can something called the Mersenne Twister be bad?) All other engines failed at least half the tests when used with the seed-per-event scheme. These findings are consistent with the Zoom finding that these are state-of-the-art engines, indeed suitable for use by the "paranoid", which perhaps the seed-per-event scheme ought to make us. Interestingly, based on the table at the Zoom link above, neither seems to incur much of a penalty in time, being less than twice as slow as the fastest engines.
The Ranlux engines, which failed miserably on the Diehard tests in their default configurations, have higher "luxury" settings available which significantly improve their randomness properties, but at a significant cost in performance. These settings were not tested in the current study.
The following table summarizes the results and contains links to summary histograms (click on the engine name in the left column), text file summaries, and the full Diehard output files. The summary histogram figures show four histograms of all of the p-values from the Diehard suite: for the single-seed run, the 20 concatenated runs, and the streams from algorithm 1 and algorithm 4, respectively.