-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pseudorandom number generator #135
Comments
I have a section "Guidelines for New RNG APIs" that gives my thoughts on how new APIs should support RNGs. (It seems that the focus is not on security, so ignore the information on cryptographic RNGs there if that is true.) Also, any new PRNGs provided by the API should not use global state; see also the recent NumPy RNG policy. |
I would support having both custom uniform and also non-uniform and integer distributions. Importantly, for Fortran the RNG should work also when run in parallel (both coarray or OpenMP, OpenMPI environments). Quoting from a blog post by Jason Blevins:
The C++ standard library also provides several non-uniform distributions and different generators in a high-level object API. |
@peteroupc That link is fantastic, ill check it out! I have a PRNG that is thread safe, and non global state, in coretran. See #104 and this link It is thread safe under OpenMP and OpenMPI, I have not tried it with coarrays but I don't see why it wouldn't work. The base generator can be changed depending on what the user wants. So far i've implemented the xorshift128+ and xorshift1024* but I would like to add xoroshiro variants too since I read they are a little more robust when it comes to the big crush tests. The seed of the generators can be specified. I also have the capability of randomly generating seeds using the splitmix64 algorithm. The PRNG class can generate past just uniform. Most of the functions were written by Alan Miller and were released into public domain. The default generators in Matlab and Python are Mersenne Twister generators which have a huge period, but they can't be "jumped" for parallel codes, although after reading the link above, maybe they can? My PRNG class has a "jump" type bound procedure. IMHO The best way to use a PRNG in parallel is to instantiate and seed every PRNG on every thread with the same seed, and jump each "rank's" PRNG by an integer amount of cycles, (usually the rank (or thread) number) I believe the speed is comparable to numpy's generator, but its been a while since I ran them. I would love to use this as a starting point, and modify/add to what I already have! Feedback would be greatly appreciated! Caveat I've tested with gfortran, and I think Intel. Would need testing with other compilers. Here's the example from my docs on how to use the PRNG in coretran. program PrngTest
use Prng_Class, only: Prng
implicit none
type(Prng) :: rng
integer(i64) :: seed(16)
integer(i32) :: i, id
real(r64) :: a
! Use a constructor to initialize the Prng with a random seed
! Using display = .true. will print the randomly generated seed to the screen
! So you can reproduce any results later if you wish.
rng = Prng(big = .true., display = .true.)
! Or you could have set the seed using this
! seed = [Some numbers, size 16 if big == .true., size 2 if big == .false.]
! and you can use rng = Prng(seed, .true., .true.)
! Draw from a uniform distribution
call rng%rngUniform(a) ! Can take min, max
! Draw an integer between 1 and 100 inclusive
call rng%rngInteger(id, 1, 100)
! Other distributions
call rng%rngNormal(a) ! can take mean, std
call rng%rngExponential(a, lambda=1.d0)
call rng%rngWeibull(a, lambda=1.d0, k=1.d0)
stop
end program And here is an example in parallel using OpenMPI program PrngTest_mpi
include 'mpif.h'
use Prng_Class, only: Prng
implicit none
type(Prng) :: rng
integer(i64) :: seed(16)
integer(i32) :: i, id, ierror, size, rank
real(r64) :: a
call MPI_INIT(ierror)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
! Generate a seed that is the same on all ranks.!
! seed = [16 numbers]
! This could be generated randomly and broadcast, or set by a user.
! Initialize the PRNG on each rank
rng = Prng(seed, big=.true., display=.true.)
! To avoid PRNG aliasing, jump each PRNG class by the rank number
rng%jump(rank)
! Do cool things.
call MPI_FINALIZE(ierror)
end program |
@peteroupc Thank you for your thoughts. The new PRNG should be non-cryptographic, and a possible option includes xoshiro, xoroshiro, and/or Mersenne Twister (MT). For API, IMHO, we are going to the basic uniform PRNG [0,1) because of the compatibility with @ivan-pi Thank you for the useful resource. It seems that, in parallel, each thread should have a thread-specific PRNG, and I agree with this design. @leonfoks Thank you for sharing your code. I agree that xoshiro/xoroshiro can be the default PRNG. A code for Jump Ahead for MT is available. It is useful for us to have PRNGs of probability distribution functions, and we will probably discuss it after we fix the APIs for the basic PRNGs. We can take advantage of your code to implement the algorithms! As @ivan-pi and @leonfoks suggested, it seems to be better that each thread has a private state explicitly. Steve Lionel also suggested it. The support for thread-safety is inconsistent across compilers: By the way, I have one more question: how much flexibility can we have in PRNGs?
|
I should note that generating random floating-point numbers in computers depends on generating random integers — not the other way around. See also this answer. Notably, since there are infinitely many real numbers between two others, it is impossible for any computer to choose from among all of them. Thus, floating-point random generation methods should be derived from integer PRNGs, not the other way around. The API can include a method that generates a number in [0, 1), but that should not be the only random generation method available. I should also note that many high-quality PRNGs have an internal state of much more than 64 bits (usually either 128 or 256 bits). (In this sense, MT19937's state is huge compared to other modern PRNGs.) Rather, it's more useful to speak of the number of seeds the PRNG admits (which should be at least 2^63 for high-quality PRNGs), and less importantly, of the output size in bits. Having named PRNGs (as is the case in C++ and NumPy 1.17) is useful, since their implementation is more likely to be stable across time. Allowing the use of custom PRNGs is also useful, but is less important at this point. |
@masuday thank you for starting the discussion. I think this is a nice fit into stdlib. Here is my implementation of some of the random number generators (gamma and normal distribution): https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/random.f90 also initializing the seed using system time is a very useful function that we should have in some form in stdlib: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/utils.f90#L402 |
@peteroupc Thank you for your comment! I realized my comment was somewhat ambiguous. I am aware that random FP numbers are from uniform random-bit generators (URBGs) like xoshiro256++. My previous comment meant that the standard FP PRNG [0,1) is a good starting point to discuss API and some specifications because the reference implementation is already available in Fortran ( I think there is no problem in the size of the internal state, i.e., C++ and NumPy is a useful reference for APIs, as @peteroupc mentioned. I tend to like a procedural style (similar to the intrinsic @certik makes a good point (thank you for sharing your code!). |
This post is pretty long. I made it because more concrete suggestions can stimulate the discussion. Here is a draft proposal with some description of the issue. A formal proposal will come in a separate thread. I use some terminology taken from the C++ implementation. In the draft, random numbers mean pseudorandom numbers. OverviewThe scope of PRNGs in
A random-number engine is a main body of URBG, and a typical engine is a family of the linear feedback shift register (LFSLs) and its variants (Mersenne Twister and xoshiro256, etc.). Uniform generators convert the random bits to a uniform random number in integer or real type. Shuffling and random-sampling algorithms depend on the integer random numbers. The uniform random numbers are used to generate statistically-distributed random numbers. In this draft, I focus only on URBGs and uniform PRNGs. We can have both a (procedural) low-level API and an object-oriented API. This proposal is biased to the low-level API, but one can wrap the subroutines to define the OO API. Low-level API for URBG enginesEngine subroutineThe engine should be a subroutine with a common interface. My suggestion for the interface is that the engine receives an array of interface
subroutine random_number_engine(state,harvest)
integer(int64),intent(inout) :: state(:) ! may need "allocatable"
integer(int64),intent(out) :: harvest
end subroutine random_number_engine
end interface The default engine is undefined. I put a dummy subroutine for the engine. subroutine random_number_engine_undef(state,harvest)
integer(int64),intent(inout) :: state(:) ! may need "allocatable"
integer(int64),intent(out) :: harvest
stop "undefined engine"
end subroutine random_number_engine_undef Whenever we use this dummy engine, we need User-derived typeA derived type holds an engine and its state. type random_number_generator
integer :: state_size = 0
integer(int64),allocatable :: state(:)
procedure(random_number_engine),nopass,pointer :: engine => random_number_engine_undef
end type random_number_generator SeedingWe should have a subroutine to seed the engine (i.e., constructor). My suggestion is to have a similar subroutine to the reference, ! The name should be changed.
subroutine random_number_seed(rng, engine, size, put, get, auto)
type(random_number_generator),intent(inout) :: rng ! state structure
character(len=*),intent(in),optional :: engine ! engine name
integer,intent(out),optional :: size ! same as the reference
integer(int64),intent(in),optional :: put(:) ! same as the reference
integer(int64),intent(out),optional :: get(:) ! same as the reference
logical,intent(in),optional :: auto ! automatic seeding
...
end subroutine random_number_seed If the engine has a "jump" feature as pointed by @leonfoks, we can have an extra subroutine to do it. We may have an additional member variable to the derived type: Supporting a user-defined engine, we need one more subroutine. Exampletype(random_number_generator) :: rng
integer :: state_size
integer(int64),allocatable :: seed(:)
! traditional method
call random_number_seed(rng, "xoshiro256**", size=state_size)
allocate(seed(state_size))
seed = 1234567890
call random_number_seed(rng, "xoshiro256**", put=seed)
deallocate(seed)
! automatic method based on timestamp etc
call random_number_seed(rng, "xoshiro256**", auto=.true.) SummarySuggestion:
Low-level API for uniform PRNGsReal numbersFirst, I suggest a uniform PRNG for a real number in [0,1) as in subroutine random_real_uniform(rng,harvest)
type(random_number_generator),intent(inout) :: rng
real(real64),intent(out) :: harvest
...
end subroutine random_real_uniform We can extend it to the general uniform distribution. subroutine random_real_uniform(rng,harvest,lower,upper)
type(random_number_generator),intent(inout) :: rng
real(real64),intent(out) :: harvest
real(real64),intent(in),optional :: lower
real(real64),intent(in),optional :: upper
...
end subroutine random_real_uniform @peteroupc suggests implementing the different ranges: [0,1), (0,1], (0,1), and [0,1]. I find (0,1) or (0,1] useful for generating random numbers under statistical distributions with log(x). Assuming we implement all the four options, we have several options for the API.
We can choose either of them so that the usage is not too complicated. Integer numbersAlthough we can get random bits directly from the engine, we should have a separate subroutine to obtain the bits because of the consistency of APIs. subroutine random_raw_stream(rng,harvest)
type(random_number_generator),intent(inout) :: rng
integer(int64),intent(out) :: harvest
...
end subroutine random_raw_stream For the uniform integer PRNG, we can use the following subroutine. subroutine random_integer_uniform(rng,harvest,lower,upper)
type(random_number_generator),intent(inout) :: rng
integer(int64),intent(out) :: harvest
integer(int64),intent(in) :: lower
integer(int64),intent(in) :: upper
...
end subroutine random_integer_uniform SummarySuggestion:
To be discussed
IMO, it is useful to support a scalar seed like API in other programming languagesI reached the above proposal for APIs because of the comparison with the other programming languages. Local stateRegarding thread safety, each thread should have a separate state. The suggested API always takes the local state as an argument, and it is simple to remember. This design is typical in non-object-oriented languages. Julia can have the state as a local variable. Most numerical libraries supporting Fortran give the local state (e.g., MKL, NAG, GSL). In object-oriented languages, a PRNG is an instance of a generator class, and each thread has a local PRNG (e.g., C++ and NumPy). User-derived type specialized to a particular engineThe "type" (say, object) that I have shown in the proposal is flexible to have any engine if it is compliant with the interface. The engine is assigned in the seed subroutine. Because we can use the common object, the users easily remember this rule. This API is similar to MKL, although it seems not to use a procedure pointer. In C++, the programmer specifies a particular class of the engine. It is efficient in computations, but we have to prepare a derived-type for each of the engines. |
I have investigated the APIs of PRNGs in major programming languages and libraries. See my Gist for details. Based on the research, I would like to suggest a revised API. I would suggest 3 subroutines for a random-bit generator (the engine): initializer, re-seeder, and uniform PRNG. The user must call the initializer for the first time (optionally with a seed). A separate subroutine puts/gets the internal state. A new API looks like: type(random_number_generator) :: rng
integer :: state_size
integer(int64),allocatable :: state(:)
real(dp) :: harvest
! initialize
call random_number_init(rng, "default", seed=[3245,8730,1211])
! uniform real numbers
call random_real_uniform(rng, harvest)
! reseed by date_and_time
call random_number_seed(rng, auto=.true.)
! get state
call random_number_state(rng,size=state_size)
allocate(state(state_size))
call random_number_state(rng,get=state)
! put state
call random_number_state(rng,put=state) I support the "default" PRNG for a real number ranged between 0 and 1. Looking at the other languages/libraries, the range differs.
I believe that [0,1) is a straightforward implementation and that (0,1] or (0,0) is useful for applications such as sampling from statistical distribution functions because zero is problematic in logarithm and division (although the probability of getting zero is negligible). I tend to support (0,1] or (0,0) as the default. Any thoughts? |
I think the 3 basic methods you suggest are a great starting point for us to get going on this. I might have missed this somewhere, but are these functions meant to be type bound procedures of the random_number_generator derived type? They might as well be since the user needs to import the random_number_generator class anyway. The constructor needs to return the derived type random_number_generator from a function to take advantage of the more modern derived type constructor capabilities. This would let the user instantiate the class with the name of the class. @certik @milancurcic Is there any reason we might want to stay away from the constructor interface on DTs based on the discussion about functions returning anything other than a scalar? Here are the key points
From a user perspective, using the module below would look like this type(random_number_generator) :: rng
real(dp) :: a, b(10), c(10, 10)
integer(i64), allocatable :: state(:)
rng = random_number_generator(seed=[3245, 8730, 1211])
call rng%get_state(state)
call rng%rand(a)
call rng%rand(b)
call rng%rand(c) The module module m_random_number_generator
type random_number_generator
integer(int64), allocatable :: state(:)
contains
procedure, public :: get_state => get_state_rng
generic, public :: rand => rand_d1_Prng_, rand_d1D_Prng_, rand_d2D_Prng_
!! random_number_generator%rand() - Draw from a uniform distribution
!! Draws uniform random numbers on [0, 1)
procedure, private :: rand_d1_Prng_ => rand_d1_Prng
procedure, private :: rand_d1D_Prng_ => rand_d1D_Prng
procedure, private :: rand_d2D_Prng_ => rand_d2D_Prng
procedure, public :: set_state => set_state_rng
end type random_number_generator
interface random_number_generator
!!Overloaded Initializer for a Prng - Pseudo random number generator.
module procedure :: init_with_seed_rng, init_random_seed_rng
end interface random_number_generator
!====================================================================!
function init_with_seed_rng(seed) result(this)
type(random_number_generator) :: this
!! Prng Class
integer(i64), intent(in) :: seed(:)
!! Seed of the Prng
! Set the explicit seed
end function init_with_seed_rng
function init_random_seed_rng() result(this)
type(random_number_generator) :: this
!! Prng Class
! Use date and time to obtain a random seed
! Set the seed.
end function init_random_seed_rng
!====================================================================!
subroutine get_state_rng(this, state)
class(random_number_generator), intent(in) :: this
!! Prng Class
integer(i64), allocatable, intent(inout) :: state(:)
!! State of the Prng
end subroutine get_state_rng
subroutine set_state_rng(this, state)
class(random_number_generator), intent(inout) :: this
!! Prng Class
integer(i64), intent(in) :: state(:)
!! Set this as the state of the Prng
end subroutine set_state_rng
! Use FYPP to process the different functions and ranks.
subroutine rand_d1_Prng(this, res)
class(random_number_generator), intent(inout) :: this
!! Prng Class
real(dp), intent(out) :: res
end subroutine
subroutine rand_d1D_Prng(this, res)
class(random_number_generator), intent(inout) :: this
!! Prng Class
real(dp), intent(out) :: res(:)
end subroutine
subroutine rand_d2D_Prng(this, res)
class(random_number_generator), intent(inout) :: this
!! Prng Class
real(dp), intent(out) :: res(:, :)
end subroutine
end module m_random_number_generator |
@leonfoks, I agree with you. I started from a traditional style, and my suggestion converged to an object-oriented style. We can implement PRNGs in the OO style while we have traditional (non-OO) APIs that I prefer to use as @certik. We can wrap the OO APIs with some subroutines (it is a reason why my suggestion has subroutines). I prefer short names: Currently, we do not have OO APIs in |
EDIT: I rewrote the code because my post was wrong. Everything in a class is private ( I have some suggestions on @leonfoks's class proposal. IMO, the internal state should not be open. Also, we may have 2 constants for convenience. integer,parameter :: random_seed_kind = int64 ! would be int32
integer,parameter :: random_state_kind = int64
type random_number_generator
private
integer(random_state_kind),allocatable :: state(:)
contains
...
end type random_number_generator Following @certik's comment, I would modify subroutine get_state(this,state)
class(random_number_generator) :: this
integer(random_state_kind),intent(inout) :: state(:)
end subroutine get_state |
Wrapping the OO API in a subroutine would effectively mean the code is only valid from Fortran 90 upward when derived types were introduced. While I suppose most people today are using sufficiently modern compilers and would not be affected (unless enforcing specific compiler flags), should we aim to have subroutine versions for users who might want to import the new RNG routines into their decades old F77 codebases? @certik @milancurcic |
@ivan-pi I think there is general agreement that we will do a low level API that is just subroutines, no derived types (unless there is no clean way to do that), and then an optional high level API that uses OO. |
As for the OO-API: I've ported CP2K's parallel stream PRNG to OO. As PRNGs go it is probably a very specific case and unimportant for most other applications, but already before I ported it there was a possibility to dump & load the state. |
@masuday do you want to open a PR with initial implementation of this? I think there is a broad agreement that we want this, and it's just about agreeing on an API that would work for everybody. |
@certik Thank you for chiming. I agree with you, and I am making a PR in a couple of days. |
I'd like to call out this RNG library, which I use regularly (currently the C++ version). See also wiki. There is a Fortran wrapper to the version producing 32bit random outputs but I think it's worth implementing the whole thing. It seems to have become quite popular lately. There is also an explicit SIMD version of one of the 32-bit output variants in C by way of AVX intrinsics. EDIT: but a generator with less than 64bit state isn't recommended for scientific applications |
Also, I noticed that the author of xorshiro256++ has a "vectorized" implementation. I put it in quotes because he doesn't explicitly use SIMD instrinsics, but writes it in a way that makes it extremely easy for the compiler to vectorize (on gcc use |
Random number generators are rarely implemented as |
TensorFlow has both impure and "stateless" RNG, for example tf.random.stateless_uniform |
I'm closing this issue since Probability Distribution and Statistical Functions -- PRNG Module #271 was merged in 6 Feb 2021 so it's likely to be resolved (it isn't tagged in any release version nor mentioned but it was probably in 0.2.0). |
EDIT: An updated proposal is available below.
There are a few suggestions to have pseudorandom number generators (PRNGs) in the library, e.g., #1 (comment) and #104 (comment). Focusing on the uniform floating-point PRNG [0.0, 1.0), we have already had the intrinsic subroutine,
random_number
. The first question is whether we should have custom PRNGs of the uniform distribution. Or, we should concentrate on derived PRNGs like non-uniform distributions or integer PRNGs. I think it is still worth developing custom PRNGs because of better flexibility and performance.With the custom PRNGs, we have many more questions.
The decision defines a design of API. A possible API is the same as the intrinsic subroutines. The subroutine name in the below example is just a placeholder.
Please tell me what you think about it.
The text was updated successfully, but these errors were encountered: