The random number generator is, as far as I know, already buffering its output. This is necessary because Julia uses the Mersenne Twister, which has an unholy amount of internal state.
So, as far as I gathered, you do not need random access to your random vector? Then you should generate as you go, and possibly add your own buffer in front, something like the sketch below.
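A minimal sketch of the buffering idea (the function name and buffer size are my own choices, not from any package):

using Random

# "Generate as you go" with a small reusable buffer: allocated once,
# refilled in place, handed to the caller one word at a time.
function foreach_random_word(f, nwords::Integer; bufsize::Int = 1024)
    buf = Vector{UInt64}(undef, bufsize)
    remaining = nwords
    while remaining > 0
        n = min(remaining, bufsize)
        rand!(view(buf, 1:n))      # bulk refill, no fresh allocation
        for i in 1:n
            f(buf[i])              # consume one word
        end
        remaining -= n
    end
end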
Secondly, how many random bits do you generate? Well, each symbol has an entropy of -(p*log(p)+(p-1)*log(p-1)), or in other words, 1e10 random coin-tosses with probability 1/3 : 2/3 only need 6.3e9 random bits. Generating 64 bits for each coin-toss is wasteful.
If you do need random access, I would also recommend not storing the random array. Instead, store a seed and use a PRNG with random access to the stream; for example randomstream[n] = aes_encrypt(key, n).
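aes_encrypt here is hypothetical; Julia has no built-in AES. As a stand-in, here is a sketch of the same counter-based idea using the standard SplitMix64 finalizer:

# Counter-based stream with O(1) random access: element n is a pure
# function of (key, n). SplitMix64 mixer as a cheap AES stand-in.
function splitmix64(x::UInt64)
    x += 0x9e3779b97f4a7c15
    x = (x ⊻ (x >> 30)) * 0xbf58476d1ce4e5b9
    x = (x ⊻ (x >> 27)) * 0x94d049bb133111eb
    return x ⊻ (x >> 31)
end

randomstream(key::UInt64, n::Integer) = splitmix64(key ⊻ UInt64(n))

No array is stored; randomstream(key, n) recomputes position n on demand.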
So, now your question should be: how do you transform a uniform random bitstream into a Bernoulli random bitstream, preferably using only a few cycles per bit?
Unfortunately I don't know. Something like asymmetric numeral systems (ANS) should do the job; does anyone else here know?
AES as RNG:
My system takes (on Julia 0.7) 0.84 ns for mem-copying one word (8 bytes), 2.4 ns for generating and storing one random word (via Random.rand! on a large Vector{UInt64}), and ca. 3 ns for AES-encrypting one word (6 ns ≈ 12 cycles per encryption of one 128-bit block, i.e. two words, measured via @btime without touching memory), using llvmcall and the AES-NI hardware instructions.
Amusingly this lines up almost perfectly with Agner Fog's tables (11 AES instructions; these have a throughput of 1 and a latency of 7 on Broadwell). This tells me that @btime is tight enough to fit into the reorder buffer, i.e. subsequent @btime iterations actually run simultaneously. AES as a PRNG should probably do at least some buffering, because we expect ~80 cycles of latency for the encryption.
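For the curious, calling the AES-NI round instruction from Julia looks roughly like this (x86-64 with AES-NI only; a single round for illustration, since a full AES encryption chains 10+ such rounds with expanded round keys):

const V2 = NTuple{2, VecElement{UInt64}}   # maps to LLVM <2 x i64>, one 128-bit block

# One hardware AES encryption round on block `a` with round key `k`.
aesenc(a::V2, k::V2) = Base.llvmcall(
    ("declare <2 x i64> @llvm.x86.aesni.aesenc(<2 x i64>, <2 x i64>)",
     """
     %r = call <2 x i64> @llvm.x86.aesni.aesenc(<2 x i64> %0, <2 x i64> %1)
     ret <2 x i64> %r
     """),
    V2, Tuple{V2, V2}, a, k)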
> Secondly, how many random bits do you generate? Well, each symbol has an entropy of -(p*log(p)+(p-1)*log(p-1)), or in other words, 1e10 random coin-tosses with probability 1/3 : 2/3 only need 6.3e9 random bits. Generating 64 bits for each coin-toss is wasteful.
Your expression for information entropy has a sign error and should use log2, not log (if it represents code rather than math). The correct formula is H(p) = -(p*log2(p) + (1-p)*log2(1-p)).
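Checking the numbers in Julia with the corrected formula (H is a throwaway name):

H(p) = -(p * log2(p) + (1 - p) * log2(1 - p))   # binary entropy, bits per symbol

H(1/3)          # ≈ 0.918 bits per toss
1e10 * H(1/3)   # ≈ 9.18e9 bits for 1e10 tosses

(The 6.3e9 figure upthread is the same quantity measured in nats, i.e. computed with the natural log.)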
Summing works for ordinary random walks, but not for reflected random walks. Also, I'm interested not in the final position of the walker, but rather in its maximum distance from the origin (over the entire duration of the walk). The code I have is:
function rrw(N, p)
    # Pre-generate all N steps: X[t] is true (a +1 step) with probability p.
    X = BitArray(rand() < p for t in 1:N)
    s = 0   # current position, reflected at the origin
    m = 0   # maximum position reached so far
    for t in 1:N
        s = max(0, s + 2*X[t] - 1)   # step +1 with prob p, else -1; reflect at 0
        m = max(m, s)
    end
    return m
end
and suggestions for improving this would be welcome. The faster, the better. Thanks!
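One immediate improvement, per the generate-as-you-go advice upthread: skip the BitArray entirely and draw each step inside the loop (rrw_streaming is a made-up name):

# Same walk without materializing X; saves the allocation and memory traffic.
function rrw_streaming(N, p)
    s = 0   # current position, reflected at the origin
    m = 0   # maximum position reached so far
    for t in 1:N
        s = rand() < p ? s + 1 : max(0, s - 1)   # +1 with prob p, else -1; reflect at 0
        m = max(m, s)
    end
    return m
end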
This may not be relevant, but if p happens to be of the form k/2^n for an integer k, you could perform fast bitwise operations on n binary vectors to get the desired distribution. For example, for p = 1/4 take the AND of two uniformly distributed bit sequences.
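For example, with p = 1/4:

using Random

a = bitrand(10^8)   # uniform bits, P(1) = 1/2
b = bitrand(10^8)
x = a .& b          # each bit is 1 with probability 1/4

BitArray operations like .& work 64 bits per word, so this runs close to memory bandwidth.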
This should probably be marked as the solution. The main question in the thread is "is it better or worse to generate an array of random numbers, rather than generating them as needed?" The answer is "worse". At least in the vast majority of use cases.
A p=1/2 random string decompresses into a p=1/3 random string
Alas, Huffman needs to discretize probabilities to powers of 2, so one needs a fancy arithmetic/ANS/FSE decompressor. And I must admit that I can't recall the details well enough to say whether this can be done chunkwise for the special case of two symbols. You could chunk 8 output bits, each symbol having inverse probability 3^k * 1.5^(8-k), where k is the number of ones, and see whether some library permits setting the table explicitly and with high precision (e.g. each probability as k/2^24). Your generator would be something like cat /dev/urandom | zstd --decompress --no-levzimpel --table=... which should be pretty fast (the command names are made up; I know that zstd uses such a compression internally).
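Made-up flags aside, here is a self-contained sketch of the underlying trick in Julia: run a binary range decoder on uniform random bytes, so it emits bits with the model probability p while consuming roughly H(p) input bits per output bit. All names and constants are my own illustration, not any library's API:

# Feed uniform random bytes through a binary range decoder; the decoded
# symbols come out Bernoulli(p), consuming ~H(p) input bits per output bit.
function bernoulli_bits(n::Integer, p::Float64)
    TOP = UInt64(1) << 32      # 32-bit coding range
    BOT = UInt64(1) << 24      # renormalization threshold
    range = TOP
    code  = UInt64(0)
    for _ in 1:4               # preload 4 random bytes, so 0 <= code < range
        code = (code << 8) | rand(UInt8)
    end
    out = BitVector(undef, n)
    for i in 1:n
        threshold = floor(UInt64, range * p)   # split point: P(1) ≈ threshold/range
        if code < threshold
            out[i] = true
            range = threshold
        else
            out[i] = false
            code  -= threshold
            range -= threshold
        end
        while range < BOT      # renormalize: shift in fresh uniform bytes
            range <<= 8
            code = (code << 8) | rand(UInt8)
        end
    end
    return out
end

The per-symbol probability is quantized to threshold/range, so with the 24-bit renormalization floor the bias is at most about 2^-24; e.g. sum(bernoulli_bits(10^6, 1/3)) should come out near 333,000.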
If you really care about speed, you can also probably parallelize your code by computing the max for different segments of the random walk and then connecting/aggregating the results at the end.
How do you know where the walk will be in the different segments until you run the simulation? You can certainly parallelize over independent runs, though.
You can solve for the relative change from beginning to end of each segment as well as the max on each segment. You can then paste the endpoints together (the key step) to get the absolute levels of the walk at each of the endpoints and the absolute max on each segment. The absolute max over the whole walk is the absolute max over the segments.
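For an ordinary (non-reflected) walk, that pasting step looks like the sketch below (names are mine). The reflected case is exactly where the objections above and below apply, since a segment's trajectory then depends on its absolute starting level.

# Each segment is summarized by (net displacement, local running max).
function segment_summary(steps)   # steps: iterable of +1/-1 increments
    s = 0; m = 0
    for x in steps
        s += x
        m = max(m, s)
    end
    return (s, m)
end

# Paste: offset each local max by the absolute level at the segment start.
function paste(summaries)
    offset = 0; m = 0
    for (s, mseg) in summaries
        m = max(m, offset + mseg)
        offset += s
    end
    return m
end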
What I was trying to say is that you don't know what the segments look like if e.g. there's a reflecting barrier; you only know if the system is translationally invariant and it doesn't matter where you start.