Simple random walk performance tips

Hi all I have written a small code chunk for a generic random walk, as I am hoping to use julia for dynamic systems simulations this will be a good toy example for me to improve memory usage.

function rwalks(T,N,n)

    #rwalks returns an array of n random walks of T steps across the set N

    #T is the time step, or number of discreet steps across N

    #N is the real number length of the set

    #n is the number of walks performed

    dt = T/N

    dW = sqrt(dt)*randn(n,N)

    W = zeros(n,N)

    W[:,1] = dW[:,1]

    for j = 2:N

        W[:,j] .= W[:,j-1] .+ dW[:,j]


    0:dt:T-dt, W

    return W


With @time and one walk I get,

julia> @time rwalks(1, 10, 1)
  0.000010 seconds (22 allocations: 2.250 KiB)

and for 10 walks…

@time rwalks(1, 10, 10)
  0.000017 seconds (22 allocations: 5.594 KiB)

Any general suggestions I should use to improve performance? As I said I want to use this for dynamic simulations so optimizing a basic random walk would be a good first step. Apart from @time are there other macros or debugging strategies I should be using?

EDIT: Increasing the number of steps and iterations breaks it.

julia> @time rwalks(1, 100, 1000000)
ERROR: OutOfMemoryError()
 [1] Array at .\boot.jl:408 [inlined]
 [2] Array at .\boot.jl:416 [inlined]
 [3] zeros at .\array.jl:525 [inlined]
 [4] zeros at .\array.jl:522 [inlined]
 [5] zeros at .\array.jl:520 [inlined]
 [6] rwalks(::Int32, ::Int32, ::Int32) at .\REPL[7]:4
 [7] top-level scope at .\timing.jl:174 [inlined]
 [8] top-level scope at .\REPL[16]:0

I didn’t try out much but here is something that might be more efficient. Note that I made the time dimension go down rather than across (could easily be swapped if you wanted, but I usually do things in this manner).

function randwalk(T,N,n)
    dt = T/N
    dW = sqrt(dt) .* randn(N,n)
    return 0:dt:T-dt,dW

@btime randwalk(1,10,10)
  646.605 ns (2 allocations: 1.75 KiB)

could be improved if you wanted to pass in an array to fill, which would make this almost allocation free (that range will always allocate).

Also, using BenchmarkTools.jl @btime macro for measurement


Thanks, why does that drop the number of memory allocations so much?

I allocate a single matrix for the random numbers, but then that summation is in-place, so it reuses the same matrix. If you passed in a pre-allocated matrix of the appropriate size, you could use randn!(yourmatrix) and then that first step would also not allocate new memory.

1 Like

A couple of other tips, since you asked and I’m procrastinating.

  1. Don’t go crazy optimizing this: what I’ve given you is already likely fast enough. Perhaps 1 or 2 more tweaks (like that additional pre-allocation step I mentioned) could help, but it is already pretty stinking fast (and only 4 lines of code!). Saving a few nanoseconds usually isn’t the high ticket item.
  2. You probably do not need to simulate millions of sample paths at the same time. First, 100k or 500k paths is probably sufficient anyway, but if you need to do more then you should start batching. Do 100k at a time, compute your statistics from the simulation, save those results, then do a new batch. At the end you combine your statistics.

You can get another 2X speedup for free using LoopVectorization:

using LoopVectorization

function rwalks(T,N,n)
	dt = T/N          
    W = Matrix{Float64}(undef,n,N)
    W[1:n] = sqrt(dt) .* randn(n)
    @avx for j = 2:N, i = 1:n
        W[i,j] = W[i,j-1] + sqrt(dt) * randn()
    return W
julia> @btime rwalks(1,10,10)            
  411.000 ns (3 allocations: 1.19 KiB) 

you can use rand! to remove allocations