Optimizing a simple random walk simulation

Then you should definitely just run each walker separately, i.e. swap the order of your loops. You don't even need an array of walkers; just an array recording the step on which each one died, updated at the end of its inner loop.

e.g.

using StaticArrays, LinearAlgebra

function foo3()
    n_walkers = 1000
    n_steps = 1000
    step_size = 4
    n_died = zeros(Int, n_steps) # number that died on step i

    for _ in 1:n_walkers
        xyz = @SVector zeros(3)  # each walker starts at the origin
        for i in 1:n_steps
            # take a random unit-direction step of length step_size
            xyz += normalize(@SVector(randn(3))) * step_size
            if any(xyz .> 10)    # walker "dies" once any coordinate exceeds 10
                n_died[i] += 1
                break
            end
        end
    end
    return n_died
end

(note that the number alive on each step is simply n_walkers .- cumsum(n_died) at the end). This is about 2x faster than foo2() on my machine. (Update: sorry, not 1000x, I misread the numbers.)
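As a quick sketch of that post-processing step (assuming n_walkers matches the value used inside the function):

```julia
n_died = foo3()
n_walkers = 1000
n_alive = n_walkers .- cumsum(n_died)  # walkers still alive after each step
```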

(To parallelize this, which would only be worthwhile if you vastly increased the number of walkers, you would need to synchronize the n_died[i] += 1 statement. This isn’t too expensive since it only occurs at the end of the inner loop. You could either have a single n_died array and put a mutex lock around the increment, or have per-thread n_died arrays and then sum them together at the end. Or use MPI.jl and do an allreduce on n_died at the end.)
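A minimal sketch of the per-thread-arrays approach (the function name foo3_threaded is made up for illustration; note that indexing by threadid() inside @threads assumes tasks don't migrate between threads, which newer Julia versions no longer guarantee, so a chunk-per-task pattern would be more robust):

```julia
using StaticArrays, LinearAlgebra, Base.Threads

function foo3_threaded()
    n_walkers = 1000
    n_steps = 1000
    step_size = 4
    # one counts array per thread, so no lock is needed on the increment
    n_died_per_thread = [zeros(Int, n_steps) for _ in 1:nthreads()]

    @threads for _ in 1:n_walkers
        n_died = n_died_per_thread[threadid()]
        xyz = @SVector zeros(3)
        for i in 1:n_steps
            xyz += normalize(@SVector(randn(3))) * step_size
            if any(xyz .> 10)
                n_died[i] += 1
                break
            end
        end
    end
    return reduce(+, n_died_per_thread)  # sum per-thread counts into one array
end
```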
