Best way to optimize code

I’m making my first effort to move from Matlab to Julia and have found my code to improve by ~3x, but I still think there is more to come. I’m not using any global variables in the function, and I have preallocated all the arrays used (I think?). Any thoughts on how it could be sped up even further would be greatly appreciated. I’ll fully convert even at the current improvement, I think!

```julia
function word_sim(tau::Int, omega::Int, mu::Float64)
    # inserts a word in position (tau+1); at each point creates a new word with prob mu,
    # otherwise randomly chooses a previously used one. Runs the process until time omega

    words = zeros(Int32, 1, omega)   # to store the words
    tests = rand(1, omega)           # will compare mu to these
    words[1] = 1                     # initialize the words
    next_word = 2                    # will be the next word used
    words[tau+1] = omega + 1         # max possible word, so insert that at time tau
    innovates = mu .> tests          # when we'll make a new word
    for i = 2:tau                    # simulate the process
        if innovates[i] == 1         # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                         # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    # force the word we're interested in
    for i = (tau+2):omega
        if innovates[i] == 1         # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                         # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    result = sum(words .== (omega + 1))  # count how many times our word occurred
    return result
end
```

and when I run it with these values it takes ~0.26 seconds on my PC

```julia
using Statistics

@time begin
    nsim = 10^3
    omega = 100
    seed = [0:1:(omega-1);]
    mu = 0.01

    results = zeros(Float64, 1, length(seed))
    pops = zeros(Int64, 1, nsim)
    for tau in seed
        for jj = 1:nsim
            pops[jj] = word_sim(tau, omega, mu)
        end
        results[tau+1] = mean(pops)
    end
end
```

Or perhaps I’d be better off writing the code in C++? Julia was my first choice as I’ve heard rave reviews about its syntax, which to be honest is fantastic!

1 Like

You could replace `sum(words .== (omega + 1))` with `count(==(omega + 1), words)`, which shaves another 40 ms off for me. For benchmarking, you might want to replace `@time` with `@btime` from BenchmarkTools.jl, which will exclude compilation time and build an average. Otherwise your code looks pretty good; I don’t see many more obvious optimizations left on the table, but am glad to be proven wrong.
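To illustrate the difference on a toy vector (made-up data): the `sum` form first allocates a temporary `BitArray` from the broadcast comparison, while the `count` form fuses the comparison into the reduction with no allocation.

```julia
words = Int32[3, 7, 3, 3, 5]   # made-up data

a = sum(words .== 3)    # broadcast allocates a temporary BitVector, then sums it
b = count(==(3), words) # predicate form: no temporary, same answer

a == b  # both count the three occurrences of 3
```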

3 Likes

I would say just `if innovates[i]` would be slightly better style than `if innovates[i] == 1` since it’s an array of bools, but the performance implications are probably minimal.

1 Like

You could cut the allocations even more (although I’m not sure you will gain much speed) by keeping `seed` as a range (no square brackets) and preallocating with `Array{Int64,1}(undef, dim)`. Keep in mind the allocated array will contain gibberish ;)

2 Likes

You get another 60 ms, if you don’t allocate `innovates`:

```julia
function word_sim2(tau::Int, omega::Int, mu::Float64)
    # inserts a word in position (tau+1); at each point creates a new word with prob mu,
    # otherwise randomly chooses a previously used one. Runs the process until time omega

    words = Array{Int32}(undef, omega)  # to store the words
    tests = rand(omega)                 # will compare mu to these
    words[1] = 1                        # initialize the words
    next_word = 2                       # will be the next word used
    words[tau+1] = omega + 1            # max possible word, so insert that at time tau
    for i = 2:tau                       # simulate the process
        if mu > tests[i]                # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                            # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    # force the word we're interested in
    for i = (tau+2):omega
        if mu > tests[i]                # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                            # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    result = count(==(omega + 1), words)  # count how many times our word occurred
    return result
end
```

3 Likes

There is no need to allocate new `words` and `tests` arrays for each call, i.e. construct them before the test loops and send them in as arguments to `word_sim`. (`tests` can be filled with `Random.rand!`)
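A minimal sketch of that buffer-reuse idea (the name `word_sim_buf!` and the call-site code are mine, not from the original post):

```julia
using Random

# Hypothetical variant of word_sim taking caller-provided buffers
function word_sim_buf!(words::Vector{Int32}, tests::Vector{Float64},
                       tau::Int, omega::Int, mu::Float64)
    rand!(tests)                 # refill the existing buffer in place, no allocation
    words[1] = 1
    next_word = 2
    words[tau+1] = omega + 1     # force the target word at position tau+1
    for i in 2:tau
        if mu > tests[i]         # innovate
            words[i] = next_word
            next_word += 1
        else                     # copy a previously used word
            words[i] = words[rand(1:(i-1))]
        end
    end
    for i in (tau+2):omega
        if mu > tests[i]
            words[i] = next_word
            next_word += 1
        else
            words[i] = words[rand(1:(i-1))]
        end
    end
    return count(==(omega + 1), words)
end

# allocate once, reuse across all simulation calls
omega = 100
words = Vector{Int32}(undef, omega)
tests = Vector{Float64}(undef, omega)
pop = word_sim_buf!(words, tests, 10, omega, 0.01)
```

The inner simulation then allocates nothing at all per call.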

More radically, the first of the two loops in `word_sim` has no effect on the output and can be removed: `next_word` never reaches `omega + 1`, so the entries it writes can only supply non-target values to later copies.

You can try C++, but chances are you would get similar speed to Julia if you write the code in the same way.

4 Likes

Hi there! It seems like your function spends about 50% of its time in the RNG. Precalculating the random numbers and using `@inbounds` removes another 25 ms on my system (although we then calculate a random number even when it’s not needed).
Please verify that my indexing is correct (probably not). You can then get more performance by skipping the unneeded RNG operations (`mu > tests[i]` is known before the loop).

```julia
function word_sim3(tau::Int, omega::Int, mu::Float64)
    # inserts a word in position (tau+1); at each point creates a new word with prob mu,
    # otherwise randomly chooses a previously used one. Runs the process until time omega

    words = Array{Int32}(undef, omega)  # to store the words
    tests = rand(omega)                 # will compare mu to these
    words[1] = 1                        # initialize the words
    next_word = 2                       # will be the next word used
    words[tau+1] = omega + 1            # max possible word, so insert that at time tau
    rand0 = [rand(UInt32) % (i-1) + 1 for i = 2:omega]  # precalculated random numbers
    @inbounds for i = 2:tau             # simulate the process
        if mu > tests[i]                # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                            # copy
            #words[i] = words[rand(1:(i-1))]
            words[i] = words[rand0[i-1]]
        end
    end
    # force the word we're interested in
    @inbounds for i = (tau+2):omega
        if mu > tests[i]                # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                            # copy
            #words[i] = words[rand(1:(i-1))]
            words[i] = words[rand0[i-1]]
        end
    end
    result = count(==(omega + 1), words)  # count how many times our word occurred
    return result
end
```

My understanding is that the default MersenneTwister RNG is more efficient if you use the `rand(T, n)` methods, which use the dSFMT algorithm (rather than something like `[rand(T) for _ = 1:n]`). So here it could be even faster to do `rand0 = rand(UInt32, omega)` and then apply the `%` reduction afterwards (either right away or where each value gets used).
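A sketch of that bulk-generation idea, with the `%` reduction applied afterwards (the variable names and seed are mine; as noted above, the modulo mapping is biased):

```julia
using Random

rng = MersenneTwister(1234)   # arbitrary seed
omega = 100

# one bulk call takes the fast dSFMT array path...
bulk = rand(rng, UInt32, omega - 1)

# ...then map each raw UInt32 onto 1:(i-1) afterwards (biased for most ranges)
rand0 = [bulk[i-1] % (i - 1) + 1 for i in 2:omega]
```

Each `rand0[i-1]` is then a valid index into the first `i-1` positions of `words`.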

2 Likes

It’s worth noting that 1.5 will likely have a faster randomness in general and random ranges in particular. See https://github.com/JuliaLang/julia/pull/34852 and https://github.com/JuliaLang/julia/pull/29240

2 Likes

This is very interesting, I hadn’t seen the inbounds functionality before. Really useful thanks!

This is very true (in particular for `Float64`), so I don’t see how allocating an array with `[rand(T) for _ = 1:n]` can lead to faster code than calling `rand(T)` on demand.

Something you might try is to pass explicitly an rng (`MersenneTwister` instance) in calls to `rand`, as since Julia 1.3, using the default implicit RNG has a measurable overhead (due to having to be thread-safe).
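A small sketch of what passing an explicit RNG looks like (the seed is arbitrary):

```julia
using Random

rng = MersenneTwister(42)   # explicit RNG instance

# passing rng explicitly avoids the thread-safety overhead of the implicit global RNG
x = rand(rng)               # one Float64 draw
v = rand(rng, 1:10, 5)      # five draws from a range

# it also makes runs reproducible: the same seed gives the same stream
rng2 = MersenneTwister(42)
x == rand(rng2)             # true
```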

1 Like

FYI: a great way to profile, with flame graphs, is timholy/ProfileView.jl (visualization of Julia profiling data).

What version of Julia did you use?

I got 0.66 sec on Julia 1.0.5 for the original code, but 0.195 sec for `word_sim3`. Julia 1.3 had a known rand speed regression, and that issue is still open.

I didn’t match your speed; your PC seems just faster (I could only match it with `word_sim3`, my best being 0.233 sec on Julia 1.5, not as good as on Julia 1.0.5).

FYI: I couldn’t generate integer random numbers with it (not meant to, or just not possible yet?), but for floating point it should be faster.

On my Julia 1.5-DEV (though not the latest nightly) I got similar speed on the original code to Julia 1.4.0 (with less penalty for the first run), but if anything Julia 1.4.0 was slightly faster (I didn’t test 1.4.1):

`0.416916 seconds (798.10 k allocations: 164.778 MiB, 1.80% gc time)`

and for the first run: `0.651927 seconds (865.73 k allocations: 168.292 MiB, 25.31% gc time)`

For `word_sim3`:
`0.233139 seconds (498.10 k allocations: 222.762 MiB, 2.88% gc time)`

Your first problem is that you’re benchmarking in global scope with non-`const` globals. Make sure to wrap your code in a function and benchmark that using the BenchmarkTools package.

As for the `tests` and `innovates` vectors, you don’t need to allocate them anywhere, inside or outside the function. Just write `if mu > rand()` each time. No need for preallocation.

As a matter of style, I think you should use vectors instead of matrices, that is,

```julia
words = zeros(Int32, omega)
```

unless there’s some particular reason for that choice.
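Putting those suggestions together, a hypothetical rewrite (the name `word_sim5` is mine) draws `rand()` on demand and uses a plain `Vector`:

```julia
function word_sim5(tau::Int, omega::Int, mu::Float64)
    words = zeros(Int32, omega)    # a vector, not a 1×omega matrix
    words[1] = 1
    next_word = 2
    words[tau+1] = omega + 1       # force the target word at time tau+1
    for i in 2:omega
        i == tau + 1 && continue   # position tau+1 is already set
        if mu > rand()             # innovate: no preallocated tests needed
            words[i] = next_word
            next_word += 1
        else                       # copy a uniformly chosen previous word
            words[i] = words[rand(1:(i-1))]
        end
    end
    return count(==(omega + 1), words)
end
```

With `mu = 1.0` every step innovates, so only the forced position holds the target and the result is exactly 1.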

5 Likes

The performance improvements on `rand` have not yet been merged into master.

1 Like

While I took a look at the not-yet-merged PR for the algorithm (and didn’t actually locate any division used…):

The currently used algorithm understandably has differing speed depending on the exact range used [EDIT: thanks to @rfourquet for the `$rng` tip, which only makes my approach relatively stronger]:

```julia
julia> rng = MersenneTwister();

julia> @btime rand($rng, 0:2)
  11.120 ns (0 allocations: 0 bytes)

julia> @btime rand($rng, 1:3)
  11.108 ns (0 allocations: 0 bytes)

# before, without $rng:
  15.286 ns (0 allocations: 0 bytes)

julia> @btime rand($rng, 0:3)
  6.728 ns (0 allocations: 0 bytes)

julia> @btime rand(0:3)
  10.464 ns (0 allocations: 0 bytes)
```

I looked a bit into what’s currently possible without the algorithm, at least if you can handle bias, and got more than twice the speed for the same range:

```julia
julia> @btime (trunc(Int64, rand($rng, UInt16)*(1/3)))>>13
  5.243 ns (0 allocations: 0 bytes)

julia> @btime 1+(trunc(Int64, rand($rng, UInt16)*(1/3)))>>13
  5.607 ns (0 allocations: 0 bytes)

# without $rng:
  8.968 ns (0 allocations: 0 bytes)

julia> rA = collect([1+(trunc(Int64, rand(UInt16)*(1/3)))>>13 for i in 1:10000000]);

# However, no bias for e.g. 1:2 or 1:4 ranges, and still a bit faster:
julia> rB = collect([1+(trunc(Int64, rand(UInt16)*(1/2)))>>14 for i in 1:10000000]);
julia> rC = collect([1+(rand(UInt16)>>(8+6)) for i in 1:10000000]);

julia> count(i->i==3, rA)
2501880

julia> count(i->i==2, rA)
3749129

julia> count(i->i==1, rA)
3748991
```

I missed this:

```julia
seed = [0:1:(omega-1);]
```

Don’t do this, do

```julia
seed = 0:(omega-1)
```

Collecting ranges into vectors is something you should almost never do. This is actually my pet peeve.
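A quick comparison of the two spellings (toy example): the range behaves like the vector for iteration and indexing, but stores only its endpoints.

```julia
omega = 100

collected = [0:1:(omega-1);]   # materializes a 100-element Vector{Int}
lazy = 0:(omega-1)             # a UnitRange: just two endpoints, no allocation

# both iterate and index the same values
collect(lazy) == collected
```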

4 Likes

Frankly, I don’t think you need to create even the `words` array. You just need to repeatedly do a random test, check whether it yields the target word `omega + 1`, and then update the probability of getting it the next time. Then keep track of the count. I don’t think you need any vector at all.
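A sketch of that idea (the name is mine; it is statistically equivalent to the original rather than reproducing the same random stream). Since positions `1:tau` can never hold the target word, it suffices to track `c`, the running count of the target among the first `i-1` positions:

```julia
# Hypothetical count-only variant: no words array at all
function word_sim_nocopy(tau::Int, omega::Int, mu::Float64)
    c = 1                        # position tau+1 is forced to the target word
    for i in (tau+2):omega
        # with prob 1-mu we copy a uniformly chosen earlier position;
        # that copy equals the target with probability c/(i-1)
        if mu <= rand() && rand(1:(i-1)) <= c
            c += 1
        end
    end
    return c                     # same distribution as count(==(omega+1), words)
end
```

With `mu = 1.0` every step innovates and the count stays at 1, matching the array version.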

1 Like

If you are interested in previous discussions, you can have a look at https://github.com/JuliaLang/julia/issues/20582 and the thread “rand(1:10) vs Int(round(10*rand()))” (and maybe https://github.com/JuliaLang/julia/pull/28987, which implements a biased `rand(a:b)`).

Also, I don’t know which version of Julia you are using, but on 1.3 and later, I will repeat myself in recommending an explicit RNG when you benchmark, if what you want to measure is the performance of the algorithm that samples from a range: i.e. `@btime rand($rng, 1:3)` and not `@btime rand(1:3)`, where `rng = MersenneTwister()`.

1 Like

Thanks, my rand method (in the other comment) is a bit faster, as it avoids floating point, and is sometimes (but not always) unbiased. I however took that other, claimed always-unbiased version and made the fastest variant yet (on my machine):

```julia
julia> rng = MersenneTwister();

julia> @btime begin
       [..]
       pops[jj] = word_sim4($rng, tau, omega, mu);
       [..]
  189.741 ms (500003 allocations: 222.79 MiB)

# I changed one line (two with the first line; I wasn't sure how
# to do otherwise) from word_sim3, though I doubt this preallocated
# rand array scheme will be fastest in the future

julia> function word_sim4(rng, tau::Int, omega::Int, mu::Float64)
           # inserts a word in position (tau+1); at each point creates a new word with prob mu,
           # otherwise randomly chooses a previously used one. Runs the process until time omega

           words = Array{Int32}(undef, omega)  # to store the words
           tests = rand(omega)                 # will compare mu to these
           words[1] = 1                        # initialize the words
           next_word = 2                       # will be the next word used
           words[tau+1] = omega + 1            # max possible word, so insert that at time tau
           rand0 = [1 + floor(Int, rand(rng)*i) for i = 1:omega-1]  # precalculated random numbers
           @inbounds for i = 2:tau             # simulate the process
               if mu > tests[i]                # innovate
                   words[i] = next_word
                   next_word = next_word + 1
               else                            # copy
                   #words[i] = words[rand(1:(i-1))]
                   words[i] = words[rand0[i-1]]
               end
           end
           # force the word we're interested in
           @inbounds for i = (tau+2):omega
               if mu > tests[i]                # innovate
                   words[i] = next_word
                   next_word = next_word + 1
               else                            # copy
                   #words[i] = words[rand(1:(i-1))]
                   words[i] = words[rand0[i-1]]
               end
           end
           result = count(==(omega + 1), words)  # count how many times our word occurred
           return result
       end
word_sim4 (generic function with 1 method)
```
1 Like