Best way to optimize code

jmaths · April 21, 2020, 6:10pm

I’m making my first effort to move from Matlab to Julia and have found my code to improve by ~3x but still think there is more to come, I’m not using any global variables in the function and have preallocated all the arrays used (I think?). If there was any thoughts on how it could be sped up even further it would be greatly appreciated, I’ll fully convert even at the current improvement I think!

function word_sim(tau::Int, omega::Int, mu::Float64)
# inserts a word in position (tau+1), at each point creates a new word with prob mu
# otherwise randomly chooses a previously used. Runs the program until time omega

words = zeros(Int32, 1, omega) # to store the words
tests = rand(1,omega) # will compare mu to these
words[1] = 1; # initialize the words
next_word = 2 # will be the next word used
words[tau+1] = omega + 1; # max possible word so insert that at time tau
innovates = mu .> tests; # when we'll make a new word
for i = 2:tau # simulate the process
    if innovates[i] == 1 # innovate 
        words[i] = next_word
        next_word = next_word + 1
    else # copy
        words[i] = words[rand(1:(i-1))]
    end
end
# force the word we're interested in
for i = (tau+2):omega
    if innovates[i] == 1 # innovate 
        words[i] = next_word
        next_word = next_word + 1
    else # copy
        words[i] = words[rand(1:(i-1))]
    end
end
result = sum(words .== (omega + 1)); # count how many times our word occurred
return result
end

and when I run it with these values it takes ~.26 seconds on my PC

using Statistics
@time begin
nsim = 10^3;
omega = 100;
seed = [0:1:(omega-1);]; 
mu = 0.01; 

results = zeros(Float64, 1, length(seed));
pops = zeros(Int64, 1, nsim);
for tau in seed
    for jj = 1:nsim
        pops[jj] = word_sim(tau, omega, mu);
    end
    results[tau+1] = mean(pops);
end
end

Or perhaps I’d be better writing the code in C++? Julia was my first reaction as I’ve heard rave reviews about its syntax, which to be honest is fantastic!

Any comments greatly appreciated.

simeonschaub · April 21, 2020, 6:29pm

You could replace sum(words .== (omega + 1)) with count(==(omega + 1), words), which shaves another 40 ms off for me. For benchmarking, you might want to replace @time with @btime from BenchmarkTools.jl, which will exclude compilation time and build an average. Otherwise your code looks pretty good, I don’t see much more obvious optimizations left on the table, but am glad to be proven wrong.

simeonschaub · April 21, 2020, 6:33pm

I would say just if innovates[i] would be slightly better style than if innovates[i] == 1 since it’s an array of bools, but the performance implications are probably minimal.

Iulian.Cioarca · April 21, 2020, 6:35pm

You could cut the allocations even more(although I’m not sure you will gain much speed) by keeping seed as range ( no square brackets) and preallocate with Array{Int64,1)(undef,dim). Keep in mind the allocated array will contain gibberish;).

simeonschaub · April 21, 2020, 6:40pm

You get another 60 ms, if you don’t allocate innovates:

function word_sim2(tau::Int, omega::Int, mu::Float64)
    # inserts a word in position (tau+1), at each point creates a new word with prob mu
    # otherwise randomly chooses a previously used. Runs the program until time omega
    
    words = Array{Int32}(undef, omega) # to store the words
    tests = rand(omega) # will compare mu to these
    words[1] = 1; # initialize the words
    next_word = 2 # will be the next word used
    words[tau+1] = omega + 1; # max possible word so insert that at time tau
    for i = 2:tau # simulate the process
        if mu > tests[i] # innovate 
            words[i] = next_word
            next_word = next_word + 1
        else # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    # force the word we're interested in
    for i = (tau+2):omega
        if mu > tests[i] # innovate 
            words[i] = next_word
            next_word = next_word + 1
        else # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    result = count(==(omega + 1), words) # count how many times our word occurred
    return result
end

GunnarFarneback · April 21, 2020, 7:12pm

There is no need to allocate new words and tests arrays for each call, i.e. construct them before the test loops and send them in as arguments to word_sim. (tests can be filled with Random.rand!)

More radically the first of the two loops in word_sim has no effect on the output and can be removed.

You can try but chances are that you would get similar speed in Julia if you write the code in the same way.

jmaths · April 21, 2020, 8:05pm

Many thanks for all the helpful comments, they are greatly appreciated!

AndiMD · April 21, 2020, 9:33pm

Hi there! Seems like your function spends about 50% with the RNG. Precalculating the random numbers and using @inbounds removes another 25ms on my system (although we calculate the random number even when not needed).
Please verify if my indexing is correct (probably not). You can then get more performance by skipping the unneeded RNG operations (mu > tests[i] is known before the loop)


function word_sim3(tau::Int, omega::Int, mu::Float64)
    # inserts a word in position (tau+1), at each point creates a new word with prob mu
    # otherwise randomly chooses a previously used. Runs the program until time omega
    
    words = Array{Int32}(undef, omega) # to store the words
    tests = rand(omega) # will compare mu to these
    words[1] = 1; # initialize the words
    next_word = 2 # will be the next word used
    words[tau+1] = omega + 1; # max possible word so insert that at time tau
    rand0 = [rand(UInt32)%(i-1)+1 for i=2:omega] # precalc. random numbers
    @inbounds for i = 2:tau # simulate the process
        if mu > tests[i] # innovate 
            words[i] = next_word
            next_word = next_word + 1
        else # copy
            #words[i] = words[rand(1:(i-1))]
            words[i] = words[rand0[i-1]]
        end
    end
    # force the word we're interested in
    @inbounds for i = (tau+2):omega
        if mu > tests[i] # innovate 
            words[i] = next_word
            next_word = next_word + 1
        else # copy
            #words[i] = words[rand(1:(i-1))]
            words[i] = words[rand0[i-1]]
        end
    end
    result = count(==(omega + 1), words) # count how many times our word occurred
    return result
end

ericphanson · April 21, 2020, 9:44pm

My understanding is the default MersenneTwister RNG is more efficient if you use the rand(T, n) methods which uses the dSFMT algorithm (rather than something like [rand(T) for _ = 1:n]). So here it could be even faster to do rand0 = rand(UInt32, omega) and then do the % stuff after (either right away or where it gets used).

Oscar_Smith · April 21, 2020, 10:06pm

It’s worth noting that 1.5 will likely have a faster randomness in general and random ranges in particular. See https://github.com/JuliaLang/julia/pull/34852 and implement "nearly division less" algorithm for rand(a:b) by rfourquet · Pull Request #29240 · JuliaLang/julia · GitHub

jmaths · April 22, 2020, 10:06am

This is very interesting, I hadn’t seen the inbounds functionality before. Really useful thanks!

rfourquet · April 22, 2020, 10:59am

This is very true (in particular for Float64), so I don’t see how allocating an array with [rand(T) for _ = 1:n] can lead to faster code than calling rand(T) on demand.

Something you might try is to pass explicitly an rng (MersenneTwister instance) in calls to rand, as since Julia 1.3, using the default implicit RNG has a measurable overhead (due to having to be thread-safe).

Palli · April 22, 2020, 12:31pm

FYI: I great way to profile, with flame graphs, is GitHub - timholy/ProfileView.jl: Visualization of Julia profiling data

What version of Julia did you use?

I got 0.66 sec on Julia 1.0.5 but 0.195 sec for word_sim3. Julia 1.3 had some known rand speed regression, and that issue is still open.

I didn’t match your speed, your PC seems just faster (I could just match it with word_sim3, then best 0.233 sec on Julia 1.5, not as good as Julia 1.0.5).

FYI: I couldn’t generate integer random numbers with (not meant too, or just not possible yet?), but for floating-point should be faster:

For my Julia 1.5-DEV (while not latest nightly) I got similar speed on the original code to Julia 1.4.0 (while less penalty for the first run), but if anything slightly faster for Julia 1.4.0 (didn’t test 1.4.1):

0.416916 seconds (798.10 k allocations: 164.778 MiB, 1.80% gc time)

for first run: 0.651927 seconds (865.73 k allocations: 168.292 MiB, 25.31% gc time)

For word_sim3:
0.233139 seconds (498.10 k allocations: 222.762 MiB, 2.88% gc time)

DNF · April 22, 2020, 1:39pm

Your first problem is that you’re benchmarking in global scope with non-const globals. Make sure to wrap your code in a function and benchmark that using the BenchmarkTools package.

As for the tests and innovates vectors, you don’t need to allocate them anywhere, inside or outside the function. Just write if mu > rand() each time. No need for preallocation.

As a matter of style, I think you should use vectors instead of matrices, that is,

words = zeros(Int32, omega)

instead of

unless there’s some particular reason for that choice.

rfourquet · April 22, 2020, 1:51pm

The performance improvements on rand have not yet been merged into master.

Palli · April 22, 2020, 2:27pm

While I took a look at the, not yet merged, PR for the algorithm (and didn’t actually locate any division used…):

github.com/JuliaLang/julia

implement "nearly division less" algorithm for rand(a:b)

JuliaLang:master ← JuliaLang:rf/rand/ndl

opened 10:14AM - 18 Sep 18 UTC

rfourquet

+89 -57

Cf. https://arxiv.org/abs/1805.10941. Closes #20582, #29004. This is still a…bout 30 percent slower than the "biased" solution (e.g. #28987), but as fast as the "best case" of the current version (when the range length is a power of 2). So this is strictly better, and should become the default version in a future (major) release. How do we make this available in the meantime? `rand(Random.fast(a:b))` is what the PR currently implements.

The currently used algorithm understandably has differing speed depending on the exact range used [EDIT: thanks to @rfourquet for $rng tip that makes my way only relatively stronger]:

rng = MersenneTwister();

julia> @btime rand($rng, 0:2)
  11.120 ns (0 allocations: 0 bytes)

julia> @btime rand($rng, 1:3)
  11.108 ns (0 allocations: 0 bytes)

# before without $rng:
15.286 ns (0 allocations: 0 bytes)

julia> @btime rand($rng, 0:3)
  6.728 ns (0 allocations: 0 bytes)

julia> @btime rand(0:3)
  10.464 ns (0 allocations: 0 bytes)

I looked a bit into what’s currently possible without the algorithm, at least if you can handle bias, and got more than twice the speed for the same range:

julia> @btime (trunc(Int64, rand($rng, UInt16)*(1/3)))>>13
  5.243 ns (0 allocations: 0 bytes)

julia> @btime 1+(trunc(Int64, rand($rng, UInt16)*(1/3)))>>13
  5.607 ns (0 allocations: 0 bytes)

# without $rng:
  8.968 ns (0 allocations: 0 bytes)


julia> rA = collect([1+(trunc(Int64, rand(UInt16)*(1/3)))>>13 for i in 1:10000000]);

# However, no bias for e.g. 1:2 or 1:4 range, and still a bit faster:
julia> rB = collect([1+(trunc(Int64, rand(UInt16)*(1/2)))>>(14) for i in 1:10000000]);
julia> rC = collect([1+(rand(UInt16)>>(8+6)) for i in 1:10000000]);


julia> count(i->i==3, rA)
2501880

julia> count(i->i==2, rA)
3749129

julia> count(i->i==1, rA)
3748991

DNF · April 22, 2020, 2:28pm

I missed this:

seed = [0:1:(omega-1);];

Don’t do this, do

seed = 0:(omega-1)

Collecting ranges into vectors is something you should almost never do. This is actually my pet peeve

DNF · April 22, 2020, 3:13pm

Frankly, I don’t think you need to create even the words array. You just need to repeatedly do a random test, check if it satisfies the probability of getting omega, and then recalculate the probability of getting omega the next time. Then keep track of the count. I don’t think you need any vector at all.

rfourquet · April 22, 2020, 3:31pm

If you are interested in previous discussions, you can have a look at random sampling from an (abstract)array is slow · Issue #20582 · JuliaLang/julia · GitHub and rand(1:10) vs Int(round(10*rand()) (and maybe RFC: implement a biased rand(a:b) (fix #20582) by rfourquet · Pull Request #28987 · JuliaLang/julia · GitHub which implements a biased rand(a:b)).

Also, I don’t know which version of julia you are using, but on 1.3 and more, I will repeat myself by recommending to use an explicit RNG when you benchmark, if what you want to measure is the performance of the algorithm to sample from a range, i.e. @btime rand($rng, 1:3) and not @btime rand(1:3) where rng = MersenneTwister();.

Palli · April 22, 2020, 5:21pm

Thanks, my rand method (in other comment) is a bit faster, as it’s without floating point, and sometimes (but sometimes not) unbiased. I however took that other claimed always unbiased version and made the fastest variant yet (on my machine);

rng = MersenneTwister();

julia> @btime begin
[..]
               pops[jj] = word_sim4($rng, tau, omega, mu);
[..]
  189.741 ms (500003 allocations: 222.79 MiB)


# I changed one line (two with the first line, I wasn't sure how
# to do otherwise) from word_sim3 while I doubt this pre-allocated
# rand array scheme will be fastest in the future

julia> function word_sim4(rng, tau::Int, omega::Int, mu::Float64)
           # inserts a word in position (tau+1), at each point creates a new word with prob mu
           # otherwise randomly chooses a previously used. Runs the program until time omega

           words = Array{Int32}(undef, omega) # to store the words
           tests = rand(omega) # will compare mu to these
           words[1] = 1; # initialize the words
           next_word = 2 # will be the next word used
           words[tau+1] = omega + 1; # max possible word so insert that at time tau
           rand0 = [1 + floor(Int, rand(rng)*i) for i=1:omega-1] # precalc. random numbers
           @inbounds for i = 2:tau # simulate the process
               if mu > tests[i] # innovate 
                   words[i] = next_word
                   next_word = next_word + 1
               else # copy
                   #words[i] = words[rand(1:(i-1))]
                   words[i] = words[rand0[i-1]]
               end
           end
           # force the word we're interested in
           @inbounds for i = (tau+2):omega
               if mu > tests[i] # innovate 
                   words[i] = next_word
                   next_word = next_word + 1
               else # copy
                   #words[i] = words[rand(1:(i-1))]
                   words[i] = words[rand0[i-1]]
               end
           end
           result = count(==(omega + 1), words) # count how many times our word occurred
           return result
       end
word_sim4 (generic function with 1 method)

Topic		Replies	Views
Performance of Julia as a general-purpose programming language Performance	30	7643	November 19, 2017
Quite bad performance of Julia 0.6.4 vs Python+Numpy General Usage	26	5202	November 13, 2018
Trivial port from C++/Rust to Julia Performance performance , benchmark , loopvectorization	51	3187	December 28, 2023
Julia programs now shown on benchmarks game website Community announcement	144	13728	December 3, 2019
Help to get my slow Julia code to run as fast as Rust/Java/Lisp Performance	100	4617	August 6, 2021

Best way to optimize code

Related topics