Best way to optimize code

I’m making my first effort to move from Matlab to Julia and have found my code to improve by ~3x, but I still think there is more to come. I’m not using any global variables in the function, and I have preallocated all the arrays used (I think?). Any thoughts on how it could be sped up even further would be greatly appreciated. I’ll fully convert even at the current improvement, I think!

```julia
function word_sim(tau::Int, omega::Int, mu::Float64)
    # inserts a word in position (tau+1); at each point creates a new word with prob mu,
    # otherwise randomly chooses a previously used one. Runs the process until time omega

    words = zeros(Int32, 1, omega)   # to store the words
    tests = rand(1, omega)           # will compare mu to these
    words[1] = 1                     # initialize the words
    next_word = 2                    # will be the next word used
    words[tau+1] = omega + 1         # max possible word, so insert that at time tau
    innovates = mu .> tests          # when we'll make a new word
    for i = 2:tau                    # simulate the process
        if innovates[i] == 1         # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                         # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    # force the word we're interested in
    for i = (tau+2):omega
        if innovates[i] == 1         # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                         # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    result = sum(words .== (omega + 1))  # count how many times our word occurred
    return result
end
```

and when I run it with these values it takes ~0.26 seconds on my PC

```julia
using Statistics

@time begin
    nsim = 10^3
    omega = 100
    seed = [0:1:(omega-1);]
    mu = 0.01

    results = zeros(Float64, 1, length(seed))
    pops = zeros(Int64, 1, nsim)
    for tau in seed
        for jj = 1:nsim
            pops[jj] = word_sim(tau, omega, mu)
        end
        results[tau+1] = mean(pops)
    end
end
```

Or perhaps I’d be better off writing the code in C++? Julia was my first choice as I’ve heard rave reviews about its syntax, which to be honest is fantastic!

1 Like

You could replace `sum(words .== (omega + 1))` with `count(==(omega + 1), words)`, which shaves another 40 ms off for me. For benchmarking, you might want to replace `@time` with `@btime` from BenchmarkTools.jl, which will exclude compilation time and build an average. Otherwise your code looks pretty good; I don’t see many more obvious optimizations left on the table, but am glad to be proven wrong.
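To illustrate the difference on a toy vector (made-up data): the `sum` form first allocates a temporary `BitArray` from the broadcast comparison, while the `count` form fuses the comparison into the reduction with no allocation.

```julia
words = Int32[3, 7, 3, 3, 5]   # made-up data

a = sum(words .== 3)    # broadcast allocates a temporary BitVector, then sums it
b = count(==(3), words) # predicate form: no temporary, same answer

a == b  # both count the three occurrences of 3
```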

3 Likes

I would say just `if innovates[i]` would be slightly better style than `if innovates[i] == 1` since it’s an array of bools, but the performance implications are probably minimal.

1 Like

You could cut the allocations even more (although I’m not sure you will gain much speed) by keeping `seed` as a range (no square brackets) and preallocating with `Array{Int64,1}(undef, dim)`. Keep in mind the allocated array will contain gibberish ;)

2 Likes

You get another 60 ms, if you don’t allocate `innovates`:

```julia
function word_sim2(tau::Int, omega::Int, mu::Float64)
    # inserts a word in position (tau+1); at each point creates a new word with prob mu,
    # otherwise randomly chooses a previously used one. Runs the process until time omega

    words = Array{Int32}(undef, omega)  # to store the words
    tests = rand(omega)                 # will compare mu to these
    words[1] = 1                        # initialize the words
    next_word = 2                       # will be the next word used
    words[tau+1] = omega + 1            # max possible word, so insert that at time tau
    for i = 2:tau                       # simulate the process
        if mu > tests[i]                # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                            # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    # force the word we're interested in
    for i = (tau+2):omega
        if mu > tests[i]                # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                            # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    result = count(==(omega + 1), words)  # count how many times our word occurred
    return result
end
```

3 Likes

There is no need to allocate new `words` and `tests` arrays for each call, i.e. construct them before the test loops and send them in as arguments to `word_sim`. (`tests` can be filled with `Random.rand!`)
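A minimal sketch of that buffer-reuse idea (the name `word_sim_buf!` and the call-site code are mine, not from the original post):

```julia
using Random

# Hypothetical variant of word_sim taking caller-provided buffers
function word_sim_buf!(words::Vector{Int32}, tests::Vector{Float64},
                       tau::Int, omega::Int, mu::Float64)
    rand!(tests)                 # refill the existing buffer in place, no allocation
    words[1] = 1
    next_word = 2
    words[tau+1] = omega + 1     # force the target word at position tau+1
    for i in 2:tau
        if mu > tests[i]         # innovate
            words[i] = next_word
            next_word += 1
        else                     # copy a previously used word
            words[i] = words[rand(1:(i-1))]
        end
    end
    for i in (tau+2):omega
        if mu > tests[i]
            words[i] = next_word
            next_word += 1
        else
            words[i] = words[rand(1:(i-1))]
        end
    end
    return count(==(omega + 1), words)
end

# allocate once, reuse across all simulation calls
omega = 100
words = Vector{Int32}(undef, omega)
tests = Vector{Float64}(undef, omega)
pop = word_sim_buf!(words, tests, 10, omega, 0.01)
```

The inner simulation then allocates nothing at all per call.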

More radically, the first of the two loops in `word_sim` has no effect on the output and can be removed: `next_word` never reaches `omega + 1`, so the entries it writes can only supply non-target values to later copies.

You can try C++, but chances are you would get similar speed to Julia if you write the code in the same way.

4 Likes

Hi there! It seems like your function spends about 50% of its time in the RNG. Precalculating the random numbers and using `@inbounds` removes another 25 ms on my system (although we then calculate a random number even when it’s not needed).
Please verify that my indexing is correct (probably not). You can then get more performance by skipping the unneeded RNG operations (`mu > tests[i]` is known before the loop).

```julia
function word_sim3(tau::Int, omega::Int, mu::Float64)
    # inserts a word in position (tau+1); at each point creates a new word with prob mu,
    # otherwise randomly chooses a previously used one. Runs the process until time omega

    words = Array{Int32}(undef, omega)  # to store the words
    tests = rand(omega)                 # will compare mu to these
    words[1] = 1                        # initialize the words
    next_word = 2                       # will be the next word used
    words[tau+1] = omega + 1            # max possible word, so insert that at time tau
    rand0 = [rand(UInt32) % (i-1) + 1 for i = 2:omega]  # precalculated random numbers
    @inbounds for i = 2:tau             # simulate the process
        if mu > tests[i]                # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                            # copy
            #words[i] = words[rand(1:(i-1))]
            words[i] = words[rand0[i-1]]
        end
    end
    # force the word we're interested in
    @inbounds for i = (tau+2):omega
        if mu > tests[i]                # innovate
            words[i] = next_word
            next_word = next_word + 1
        else                            # copy
            #words[i] = words[rand(1:(i-1))]
            words[i] = words[rand0[i-1]]
        end
    end
    result = count(==(omega + 1), words)  # count how many times our word occurred
    return result
end
```

My understanding is that the default MersenneTwister RNG is more efficient if you use the `rand(T, n)` methods, which use the dSFMT algorithm (rather than something like `[rand(T) for _ = 1:n]`). So here it could be even faster to do `rand0 = rand(UInt32, omega)` and then apply the `%` reduction afterwards (either right away or where each value gets used).
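A sketch of that bulk-generation idea, with the `%` reduction applied afterwards (the variable names and seed are mine; as noted above, the modulo mapping is biased):

```julia
using Random

rng = MersenneTwister(1234)   # arbitrary seed
omega = 100

# one bulk call takes the fast dSFMT array path...
bulk = rand(rng, UInt32, omega - 1)

# ...then map each raw UInt32 onto 1:(i-1) afterwards (biased for most ranges)
rand0 = [bulk[i-1] % (i - 1) + 1 for i in 2:omega]
```

Each `rand0[i-1]` is then a valid index into the first `i-1` positions of `words`.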

2 Likes

It’s worth noting that 1.5 will likely have a faster randomness in general and random ranges in particular. See https://github.com/JuliaLang/julia/pull/34852 and https://github.com/JuliaLang/julia/pull/29240

2 Likes

This is very interesting, I hadn’t seen the inbounds functionality before. Really useful thanks!

This is very true (in particular for `Float64`), so I don’t see how allocating an array with `[rand(T) for _ = 1:n]` can lead to faster code than calling `rand(T)` on demand.

Something you might try is to pass explicitly an rng (`MersenneTwister` instance) in calls to `rand`, as since Julia 1.3, using the default implicit RNG has a measurable overhead (due to having to be thread-safe).
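A small sketch of what passing an explicit RNG looks like (the seed is arbitrary):

```julia
using Random

rng = MersenneTwister(42)   # explicit RNG instance

# passing rng explicitly avoids the thread-safety overhead of the implicit global RNG
x = rand(rng)               # one Float64 draw
v = rand(rng, 1:10, 5)      # five draws from a range

# it also makes runs reproducible: the same seed gives the same stream
rng2 = MersenneTwister(42)
x == rand(rng2)             # true
```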

1 Like

FYI: a great way to profile, with flame graphs, is timholy/ProfileView.jl (visualization of Julia profiling data).

What version of Julia did you use?

I got 0.66 sec on Julia 1.0.5 for the original code, but 0.195 sec for `word_sim3`. Julia 1.3 had a known rand speed regression, and that issue is still open.

I didn’t match your speed; your PC seems just faster (I could only match it with `word_sim3`, my best being 0.233 sec on Julia 1.5, not as good as on Julia 1.0.5).

FYI: I couldn’t generate integer random numbers with it (not meant to, or just not possible yet?), but for floating point it should be faster.

On my Julia 1.5-DEV (though not the latest nightly) I got similar speed on the original code to Julia 1.4.0 (with less penalty for the first run), but if anything Julia 1.4.0 was slightly faster (I didn’t test 1.4.1):

`0.416916 seconds (798.10 k allocations: 164.778 MiB, 1.80% gc time)`

and for the first run: `0.651927 seconds (865.73 k allocations: 168.292 MiB, 25.31% gc time)`

For `word_sim3`:
`0.233139 seconds (498.10 k allocations: 222.762 MiB, 2.88% gc time)`

Your first problem is that you’re benchmarking in global scope with non-`const` globals. Make sure to wrap your code in a function and benchmark that using the BenchmarkTools package.

As for the `tests` and `innovates` vectors, you don’t need to allocate them anywhere, inside or outside the function. Just write `if mu > rand()` each time. No need for preallocation.

As a matter of style, I think you should use vectors instead of matrices, that is,

```julia
words = zeros(Int32, omega)
```

unless there’s some particular reason for that choice.
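Putting those suggestions together, a hypothetical rewrite (the name `word_sim5` is mine) draws `rand()` on demand and uses a plain `Vector`:

```julia
function word_sim5(tau::Int, omega::Int, mu::Float64)
    words = zeros(Int32, omega)    # a vector, not a 1×omega matrix
    words[1] = 1
    next_word = 2
    words[tau+1] = omega + 1       # force the target word at time tau+1
    for i in 2:omega
        i == tau + 1 && continue   # position tau+1 is already set
        if mu > rand()             # innovate: no preallocated tests needed
            words[i] = next_word
            next_word += 1
        else                       # copy a uniformly chosen previous word
            words[i] = words[rand(1:(i-1))]
        end
    end
    return count(==(omega + 1), words)
end
```

With `mu = 1.0` every step innovates, so only the forced position holds the target and the result is exactly 1.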

5 Likes

The performance improvements on `rand` have not yet been merged into master.

1 Like

While I took a look at the not-yet-merged PR for the algorithm (and didn’t actually locate any division used…):

The currently used algorithm understandably has differing speed depending on the exact range used [EDIT: thanks to @rfourquet for the `$rng` tip, which only makes my approach relatively stronger]:

```julia
julia> rng = MersenneTwister();

julia> @btime rand($rng, 0:2)
  11.120 ns (0 allocations: 0 bytes)

julia> @btime rand($rng, 1:3)
  11.108 ns (0 allocations: 0 bytes)

# before, without $rng:
  15.286 ns (0 allocations: 0 bytes)

julia> @btime rand($rng, 0:3)
  6.728 ns (0 allocations: 0 bytes)

julia> @btime rand(0:3)
  10.464 ns (0 allocations: 0 bytes)
```

I looked a bit into what’s currently possible without the algorithm, at least if you can handle bias, and got more than twice the speed for the same range:

```julia
julia> @btime (trunc(Int64, rand($rng, UInt16)*(1/3)))>>13
  5.243 ns (0 allocations: 0 bytes)

julia> @btime 1+(trunc(Int64, rand($rng, UInt16)*(1/3)))>>13
  5.607 ns (0 allocations: 0 bytes)

# without $rng:
  8.968 ns (0 allocations: 0 bytes)

julia> rA = collect([1+(trunc(Int64, rand(UInt16)*(1/3)))>>13 for i in 1:10000000]);

# However, no bias for e.g. 1:2 or 1:4 ranges, and still a bit faster:
julia> rB = collect([1+(trunc(Int64, rand(UInt16)*(1/2)))>>14 for i in 1:10000000]);
julia> rC = collect([1+(rand(UInt16)>>(8+6)) for i in 1:10000000]);

julia> count(i->i==3, rA)
2501880

julia> count(i->i==2, rA)
3749129

julia> count(i->i==1, rA)
3748991
```

I missed this:

```julia
seed = [0:1:(omega-1);]
```

Don’t do this, do

```julia
seed = 0:(omega-1)
```

Collecting ranges into vectors is something you should almost never do. This is actually my pet peeve.
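A quick comparison of the two spellings (toy example): the range behaves like the vector for iteration and indexing, but stores only its endpoints.

```julia
omega = 100

collected = [0:1:(omega-1);]   # materializes a 100-element Vector{Int}
lazy = 0:(omega-1)             # a UnitRange: just two endpoints, no allocation

# both iterate and index the same values
collect(lazy) == collected
```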

4 Likes

Frankly, I don’t think you need to create even the `words` array. You just need to repeatedly do a random test, check whether it yields the target word `omega + 1`, and then update the probability of getting it the next time. Then keep track of the count. I don’t think you need any vector at all.
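A sketch of that idea (the name is mine; it is statistically equivalent to the original rather than reproducing the same random stream). Since positions `1:tau` can never hold the target word, it suffices to track `c`, the running count of the target among the first `i-1` positions:

```julia
# Hypothetical count-only variant: no words array at all
function word_sim_nocopy(tau::Int, omega::Int, mu::Float64)
    c = 1                        # position tau+1 is forced to the target word
    for i in (tau+2):omega
        # with prob 1-mu we copy a uniformly chosen earlier position;
        # that copy equals the target with probability c/(i-1)
        if mu <= rand() && rand(1:(i-1)) <= c
            c += 1
        end
    end
    return c                     # same distribution as count(==(omega+1), words)
end
```

With `mu = 1.0` every step innovates and the count stays at 1, matching the array version.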

1 Like

If you are interested in previous discussions, you can have a look at https://github.com/JuliaLang/julia/issues/20582 and the thread “rand(1:10) vs Int(round(10*rand()))” (and maybe https://github.com/JuliaLang/julia/pull/28987, which implements a biased `rand(a:b)`).

Also, I don’t know which version of Julia you are using, but on 1.3 and later, I will repeat myself in recommending an explicit RNG when you benchmark, if what you want to measure is the performance of the algorithm that samples from a range: i.e. `@btime rand($rng, 1:3)` and not `@btime rand(1:3)`, where `rng = MersenneTwister()`.

1 Like

Thanks, my rand method (in the other comment) is a bit faster, as it avoids floating point, and is sometimes (but not always) unbiased. I however took that other, claimed always-unbiased version and made the fastest variant yet (on my machine):

```julia
julia> rng = MersenneTwister();

julia> @btime begin
       [..]
       pops[jj] = word_sim4($rng, tau, omega, mu);
       [..]
  189.741 ms (500003 allocations: 222.79 MiB)

# I changed one line (two with the first line; I wasn't sure how
# to do otherwise) from word_sim3, though I doubt this preallocated
# rand array scheme will be fastest in the future

julia> function word_sim4(rng, tau::Int, omega::Int, mu::Float64)
           # inserts a word in position (tau+1); at each point creates a new word with prob mu,
           # otherwise randomly chooses a previously used one. Runs the process until time omega

           words = Array{Int32}(undef, omega)  # to store the words
           tests = rand(omega)                 # will compare mu to these
           words[1] = 1                        # initialize the words
           next_word = 2                       # will be the next word used
           words[tau+1] = omega + 1            # max possible word, so insert that at time tau
           rand0 = [1 + floor(Int, rand(rng)*i) for i = 1:omega-1]  # precalculated random numbers
           @inbounds for i = 2:tau             # simulate the process
               if mu > tests[i]                # innovate
                   words[i] = next_word
                   next_word = next_word + 1
               else                            # copy
                   #words[i] = words[rand(1:(i-1))]
                   words[i] = words[rand0[i-1]]
               end
           end
           # force the word we're interested in
           @inbounds for i = (tau+2):omega
               if mu > tests[i]                # innovate
                   words[i] = next_word
                   next_word = next_word + 1
               else                            # copy
                   #words[i] = words[rand(1:(i-1))]
                   words[i] = words[rand0[i-1]]
               end
           end
           result = count(==(omega + 1), words)  # count how many times our word occurred
           return result
       end
word_sim4 (generic function with 1 method)
```
1 Like