Best way to optimize code

I haven’t checked this really well, but I think it should be close to correct:

function word_sim_alternative(τ, ω, μ)
    # counts how often the word inserted at time τ+1 occurs by time ω, without
    # storing the full words array: at step i the tracked word gets copied
    # again with probability (1-μ) * wordcount / i, which is the test below
    wordcount = 1
    for i in (τ+2):ω
        wordcount += (i * rand()) < (1 - μ) * wordcount
    end
    return wordcount
end
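
If you want to sanity-check or time it, something along these lines should do (the parameter values are just made-up examples, not numbers from any benchmark in this thread):

using BenchmarkTools, Statistics

tau, omega, mu = 200, 1_000, 0.01   # made-up example parameters

@btime word_sim_alternative($tau, $omega, $mu)

# the result is random, so averaging many runs estimates the expected count
mean(word_sim_alternative(tau, omega, mu) for _ in 1:10_000)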

This is fantastic! I didn’t realise random number generation could affect speed so much.

This is a really neat function. At some points I will need the entire array of words, but in general this should be good for most of them. Thanks so much.

Still, my latest change to the program, with the help of that RNG package, more than doubles [EDIT: almost triples, if I recall the old number correctly] the speed (what is it on your faster machine?) on top of the best so far, and I think it could be made better by avoiding the int-to-float conversion (in that package) and back to int:

julia> using BenchmarkTools, ProfileView, VectorizedRNG  # note,
# version v0.1.6, master version is broken, so use the released one
[ Info: Precompiling VectorizedRNG [33b4df10-0173-11e9-2a0c-851a7edac40e]

julia> rng = local_rng()

julia> @btime begin
[..]
               pops[jj] = word_sim5(tau, omega, mu);
[..]
  63.522 ms (300002 allocations: 218.21 MiB)

@profview begin ... end now shows that the bottleneck has moved

For:

julia> function word_sim5(tau::Int, omega::Int, mu::Float64)
                  # inserts a word at position (tau+1); at each step it creates a new word with prob. mu,
                  # otherwise it copies a randomly chosen previously used word. Runs the simulation until time omega
                  words = Array{Int32}(undef, omega) # to store the words
                  tests = rand(local_rng(), omega) # will compare mu to these
                  words[1] = 1; # initialize the words
                  next_word = 2 # will be the next word used
                  words[tau+1] = omega + 1; # omega+1 can never occur as a normal word, so use it to mark our word at time tau+1
                  rand0 = Vector{Float64}(undef, omega) # sized omega so it can be indexed directly with i (a couple of entries go unused, by design)
                  rand!(local_rng(), rand0)  # precalc. float random numbers
                  @inbounds for i = 2:tau # simulate the process
                      if mu > tests[i] # innovate 
                          words[i] = next_word
                          next_word = next_word + 1
                      else # copy
                          words[i] = words[1 + floor(Int, rand0[i] * (i-1))] # copy one of the previous i-1 words
                      end
                  end
                  # the word we're interested in was forced in at tau+1, so skip that index
                  @inbounds for i = (tau+2):omega
                      if mu > tests[i] # innovate 
                          words[i] = next_word
                          next_word = next_word + 1
                      else # copy
                          words[i] = words[1 + floor(Int, rand0[i] * (i-1))] # copy one of the previous i-1 words
                      end
                  end
                  result = count(==(omega + 1), words) # count how many times our word occurred
                  return result
              end
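
For reference, a self-contained version of the elided benchmark loop above looks roughly like this; the nruns value and the tau/omega/mu here are placeholders to make it runnable, not necessarily the numbers behind the 63.522 ms figure (note that rand!, used inside word_sim5, comes from the Random standard library):

using BenchmarkTools, VectorizedRNG
using Random  # for rand!

tau, omega, mu = 200, 1_000, 0.01   # placeholder parameters
nruns = 100_000                     # placeholder number of simulations
pops = Vector{Int}(undef, nruns)    # preallocated results, as in the loop above

@btime begin
    for jj in 1:$nruns
        $pops[jj] = word_sim5($tau, $omega, $mu)
    end
end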

If that happens, it might be enough to store just the location indices of that word; you still wouldn’t need those locations when assigning the next word.
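
A rough sketch of what I mean, reusing the acceptance probability from word_sim_alternative above (untested, and the function name is made up):

# track only the positions where the word of interest occurs; the other
# words are never stored
function word_positions(τ, ω, μ)
    positions = [τ + 1]   # the tracked word is inserted at time τ+1
    for i in (τ+2):ω
        # copied again with probability (1-μ) * length(positions) / i,
        # the same test as in word_sim_alternative
        if i * rand() < (1 - μ) * length(positions)
            push!(positions, i)
        end
    end
    return positions      # length(positions) is the occurrence count
end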

Great!


Am I not reading the profile right?

By now the single slowest line is “word_sim5: line 4”, at 20%+ of the time (though the three rand-related lines, 5, 9 and 16, combined take about 75% of the time).

Do the numbers for that line make sense? It seems already fast, and with “undef” it should be O(1), since the memory isn’t initialized:

julia> @time words = Array{Int32}(undef, omega);
  0.000003 seconds (1 allocation: 496 bytes)

julia> omega = 100000
100000

julia> @time words = Array{Int32}(undef, omega);
  0.000008 seconds (2 allocations: 390.703 KiB)

julia> omega = 1000000000
1000000000

julia> @time words = Array{Int32}(undef, omega);
  0.012678 seconds (2 allocations: 3.725 GiB, 99.61% gc time)

julia> @time words = Array{Int32}(undef, omega);
  0.004373 seconds (2 allocations: 3.725 GiB, 99.06% gc time)

julia> @time words = Array{Int32}(undef, omega);
  0.047590 seconds (2 allocations: 3.725 GiB, 99.92% gc time)

For those extra tests I did (unrelated to the program, really), I’m exercising the GC [with @btime I see the same allocations, but the “gc time” is gone, as it takes care of avoiding GC], but why? It seems I’m really just testing the C library’s malloc, and that’s not O(1), but even then, why the GC?
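
For reference, the @btime version I mean is just this (output omitted, as it varies between runs, but it reports the same 2 allocations / 3.725 GiB without the “gc time” part):

using BenchmarkTools

omega = 1_000_000_000
@btime Array{Int32}(undef, $omega);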

[Funny story: I have 128 GB of memory, so the computer is not swapping. I thought I had 12 GB, and was told “that can’t be right, we didn’t give you only 12 GB”; I had never seen the + in top before, so I didn’t notice that a digit had been cut off:

KiB Mem : 13182060+total, 10671788+free, 14977804 used, 10124916 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 11454323+avail Mem ]

Wow, I’m hitting 0.09 seconds with this version (down from 0.26), really impressive. I’ll need to look into the VectorizedRNG package (does it have any of the previously mentioned bias issues, for example?).

There’s an open issue for integer support. I’ll try and get around to adding it soon.

I don’t think so, but note I wasn’t using it directly; I was, however, using a workaround for ranges that is claimed to be unbiased, for speed. That package is based on the work here (which is what I recommended for a new RNG in Julia, as MersenneTwister() is known to be slower): http://prng.di.unimi.it/
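
Concretely, the workaround I was using amounts to scaling a uniform float instead of drawing a ranged integer; roughly like this (my paraphrase, with a made-up helper name, not code from the package):

using Random  # only for the MersenneTwister used in the example call

# draw an index in 1:n by scaling a uniform Float64; any bias is on the
# order of n / 2^53, since rand(rng) is strictly below 1.0
draw_index(rng, n) = 1 + floor(Int, rand(rng) * n)

draw_index(MersenneTwister(), 10)   # some index in 1:10

# versus drawing a ranged integer directly:
# rand(MersenneTwister(), 1:10)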

Yes, I added that issue after looking into optimizing the program in this thread. I could make a PR based on the work I did here, to make it possible to simplify the program. But doing the int-to-float conversion (as your package does) and back to int wouldn’t be the best way; that could be optimized later. It would, however, give you a different stream of random numbers. I guess there’s no guarantee of a stable stream anyway, so that’s OK.

Another package that DOES work for integers and is fast, though last I checked it surprisingly wasn’t faster than Julia Base:

Sorry, sunoru’s package is actually very fast, faster than Base; I made a timing mistake. I’m close to my best version here with that package, and only the older version uses vectorization. I’m doing:

julia> using RandomNumbers.Xorshifts

julia> r = Xoroshiro128Plus();
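
Since Xoroshiro128Plus is a regular AbstractRNG, the usual Random API works with it; for example (just a sketch, the sizes are made up):

julia> using Random

julia> tests = rand(r, 100);    # 100 uniform Float64s

julia> j = rand(r, 1:99);       # an integer drawn directly from a range

julia> rand!(r, tests);         # refill the buffer in place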

And I’m not yet using this, which is now merged in Julia 1.6-DEV:
