Best way to optimize code

I haven’t checked this really well, but I think it should be close to correct:

function word_sim_alternative(τ, ω, μ)
    # counts how often the word inserted at time τ+1 occurs by time ω, without
    # storing the full words array: at step i the tracked word gets copied
    # again with probability (1-μ) * wordcount / i, which is the test below
    wordcount = 1
    for i in (τ+2):ω
        wordcount += (i * rand()) < (1 - μ) * wordcount
    end
    return wordcount
end
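
If you want to sanity-check or time it, something along these lines should do (the parameter values are just made-up examples, not numbers from any benchmark in this thread):

using BenchmarkTools, Statistics

tau, omega, mu = 200, 1_000, 0.01   # made-up example parameters

@btime word_sim_alternative($tau, $omega, $mu)

# the result is random, so averaging many runs estimates the expected count
mean(word_sim_alternative(tau, omega, mu) for _ in 1:10_000)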

This is fantastic! I didn’t realise random number generation could affect speed so much.

This is a really neat function. At some points I will need the entire array of words, but in general this should be good for most of them. Thanks so much.

Still, my latest change to the program, with the help of that RNG package, more than doubles [EDIT: almost triples, if I recall the old number correctly] the speed (what is it on your faster machine?) on top of the best so far, and I think it could be made better by avoiding the int-to-float conversion (in that package) and back to int:

julia> using BenchmarkTools, ProfileView, VectorizedRNG  # note,
# version v0.1.6, master version is broken, so use the released one
[ Info: Precompiling VectorizedRNG [33b4df10-0173-11e9-2a0c-851a7edac40e]

julia> rng = local_rng()

julia> @btime begin
[..]
               pops[jj] = word_sim5(tau, omega, mu);
[..]
  63.522 ms (300002 allocations: 218.21 MiB)

@profview begin ... end now shows that the bottleneck has moved

For:

julia> function word_sim5(tau::Int, omega::Int, mu::Float64)
                  # inserts a word at position (tau+1); at each step it creates a new word with prob. mu,
                  # otherwise it copies a randomly chosen previously used word. Runs the simulation until time omega
                  words = Array{Int32}(undef, omega) # to store the words
                  tests = rand(local_rng(), omega) # will compare mu to these
                  words[1] = 1; # initialize the words
                  next_word = 2 # will be the next word used
                  words[tau+1] = omega + 1; # omega+1 can never occur as a normal word, so use it to mark our word at time tau+1
                  rand0 = Vector{Float64}(undef, omega) # sized omega so it can be indexed directly with i (a couple of entries go unused, by design)
                  rand!(local_rng(), rand0)  # precalc. float random numbers
                  @inbounds for i = 2:tau # simulate the process
                      if mu > tests[i] # innovate 
                          words[i] = next_word
                          next_word = next_word + 1
                      else # copy
                          words[i] = words[1 + floor(Int, rand0[i] * (i-1))] # copy one of the previous i-1 words
                      end
                  end
                  # the word we're interested in was forced in at tau+1, so skip that index
                  @inbounds for i = (tau+2):omega
                      if mu > tests[i] # innovate 
                          words[i] = next_word
                          next_word = next_word + 1
                      else # copy
                          words[i] = words[1 + floor(Int, rand0[i] * (i-1))] # copy one of the previous i-1 words
                      end
                  end
                  result = count(==(omega + 1), words) # count how many times our word occurred
                  return result
              end
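
For reference, a self-contained version of the elided benchmark loop above looks roughly like this; the nruns value and the tau/omega/mu here are placeholders to make it runnable, not necessarily the numbers behind the 63.522 ms figure (note that rand!, used inside word_sim5, comes from the Random standard library):

using BenchmarkTools, VectorizedRNG
using Random  # for rand!

tau, omega, mu = 200, 1_000, 0.01   # placeholder parameters
nruns = 100_000                     # placeholder number of simulations
pops = Vector{Int}(undef, nruns)    # preallocated results, as in the loop above

@btime begin
    for jj in 1:$nruns
        $pops[jj] = word_sim5($tau, $omega, $mu)
    end
end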

If that happens, it might be enough to store just the location indices of that word; you still wouldn’t need those locations when assigning the next word.
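
A rough sketch of what I mean, reusing the acceptance probability from word_sim_alternative above (untested, and the function name is made up):

# track only the positions where the word of interest occurs; the other
# words are never stored
function word_positions(τ, ω, μ)
    positions = [τ + 1]   # the tracked word is inserted at time τ+1
    for i in (τ+2):ω
        # copied again with probability (1-μ) * length(positions) / i,
        # the same test as in word_sim_alternative
        if i * rand() < (1 - μ) * length(positions)
            push!(positions, i)
        end
    end
    return positions      # length(positions) is the occurrence count
end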

Great!


Am I not reading the profile right?

By now the single slowest line is “word_sim5: line 4”, at 20%+ of the time (though the three rand-related lines, 5, 9 and 16, combined take about 75% of the time).

Do the numbers for that line make sense? It seems already fast, and with “undef” it should be O(1), since the memory isn’t initialized:

julia> @time words = Array{Int32}(undef, omega);
  0.000003 seconds (1 allocation: 496 bytes)

julia> omega = 100000
100000

julia> @time words = Array{Int32}(undef, omega);
  0.000008 seconds (2 allocations: 390.703 KiB)

julia> omega = 1000000000
1000000000

julia> @time words = Array{Int32}(undef, omega);
  0.012678 seconds (2 allocations: 3.725 GiB, 99.61% gc time)

julia> @time words = Array{Int32}(undef, omega);
  0.004373 seconds (2 allocations: 3.725 GiB, 99.06% gc time)

julia> @time words = Array{Int32}(undef, omega);
  0.047590 seconds (2 allocations: 3.725 GiB, 99.92% gc time)

For those extra tests I did (unrelated to the program, really), I’m exercising the GC [with @btime I see the same allocations, but the “gc time” is gone, as it takes care of avoiding GC], but why? It seems I’m really just testing the C library’s malloc, and that’s not O(1), but even then, why the GC?
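
For reference, the @btime version I mean is just this (output omitted, as it varies between runs, but it reports the same 2 allocations / 3.725 GiB without the “gc time” part):

using BenchmarkTools

omega = 1_000_000_000
@btime Array{Int32}(undef, $omega);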

[Funny story: I have 128 GB of memory, so the computer is not swapping. I thought I had 12 GB, and was told “that can’t be right, we didn’t give you only 12 GB”; I had never seen the + in top before, so I didn’t notice that a digit had been cut off:

KiB Mem : 13182060+total, 10671788+free, 14977804 used, 10124916 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 11454323+avail Mem ]

Wow, I’m hitting 0.09 seconds with this version (down from 0.26), really impressive. I’ll need to look into the VectorizedRNG package (does it have any of the previously mentioned bias issues, for example?).

There’s an open issue for integer support. I’ll try and get around to adding it soon.

I don’t think so, but note I wasn’t using it directly; I was, however, using a workaround for ranges that is claimed to be unbiased, for speed. That package is based on the work here (which is what I recommended for a new RNG in Julia, as MersenneTwister() is known to be slower): http://prng.di.unimi.it/
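
Concretely, the workaround I was using amounts to scaling a uniform float instead of drawing a ranged integer; roughly like this (my paraphrase, with a made-up helper name, not code from the package):

using Random  # only for the MersenneTwister used in the example call

# draw an index in 1:n by scaling a uniform Float64; any bias is on the
# order of n / 2^53, since rand(rng) is strictly below 1.0
draw_index(rng, n) = 1 + floor(Int, rand(rng) * n)

draw_index(MersenneTwister(), 10)   # some index in 1:10

# versus drawing a ranged integer directly:
# rand(MersenneTwister(), 1:10)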

Yes, I added that issue after looking into optimizing the program in this thread. I could make a PR based on the work I did here, to make it possible to simplify the program. But doing the int-to-float conversion (as your package does) and back to int wouldn’t be the best way; that could be optimized later. It would, however, give you a different stream of random numbers. I guess there’s no guarantee of a stable stream anyway, so that’s OK.

Another package that DOES work for integers and is fast, though last I checked it surprisingly wasn’t faster than Julia Base:

Sorry, sunoru’s package is actually very fast, faster than Base; I made a timing mistake. I’m close to my best version here with that package, and only the older version uses vectorization. I’m doing:

julia> using RandomNumbers.Xorshifts

julia> r = Xoroshiro128Plus();
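
Since Xoroshiro128Plus is a regular AbstractRNG, the usual Random API works with it; for example (just a sketch, the sizes are made up):

julia> using Random

julia> tests = rand(r, 100);    # 100 uniform Float64s

julia> j = rand(r, 1:99);       # an integer drawn directly from a range

julia> rand!(r, tests);         # refill the buffer in place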

And I’m not yet using this, which is now merged in Julia 1.6-DEV:
