How to optimize the following code?

I recently watched this video, which ended in a comparison between Python+Numba and Julia (starting around 1h06m).
The author used multithreading to run the same function 10_000 times and concluded that Julia was slower than Numba.

I was wondering if the code could be optimized. Some of the lines could be deleted altogether, but for the sake of comparison with Numba, I kept them.

# mass spring damper

function friction(v::Float64, vt::Float64)
    v > vt ? -3v : -3vt*sign(v)
end

function simulate(x0::Float64; T = 10.0, dt = 0.0001, vt = 1.0)
    times = 0.0:dt:T
    positions = zeros(length(times))
    v = 0.0
    a = 0.0
    x = x0
    positions[1] = x0/x0
    for ii in 1:length(times)
        if ii == 0
            continue
        end
        t = times[ii]
        a = friction(v, vt) - 100.0*x
        v = v + a * dt
        x = x + v * dt
        positions[ii] = x/x0
    end
    (times, positions)
end

@time simulate(10.0);
using Plots
plot(simulate(10.0)...)

function rep(z)
    Threads.@threads for i = 1:z
        x = simulate(10.0);
    end
end

@time rep(1)
@time rep(10_000)

Nice video. This gives a good perspective on beginners’ issues.

You already did the obvious thing: Replacing [0.0:dt:T;] with the range 0.0:dt:T makes the timing go down from 3.1s to 2.2s on my machine.

The function itself looks good to me (although it does some useless work). Make sure you start Julia with the highest optimization level (`-O3`). Numba is pretty good; it will be hard to beat.


Turn off bounds checking with `@inbounds` and put `@inline` on the friction term? You could probably also compute the friction branch-free with `ifelse`.
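To make the `ifelse` suggestion concrete, here is a sketch of a branchless version (the name `friction_ifelse` is mine, not from the thread). `ifelse` evaluates both arms and then selects one, so there is no branch for the CPU to mispredict, which can also help the compiler vectorize the surrounding loop:

```julia
# Sketch of the ifelse suggestion; `friction_ifelse` is an illustrative name.
# Both arms are evaluated, then one is selected without branching.
friction_ifelse(v::Float64, vt::Float64) = ifelse(v > vt, -3v, -3vt * sign(v))
```

It computes the same values as the original `friction`, just without a branch.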


@ChrisRackauckas shouldn’t `friction` be short enough for automatic inlining?

It should be. Should.

  • Replacing [0.0:dt:T;] with the range 0.0:dt:T reduces time by ~30%

  • Using @fastmath reduces time by ~30%

  • The way you benchmark the code is not reliable. Use BenchmarkTools.

  • Using @inbounds may have an effect too, though it is barely visible with these parameters.

  • Micro-optimizations are possible by removing unused statements and the dead `if` check
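One caveat on the `@fastmath` point: it relaxes strict IEEE-754 semantics (it may reassociate operations, for example), so it is worth sanity-checking the fast version against the plain one. A minimal, self-contained sketch of that check (the summation functions are illustrative, not from the thread):

```julia
# Plain accumulation: strict IEEE-754 evaluation order.
function sum_plain(xs)
    s = 0.0
    for x in xs
        s += x
    end
    s
end

# Same loop under @fastmath: the compiler may reassociate/vectorize,
# so the result can differ in the last bits.
function sum_fast(xs)
    s = 0.0
    @fastmath for x in xs
        s += x
    end
    s
end

xs = collect(0.0001:0.0001:1.0)
# The two results should agree to within a loose tolerance.
isapprox(sum_plain(xs), sum_fast(xs); rtol = 1e-8)
```

If the two disagree beyond a tolerance you care about, drop `@fastmath` from that loop.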

using BenchmarkTools
@btime simulate_optimized(10.0);
  577.200 μs (3 allocations: 781.45 KiB)

@btime simulate(10.0);
  1.086 ms (5 allocations: 1.53 MiB)

The function:

function simulate_optimized(x0::Float64; T = 10.0, dt = 0.0001, vt = 1.0)
    times = 0.0:dt:T
    positions = zeros(length(times))
    v = 0.0
    a = 0.0
    x = x0
    positions[1] = 1.0 #x0/x0
    for ii in 1:length(times)

        ## not needed!
        # if ii == 0
        #     continue
        # end

        @fastmath begin
            @inbounds begin
                # t = times[ii] #? not used
                a = friction(v, vt) - 100.0*x
                v = v + a * dt
                x = x + v * dt
                positions[ii] = x/x0
            end
        end
    end
    (times, positions)
end

Awesome! Thank you all for the nice suggestions!

I’m the author of the video and would like to clarify one potential point of miscommunication: the intent of the video was not to compare Julia with Python/Numba. The only reason the comparison happened was because one of my subscribers had asked for it.

With that out of the way, a big thanks to everyone that has responded with suggestions for improvements. This bit of code was written in hour 3 of me becoming familiar with Julia so it’s very illuminating to see what I missed.
