Inconsistent performance Julia 1.6 vs 1.5 on Linux

I am trying to switch to Julia from Matlab+Fortran, and I am a bit frustrated with the performance inconsistency across versions. I used the profiler to isolate the lines in my code that take longer than others, and arrived at the example code below. It computes a very common equation in economics, typically evaluated inside a loop as part of a contraction mapping. Of course, this is just an example; the actual loop contains a lot of other code.

Anyhow, when I time it, it takes 0.20 seconds on Julia 1.5.x and 1.4.x, but 0.90 seconds on 1.6.x and 1.7.x on various Ubuntu Linux flavors (20.04, 21.04, etc.). That is more than 4 times slower with the newer versions of Julia on Linux. The same code takes about 0.40 seconds consistently across versions on Windows.

Just for comparison, it takes about 0.20 seconds in Matlab (Windows). Julia is just as fast as Matlab with this vectorized operation, but only with versions up to 1.5.x, and only on Linux. I am using the same computer (AMD, 8 cores, 48 GB RAM), I do not time the first run so that compilation time is not included, etc.

I am a bit hesitant to switch because I probably can’t keep using version 1.5.x forever.

I would appreciate any thoughts. Thanks a lot.

Example code:


function rtest4()
    N1 = 62
    N2 = 100
    theta = 5.0
    A = rand(Float64, N2, N1)
    B = rand(Float64, N2, N1)

    for ii = 1:2000
        C = (sum((B .^ theta) .* A, dims = 2)) .^ (1.0 / theta)
    end
end
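For reference, a minimal way to time this while excluding compilation (a sketch, assuming BenchmarkTools.jl is installed):

using BenchmarkTools

@btime rtest4()   # reports the minimum over many samples, so compile time is excluded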

I can reproduce this, with an even larger performance difference. Hopefully someone more informed can chime in.

Here is a more streamlined function showing the performance difference:

julia> using Random, BenchmarkTools  # MersenneTwister needs Random; @btime below needs BenchmarkTools

julia> function rtest4()
           rng = MersenneTwister(1234)
           N1 = 62
           N2 = 100
           theta = 5.0
           A = rand(rng, N2, N1)
           B = rand(rng, N2, N1)
           C = (sum((B .^ theta) .* A, dims = 2)) .^ (1.0 / theta)
       end;

On 1.6:

julia> @btime rtest4();
  601.963 μs (24 allocations: 166.95 KiB)

On 1.5:

julia> @btime rtest4();
  150.233 μs (26 allocations: 167.03 KiB)

It’s definitely the C = ... line; otherwise the difference in times wouldn’t scale the way it does.

1.5 version info:

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

1.6 version info:

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

I’m seeing:

julia> @btime rtest4()
  423.483 ms (16004 allocations: 98.36 MiB)

julia> versioninfo()
Julia Version 1.7.0-beta3.0

and

julia> @btime rtest4()
1.086 s (16004 allocations: 98.42 MiB)

julia> versioninfo()
Julia Version 1.5.3

on Windows 10, Intel i7.


Can you run the following and report results?

using Random  # for MersenneTwister

function rtest5()
    rng = MersenneTwister(1234)
    A = rand(rng, 1024)
    C = A .^ 3.456532
end

If this is that regression, the problem is a known issue with scalar pow caused by an accidental change of libm version (which is why this isn’t reproducible on all OSes).
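To see whether scalar pow alone shows the slowdown, one can also benchmark a single call in isolation (a sketch, assuming BenchmarkTools.jl; the setup clause keeps the inputs from being constant-folded):

using BenchmarkTools

# Time one scalar pow call; run this on both Julia versions and compare.
@btime x^y setup=(x = 1.0 + rand(); y = 3.456532)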


The performance difference exists for me with that function.


@Erhan sorry about the regression. This is why we try not to rely on libm for things. The bad news is that I didn’t have time to fix this for 1.7. The good news is that, if you can accept a small amount of inaccuracy, you can work around it for now with Base.:^(x, y) = exp2(y * log2(x)).


Thanks a lot, yes, same problem here. 1.5.x is about 4 to 5 times faster than 1.6.x.

Thanks a lot for pointing out the problem, and the solution. Indeed, if I use theta=1.0, there is no difference in performance, which is consistent with scalar pow being the culprit. It is great to know that you are aware of the problem and that it will be fixed in a future release. There are a lot of things I like about Julia.

I might be off here, but when I see code like yours, with sum(..., dims = 2) and elementwise products, I suspect there is a Tullio/LoopVectorization solution with potentially large speedups around the corner, so it might be worth looking into if this is really the bottleneck in your code.
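For what it’s worth, here is a sketch of what that might look like (assuming Tullio.jl is installed; rtest4_tullio is just an illustrative name, and the scalar pow calls themselves are unchanged):

using Tullio

function rtest4_tullio(A, B, theta)
    # Fuse the power, the product, and the reduction over the second
    # dimension, avoiding the intermediate N2-by-N1 temporaries.
    @tullio C[i] := B[i, j]^theta * A[i, j]
    C .= C .^ (1 / theta)
    return C
end

Loading LoopVectorization.jl before defining the function typically lets Tullio generate faster kernels.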


Thanks for the tip. Yes, it is probably not the most efficient way to do it. It was just copied and pasted from Matlab with minor syntax modifications.

Just in case some newbie like myself tries to implement the solution without giving it enough thought: I think the override works better if we declare the types of the inputs explicitly, since each function can have multiple methods for different input types, like this: Base.:^(x::Float64, y::Float64) = exp2(y * log2(x)). I also like that we do not need to define a separate broadcasting version, since any function can be broadcast with x .^ y or, like a regular function, with (^).(x, y).
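Putting that together, a minimal sketch of the typed override in use (the array shape mirrors the earlier example; note the exp2/log2 trick assumes x > 0 and costs a few ulps of accuracy):

# Workaround: replace Base's Float64^Float64 method, trading a small
# amount of accuracy for speed (valid only for positive x).
Base.:^(x::Float64, y::Float64) = exp2(y * log2(x))

A = rand(100, 62)
C1 = A .^ 5.0       # broadcasting picks up the overridden scalar method
C2 = (^).(A, 5.0)   # equivalent: any function can be broadcast with f.(args...)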
