I am trying to switch to Julia from Matlab+Fortran, and I am a bit frustrated with the performance inconsistency across versions. I used profiler to isolate the lines in my code that takes longer than others. I have the following example code, which is a very common equation in Economics. Often used in a loop for solution and it is usually a part of contraction mapping. Of course, this is just an example, there are lots of other stuff inside the loop in the actual code.
Anyhow, when I time it, it takes 0.20 seconds on Julia 1.5.x and 1.4.x, but 0.90 seconds on 1.6.x and 1.7.x on various Ubuntu Linux flavors. (20.04, 21.04, etc). It is more than 4 times slower with newer versions of Julia on Linux. The same code takes about 0.40 seconds consistently on various versions in Windows.
Just for comparison, it takes about 0.20 seconds on Matlab (windows). Julia is just as fast as Matlab with this vectorized operation, but only with versions < 1.5.x and Linux. I am using the same computer (AMD with 8 cores and 48gb RAM), do not time the first run so that compilation time is not included, etc.
I am a bit hesitant to switch because I probably can’t keep using version 1.5.x forever.
I would appreciate any thoughts. Thanks a lot.
Example code:
function rtest4()
N1=62
N2=100
theta=5.0
A=rand(Float64,N2,N1)
B=rand(Float64,N2,N1)
for ii=1:2000
C = ( sum( ( B.^theta).*A, dims = 2) ).^(1.0./theta)
end
end
function rtest5()
rng = MersenneTwister(1234)
A=rand(rng,1024)
C = A.^3.456532
end;
If this is a regression, the problem is a known problem with scalar pow causes by an accidental change of libm version (which is why this isn’t reproducibile on all OSes).
@Erhan sorry about the regression. This is why we try not to rely on libm for things. The bad news is I didn’t have the time to fix this for 1.7. the good news is that if you can accept a small amount of inaccuracy, you can fix this for now with Base.(^)(x,y) = exp2(y * log2(x))
Thanks a lot for pointing out the problem, and the solution. Indeed, if I use theta=1.0, there is no difference in performance. It is great to know that you are aware of the problem and it will be fixed in the future. There are a lot of things I like about Julia.
I might be off here, but when I see something like your code with sum(..., dims = 2) and matrix multiplications I feel there’s a Tullio/LoopVectorization solution with potential large speedups around the corner, so might be worth looking into that if this is really the bottleneck in your code.
Thanks for the tip. Yes, it is probably not the most efficient way to do it. It was just copied and pasted from Matlab with minor syntax modifications.
Just in case if some newbie, like myself, tries to implement the solution without giving enough thought. I think overriding works better if we declare the types of inputs explicitly since each function might have multiple versions for different types of inputs, like this Base.:^(x::Float64,y::Float64) = exp2(y * log2(x)) I like that we do not need to define a separate version that broadcasts, since we can broadcast on any function with x.^y or like a regular function (^).(x,y).