How to optimise and be faster than Java?

In general, yes, the type conversion is probably not going to hurt anything, because of the genericity of the other variables. But in this case, where everything else is already known to be a Float and a Float is guaranteed to come out (there is a division, and everything coming in is already a Float), why not just write the literal and save the conversion? If you want to be absolutely rigorous about that literal, one(eltype(x)) would be the best option, in my opinion.

My impression is that you don’t really save the conversion: when it comes down to it, it’s optimized away, or at least it should be. Certainly, the performance difference is not measurable.

The gain is more generic code. Very often, people assume too much, and add too many type constraints. I always want to encourage people to move away from that mindset.

And, it leads to nicer, prettier, more readable code.
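As a toy sketch of the idea (the function name here is made up for illustration, not from the original code), writing literals generically keeps a function usable for any float type, with no conversions forced on the caller:

```julia
# Generic: works for Float64, Float32, etc., because the literal
# is written as one(eltype(x)) rather than a hard-coded 1.0.
relative_change(x, y) = x / y - one(eltype(x))

relative_change(3.0, 2.0)      # Float64 in, Float64 out
relative_change(3.0f0, 2.0f0)  # Float32 in, Float32 out, no promotion
```

Had the literal been written as `1.0`, the Float32 call would silently promote the result to Float64.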

Since your function is rather fast, the profiler needs many runs to gather statistics. So it’s better to use something like

using Profile

Profile.clear()
@profile (for i = 1:100_000; calcprojsimdi(pa, N); end)
Profile.print()

I’ve tried it with the first version of your code, and it looks like most of the time is spent in the pow function, so the most time-consuming functions are csi_MonthlyInterpolatedSpot and the like.

On my machine, I get the following results:

function csi_MonthlyInterpolatedSpot1(s::Proj, N::Int32)
    s.MonthlyInterpolatedSpot[1] = zero(Float64)
    for T = 2:N
        s.MonthlyInterpolatedSpot[T] = (1 / s.MonthlyInterpolatedZCB[T]) ^ (12.0 / (T-1)) - 1
    end
end

using BenchmarkTools

@btime csi_MonthlyInterpolatedSpot1($pa, $N)
# 12.611 μs (0 allocations: 0 bytes)

function csi_MonthlyInterpolatedSpot2(s::Proj, N::Int32)
    s.MonthlyInterpolatedSpot[1] = zero(Float64)
    for T = 2:N
        s.MonthlyInterpolatedSpot[T] = (1 / s.MonthlyInterpolatedZCB[T])
    end
end

@btime csi_MonthlyInterpolatedSpot2($pa, $N)
# 1.083 μs (0 allocations: 0 bytes)

So it would be interesting to compare the speed of pow in Java and Julia, to check whether it is true that the Java version is faster.
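For the Julia side, a minimal micro-benchmark of `^` alone might look like this (the array size and variable names are made up for illustration; the Java counterpart would time `Math.pow` over the same inputs):

```julia
using BenchmarkTools

x = rand(125) .+ 0.5   # bases, kept away from zero
p = rand(125)          # exponents

# time just the pow calls, with nothing else in the loop body
@btime broadcast(^, $x, $p);
```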


I’d try adding @inbounds to both, and getting rid of the inversion in the first (by making the exponent negative), to see if those help.

If pow is the limiting factor, then you can look at using VML or Vectorize.jl to get a faster pow function. But even @inbounds or a broadcasted calculation is a bit faster than what’s shown. Toy example:

function pow_loop(a, b, N)
    a[1] = zero(eltype(a))
    for T = 2:N
        a[T] = (1/b[T])^(12/(T-1)) - 1
    end
    return a
end

function pow_loop_inv_inb(a, b, N)
    a[1] = zero(eltype(a))
    @inbounds for T = 2:N
        a[T] = (b[T])^(-12/(T-1)) - 1
    end
    return a
end

function pow_loop_vector(a, b, N)
    a[1] = zero(eltype(a))
    @views a[2:N] .= b[2:N].^(-12 ./ ((2:N).-1)) .- 1
    return a
end

using Vectorize # this will only work with Intel's VML library
function pow_loop_vml(a, b, N)
    # the first exponent is -12/0 = -Inf, but a[1] is overwritten below
    Vectorize.pow!(a, b, (-12 ./ ((1:N) .- 1)))
    a .-= 1
    a[1] = zero(eltype(a))
    return a
end

Results:

using BenchmarkTools
a = rand(125); b = rand(125);

@btime pow_loop($a, $b, 125);
#  4.403 μs (0 allocations: 0 bytes)

@btime pow_loop_inv_inb($a, $b, 125);
#  2.508 μs (0 allocations: 0 bytes)

@btime pow_loop_vector($a, $b, 125);
#  2.703 μs (2 allocations: 96 bytes)

@btime pow_loop_vml($a, $b, 125);
#  1.826 μs (1 allocation: 1.06 KiB)