In general, yes: keeping the type conversion probably won't hurt, because it keeps the other variables generic. But in this case, where everything else is already known to be a Float and the division guarantees a Float result anyway, why not just write the literal and save the conversion? If you want to be absolutely rigorous about that literal, one(eltype(x)) would be the best option, imo.
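To illustrate with a toy sketch (the function names here are made up for illustration): the literal and one(eltype(x)) give the same value, but the latter never needs a promotion, whatever the element type is.

```julia
# Hypothetical example: both forms compute the same thing, but
# one(eltype(x)) is already of the right type, so it stays fully generic.
f_literal(x) = 1 / x[1]               # the literal 1 is promoted during division
f_one(x)     = one(eltype(x)) / x[1]  # already the element type; no promotion

f_literal([4.0]) == f_one([4.0])      # both give 0.25
f_literal(Float32[4]) isa Float32     # promotion still yields Float32 here
```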
My impression is that you don’t really save the conversion, that when it comes down to it, it’s optimized away. Or, if not, then it should be, at least. Certainly, the performance difference is not measurable.
The gain is more generic code. Very often, people assume too much, and add too many type constraints. I always want to encourage people to move away from that mindset.
And, it leads to nicer, prettier, more readable code.
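For instance (a toy sketch, not from the thread), a Float64-only constraint buys nothing and rules out perfectly good inputs:

```julia
# Over-constrained: only accepts Vector{Float64}
sumsq_strict(v::Vector{Float64}) = sum(x -> x^2, v)

# Generic: accepts ranges, views, Float32 arrays, ... and is just as fast,
# because Julia specializes the method on the concrete argument type anyway.
sumsq(v) = sum(x -> x^2, v)

sumsq(1:3)              # 14
sumsq(Float32[1, 2])    # 5.0f0, element type preserved
# sumsq_strict(1:3)     # MethodError: no method matching
```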
Since your function is rather fast, you should run it many times so the profiler can gather statistics. So it's better to use something like
Profile.clear()
@profile (for i = 1:100_000; calcprojsimdi(pa, N); end)
Profile.print()
I’ve tried it with the first version of your code, and it looks like most of the time is spent in the pow function, so the most time-consuming functions are csi_MonthlyInterpolatedSpot and the like.
On my machine, I have the following results:
function csi_MonthlyInterpolatedSpot1(s::Proj, N::Int32)
    s.MonthlyInterpolatedSpot[1] = zero(Float64)
    for T = 2:N
        s.MonthlyInterpolatedSpot[T] = (1 / s.MonthlyInterpolatedZCB[T]) ^ (12.0 / (T - 1)) - 1
    end
end
@btime csi_MonthlyInterpolatedSpot1($pa, $N)
# 12.611 μs (0 allocations: 0 bytes)
function csi_MonthlyInterpolatedSpot2(s::Proj, N::Int32)
    s.MonthlyInterpolatedSpot[1] = zero(Float64)
    for T = 2:N
        s.MonthlyInterpolatedSpot[T] = (1 / s.MonthlyInterpolatedZCB[T])
    end
end
@btime csi_MonthlyInterpolatedSpot2($pa, $N)
# 1.083 μs (0 allocations: 0 bytes)
So it would be interesting to compare the speed of pow in Java and Julia, to check whether it's true that the Java version is faster.
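As a rough Base-only sketch of the Julia side of such a comparison (BenchmarkTools' @btime would be more rigorous, and the Java side would need its own harness, e.g. JMH):

```julia
# Rough scalar-pow timing using only Base; call the function once first so
# the measurement doesn't include compilation time.
x = rand(10_000) .+ 0.5
powsum(v) = sum(t -> t^1.7, v)   # hypothetical reduction dominated by pow
powsum(x)                        # warm-up (triggers compilation)
t = @elapsed powsum(x)           # seconds for one pass over 10_000 elements
```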
I’d try adding @inbounds to both, and getting rid of the inversion in the first (make the exponent negative), to see if those help.
If pow is the limiting factor, then you can look at using VML or Vectorize to get a faster pow function. But even @inbounds or a broadcasted calculation is a bit faster than what’s shown. Toy example:
function pow_loop(a, b, N)
    a[1] = zero(eltype(a))
    for T = 2:N
        a[T] = (1 / b[T])^(12 / (T - 1)) - 1
    end
    return a
end

function pow_loop_inv_inb(a, b, N)
    a[1] = zero(eltype(a))
    @inbounds for T = 2:N
        a[T] = (b[T])^(-12 / (T - 1)) - 1
    end
    return a
end

function pow_loop_vector(a, b, N)
    a[1] = zero(eltype(a))
    @views a[2:N] .= b[2:N] .^ (-12 ./ ((2:N) .- 1)) .- 1
    return a
end

using Vectorize # this will only work with Intel's VML library
function pow_loop_vml(a, b, N)
    Vectorize.pow!(a, b, (-12 ./ ((1:N) .- 1)))
    a .-= 1
    a[1] = zero(eltype(a))
    return a
end
Results:
a = rand(125); b = rand(125);
@btime pow_loop($a, $b, 125);
# 4.403 μs (0 allocations: 0 bytes)
@btime pow_loop_inv_inb($a, $b, 125);
# 2.508 μs (0 allocations: 0 bytes)
@btime pow_loop_vector($a, $b, 125);
# 2.703 μs (2 allocations: 96 bytes)
@btime pow_loop_vml($a, $b, 125);
# 1.826 μs (1 allocation: 1.06 KiB)
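When trying rewrites like these, it's worth checking that the variants agree numerically before comparing timings. A minimal self-contained check of the inversion-vs-negative-exponent rewrite (sizes and values chosen arbitrarily):

```julia
# (1/x)^p and x^(-p) agree up to floating-point rounding; verify on sample data.
b  = rand(10) .+ 0.5
a1 = zeros(10)
a2 = zeros(10)
for T = 2:10
    a1[T] = (1 / b[T])^(12 / (T - 1)) - 1   # original form
    a2[T] = b[T]^(-12 / (T - 1)) - 1        # rewritten form
end
all(isapprox.(a1, a2; atol = 1e-8))          # holds up to rounding error
```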