I’m curious if there is a high level concept to find maximum element and index in mapped array, which has similar performance to raw loop. I’m comparing f4 and f5:
using BenchmarkTools, Random, StaticArrays, LinearAlgebra
points = rand(MersenneTwister(12345), SVector{2, Float32}, 10_000)
function f4(p)
total = 0.
det0 = det([p[1] p[end]])
d0 = p[end] - p[1]
(vmax, imax) = findmax(map(x -> abs(det([x d0]) + det0), p))
total += vmax + p[imax][1] + p[imax][2]
return total
end
function f5(p)
total = 0.
det0 = det([p[1] p[end]])
d0 = p[end] - p[1]
vmax = -Inf
imax = 0
for i in eachindex(p)
if abs(det([p[i] d0]) + det0) > vmax
vmax = abs(det([p[i] d0]) + det0)
imax = i
end
end
total += vmax + p[imax][1] + p[imax][2]
return total
end
println(f4(points))
println(f5(points))
println("$VERSION f4: $(minimum(@benchmark f4($points)))")
println("$VERSION f5: $(minimum(@benchmark f5($points)))")
That was intentional. This should be the easiest trick in compiler optimization book to handle, but reality is slightly different:
function f5(p)
total = 0.
det0 = det([p[1] p[end]])
d0 = p[end] - p[1]
vmax = -Inf
imax = 0
for i in eachindex(p)
if abs(det([p[i] d0]) + det0) > vmax
vmax = abs(det([p[i] d0]) + det0)
imax = i
end
end
total += vmax + p[imax][1] + p[imax][2]
return total
end
function f6(p)
total = 0.
det0 = det([p[1] p[end]])
d0 = p[end] - p[1]
vmax = -Inf
imax = 0
for i in eachindex(p)
v = abs(det([p[i] d0]) + det0)
if v > vmax
vmax = abs(det([p[i] d0]) + det0)
imax = i
end
end
total += vmax + p[imax][1] + p[imax][2]
return total
end
Why not? There is no side effects here, it should be easy to prove that these two copies of abs(det([p[i] d0]) + det0) have the same value. Modern C++ compilers are unlikely to fail to elide the second evaluation.
I took another look at measurements and with bounds checking disabled both Julia versions evaluate this expression only once:
Good point, but somehow the compiler didn’t take this possibility into account in my example. Is there an option to state that explicitly, e.g. from now on this container is read only?
Note that starting Julia with --math-mode=fast is playing with fire:
> julia -O3 --math-mode=ieee -E "sinpi(0.15)"
0.45399049973954675
> julia -O3 --math-mode=fast -E "sinpi(0.15)"
0.0
From the definition of f6:
for i in eachindex(p)
v = abs(det([p[i] d0]) + det0)
if v > vmax
vmax = abs(det([p[i] d0]) + det0)
imax = i
end
end
You didn’t actually reduce how often you’re calling det.
If you want to use LLVM.jl, you can. But it helps in situations like when you’re writing to one array and loading from another, it can tell the compiler that the write isn’t changing the load. That could allow it to reorder these reads and writes, and (for example) take advantage SIMD.
I would be very surprised if it helps in your example, however.
The array allocations are not inlined:
I can’t believe it got slower in some cases. Here is the new f6:
function f6(p)
total = 0.
det0 = det([p[1] p[end]])
d0 = p[end] - p[1]
vmax = -Inf
imax = 0
for i in eachindex(p)
v = abs(det([p[i] d0]) + det0)
if v > vmax
vmax = v
imax = i
end
end
total += vmax + p[imax][1] + p[imax][2]
return total
end