I have to run the function isapprox(A, A_, rtol=1e-4) (possibly) millions of times, where A and A_ are two matrices. To improve efficiency, I defined a custom isapprox() using a more efficient norm:
using LoopVectorization

function is_approx(A::Array{Float64,2}, A_::Array{Float64,2}, rtol::Float64)
    # similar to built-in isapprox() but faster
    return norm(A - A_) < rtol * max(norm(A), norm(A_))
end

function norm(A::Array{Float64,2})
    # fast L(2,2) norm
    # discussion: https://discourse.julialang.org/t/performance-of-norm-function/14709/13
    x = zero(eltype(A))
    @avx for i in eachindex(A)
        @fastmath x += A[i] * A[i]
    end
    @fastmath sqrt(x)
end
If you want, you can find more context about what I use this for here.
Despite this being considerably more efficient than isapprox(), is_approx() is still the main performance bottleneck of my code. Do you have any ideas on how to make is_approx() more efficient?
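For reference, comparing against the built-in can be done with something like this (a rough sketch using BenchmarkTools; the 10×10 matrices are just an example size, matching the profiling snippet below):

using BenchmarkTools

A, A_ = rand(10, 10), rand(10, 10)

@btime isapprox($A, $A_, rtol=1e-4)   # built-in, norm-based comparison
@btime is_approx($A, $A_, 1e-4)       # the custom version above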
If it is relevant, profiling the function,
using StatProfilerHTML
A, A_ = rand(10,10), rand(10,10)
@profilehtml for i in 1:10^6 is_approx(A,A_,1e-4) end
shows that is_approx calls norm three times (for A - A_, A, and A_).
Fusing the norm computation into is_approx lets you calculate the three norms together in a single pass (keeping three accumulators for the relevant squared sums). That reduces memory access by roughly 3x and avoids materializing A - A_, which should lead to a significant performance improvement.
Thanks for helping. This seems to make it just slightly faster.
function is_approx2(A::Array{Float64,2}, A_::Array{Float64,2}, rtol::Float64)
    # similar to built-in isapprox() but faster; compares squared norms to avoid sqrt
    return norm2(A - A_) < abs2(rtol) * max(norm2(A), norm2(A_))
end

function norm2(A::Array{Float64,2})
    # fast squared L(2,2) norm (no sqrt)
    # discussion: https://discourse.julialang.org/t/performance-of-norm-function/14709/13
    x = zero(eltype(A))
    @avx for i in eachindex(A)
        @fastmath x += A[i] * A[i]
    end
    return x
end
@btime is_approx($A,$A_,$1e-4)
96.901 ns (1 allocation: 896 bytes)
@btime is_approx2($A,$A_,$1e-4)
93.640 ns (1 allocation: 896 bytes)
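For what it’s worth, the single allocation in both timings should just be the temporary array created by A - A_, which is exactly what fusing the loop avoids. A quick way to see it (a sketch, reusing A and A_ from the profiling snippet):

# the A - A_ subtraction materializes a new 10×10 matrix on every call,
# which accounts for the one allocation reported by @btime above
@btime $A - $A_;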
function is_approx3(A::Array{Float64,2}, A_::Array{Float64,2}, rtol::Float64)
    # assumes A and A_ have the same size; add a check here if needed
    s1, s2, s3 = 0.0, 0.0, 0.0
    for i in eachindex(A)
        a, a_ = A[i], A_[i]
        s1 += (a - a_)^2
        s2 += a^2
        s3 += a_^2
    end
    return s1 < rtol^2 * max(s2, s3)
end
LoopVectorization.@avx is the old name for LoopVectorization.@turbo. I thought it changed several years ago. If @avx is still around, I suggest you use the newer, official name, @turbo.
I’m curious: can you share some information about your application? It’s pretty rare for isapprox to be a performance-critical function. Even if you’re using it as a convergence test in an iterative solver of some kind, other parts of the iteration will usually dominate.
Yes, use @turbo. @turbo replaced @avx early in the 0.12 series, and I haven’t made a breaking release since then, which is why @avx still works. We’re now at 0.12.162.
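For completeness, here is a sketch that combines the two suggestions above, the fused single-pass loop of is_approx3 together with @turbo (the name is_approx4 is mine, and like is_approx3 it assumes A and A_ have the same size):

using LoopVectorization

function is_approx4(A::Array{Float64,2}, A_::Array{Float64,2}, rtol::Float64)
    # one fused pass over both matrices, with three accumulators:
    # squared norm of the difference, of A, and of A_
    s1, s2, s3 = 0.0, 0.0, 0.0
    @turbo for i in eachindex(A)
        a = A[i]
        a_ = A_[i]
        d = a - a_
        s1 += d * d
        s2 += a * a
        s3 += a_ * a_
    end
    return s1 < rtol^2 * max(s2, s3)
end

With the loop fused there is no temporary array, so the call should be allocation-free; @btime is_approx4($A, $A_, 1e-4) on your own matrix sizes is an easy way to check.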