# Faster isapprox()

I need to run the function isapprox(A, A_, rtol=1e-4) possibly millions of times, where A and A_ are two matrices. To improve efficiency, I defined a custom is_approx() function using a more efficient norm:

``````
using LoopVectorization

function is_approx(A::Array{Float64,2}, A_::Array{Float64,2}, rtol::Float64)
    # similar to built-in isapprox() but faster
    return norm(A - A_) < rtol * max(norm(A), norm(A_))
end

function norm(A::Array{Float64,2})
    # fast L(2,2) norm
    # discussion: https://discourse.julialang.org/t/performance-of-norm-function/14709/13
    x = zero(eltype(A))
    @avx for i in eachindex(A)
        @fastmath x += A[i] * A[i]
    end
    @fastmath sqrt(x)
end
``````

If you want, you can find more context about what I use this for here.

Despite this being considerably more efficient than isapprox(), is_approx() is still the main performance bottleneck of my code. Do you have any ideas on how to make is_approx() more efficient?

If it is relevant, profiling the function,

``````
using StatProfilerHTML
A, A_ = rand(10, 10), rand(10, 10)
@profilehtml for i in 1:10^6 is_approx(A, A_, 1e-4) end
``````

gives the following report:

`is_approx` calls `norm` three times (on `A - A_`, `A`, and `A_`).
Fusing `norm` into `is_approx` lets all three norms be computed in a single pass (keeping three accumulators for the relevant squared sums), which cuts memory traffic by 3x and should give a significant performance improvement.


What does fusing `norm` with `is_approx` mean? Would you be able to propose an example for your solution? Thanks for your help!

Given that you are comparing norms, can you skip all the sqrts and square your rtol instead?
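For example, since both sides of the comparison are nonnegative, squaring preserves the result, so the `sqrt`s can be dropped. A quick sanity check (`sumsq` is just a helper name for this sketch):

```julia
# Both sides of the comparison are nonnegative, so squaring preserves it:
# sqrt(u) < rtol * sqrt(v)  ⟺  u < rtol^2 * v.
sumsq(A) = sum(abs2, A)   # squared Frobenius norm, no sqrt

A, B = rand(10, 10), rand(10, 10)
rtol = 1e-4
with_sqrt = sqrt(sumsq(A - B)) < rtol * max(sqrt(sumsq(A)), sqrt(sumsq(B)))
without   = sumsq(A - B) < rtol^2 * max(sumsq(A), sumsq(B))
@assert with_sqrt == without
```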


In particular, you should skip computing `A - A_`, which also allocates memory (slow). This calculation should not allocate at all.
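You can see the allocation directly with Base's `@allocated` (a small sketch; `f` is just an illustrative helper):

```julia
# A - A_ materializes a brand-new matrix on every call; a fused loop avoids this.
A, A_ = rand(10, 10), rand(10, 10)
f(A, A_) = A - A_
f(A, A_)                      # warm up so compilation is not measured
bytes = @allocated f(A, A_)
@assert bytes > 0             # the subtraction allocates a fresh 10×10 matrix
```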


Thanks for helping. This seems to make it just slightly faster.

``````
function is_approx2(A::Array{Float64,2}, A_::Array{Float64,2}, rtol::Float64)
    # similar to built-in isapprox() but faster
    return norm2(A - A_) < abs2(rtol) * max(norm2(A), norm2(A_))
end

function norm2(A::Array{Float64,2})
    # fast squared L(2,2) norm
    # discussion: https://discourse.julialang.org/t/performance-of-norm-function/14709/13
    x = zero(eltype(A))
    @avx for i in eachindex(A)
        @fastmath x += A[i] * A[i]
    end
    return x
end

@btime is_approx($A, $A_, $1e-4)
  96.901 ns (1 allocation: 896 bytes)

@btime is_approx2($A, $A_, $1e-4)
  93.640 ns (1 allocation: 896 bytes)
``````

The main bottleneck of your code is allocation when you create the new matrix `A - A_`.

Something efficient might look like:

``````
function is_approx3(A::Array{Float64,2}, A_::Array{Float64,2}, rtol::Float64)
    # make sure sizes of A and A_ are the same here (?)
    s1, s2, s3 = 0.0, 0.0, 0.0
    for i in eachindex(A)
        a, a_ = A[i], A_[i]
        s1 += (a - a_)^2
        s2 += a^2
        s3 += a_^2
    end
    return s1 < rtol^2 * max(s2, s3)
end
``````

With LoopVectorization:

``````
julia> function is_approx3(A::Matrix{T1}, B::Matrix{T2}, rtol) where {T1,T2}
           normA = zero(T1)
           normB = zero(T2)
           normdiff = zero(promote_type(T1, T2))
           @avx for i in eachindex(A, B)
               a, b = A[i], B[i]
               normA += abs2(a)
               normB += abs2(b)
               normdiff += abs2(a - b)
           end
           return normdiff < abs2(rtol) * max(normA, normB)
       end
is_approx3 (generic function with 1 method)

julia> A, B = rand(10, 10), rand(10, 10);

julia> @btime is_approx($A, $B, $1e-4);
  147.392 ns (1 allocation: 896 bytes)

julia> @btime is_approx2($A, $B, $1e-4);
  144.150 ns (1 allocation: 896 bytes)

julia> @btime is_approx3($A, $B, $1e-4);
  17.524 ns (0 allocations: 0 bytes)
``````

Note that `eachindex(A, B)` makes sure that the sizes are compatible.
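A quick illustration of that size check (with made-up dimensions):

```julia
# eachindex over multiple arrays throws when their shapes are incompatible,
# so the fused loop can never silently read out of bounds.
A, C = rand(10, 10), rand(10, 9)
threw = try
    eachindex(A, C)
    false
catch err
    err isa DimensionMismatch
end
@assert threw   # mismatched shapes are rejected before the loop runs
```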


Your solution, together with `@avx`, makes it 5x faster!

``````
function is_approx3(A::Array{Float64,2}, A_::Array{Float64,2}, rtol::Float64)
    s1, s2, s3 = 0.0, 0.0, 0.0
    @avx for i in eachindex(A)
        a, a_ = A[i], A_[i]
        s1 += (a - a_)^2
        s2 += a^2
        s3 += a_^2
    end
    return s1 < rtol^2 * max(s2, s3)
end

@btime is_approx3($A, $A_, $1e-4)
  18.829 ns (0 allocations: 0 bytes)
``````

`LoopVectorization.@avx` is the old name for `LoopVectorization.@turbo`. I thought it changed several years ago. If `@avx` is still around, I suggest you use the newer, official name, `@turbo`.
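For reference, the fused kernel above with the current macro name would look like this (a sketch; `is_approx_turbo` is a made-up name, and the semantics are identical to the `@avx` version):

```julia
using LoopVectorization

function is_approx_turbo(A::Matrix{Float64}, B::Matrix{Float64}, rtol::Float64)
    s1 = s2 = s3 = 0.0
    @turbo for i in eachindex(A, B)   # @turbo is the current name of @avx
        a, b = A[i], B[i]
        s1 += (a - b)^2               # squared norm of the difference
        s2 += a^2                     # squared norm of A
        s3 += b^2                     # squared norm of B
    end
    return s1 < rtol^2 * max(s2, s3)
end
```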


I’m curious if you can share information about your application? It’s pretty rare for `isapprox` to be a performance-critical function. Even if you’re using it as a convergence test in an iterative solver of some kind, usually there are other parts of your iteration that will dominate.


Thanks for letting me know!

Sure! I wrote the code example in the link specifically for this question.

Yes, use `@turbo`. I haven’t made a breaking release since then, which is why `@avx` still works.
`@turbo` replaced `@avx` early in the 0.12 series. We’re now at 0.12.162.