Interval arithmetic - computation time

Hi, I just started using the IntervalArithmetic package for Julia. For the same computations, Matlab (using IntLab) is much faster than Julia.

The implementation has been adapted to each language, that is, the computations are vectorized in Matlab but not in Julia. Thus, as expected, if I compare the code with no interval arithmetic, Julia is faster than Matlab.

Is there a specific, more efficient way of computing when using IntervalArithmetic? Currently, my code with and without interval arithmetic is the same, except that I ensure that all the variables are Interval{Float64}.

Thank you for your help!

It’s really hard to say anything at all without some idea of what your code is.

Do you have an example of a (relatively short) function which exhibits slow performance with intervals in Julia?

Your reply suggests that there is no particular function that is known a priori to be costly and should be avoided. Good to know!

About the code, here is one:

using LinearAlgebra
using IntervalArithmetic
using BenchmarkTools
using Base.Threads

function fun1(x,n,l)
    sqrt2 = sqrt(2)
    F = Matrix{Float64}(I, n, n)
    @inbounds for j in 1:n
        @inbounds @threads for i in 1:n
            @inbounds for k in 0:l-1
                theta = k*2*pi/l
                costheta = cos(theta)
                cos2theta = cos(2*theta)
                xisquare, xjsquare, xixj = x[i]^2, x[j]^2, x[i]*x[j]
                if i!=j
                    d = (xisquare + xjsquare - 2*xixj*costheta)^(1/2)
                    F[i,i] += -(4*xisquare + xjsquare + xjsquare*cos2theta)/d^5
                    F[i,j] += 4*(xisquare + xjsquare)*costheta/d^5
                elseif i==j && k>0
                    F[i,j] += -1/x[i]^3/(1-costheta)^(1/2)
                end
            end
        end
    end
    return F
end


function fun2(ix,n,l)
    sqrt2 = @interval(sqrt(2)) # this should be safe
    iF = Matrix{Interval{Float64}}(I, n, n)
    @inbounds for j in 1:n
        @inbounds @threads for i in 1:n
            @inbounds for k in 0:l-1
                itheta = k*2*@interval(pi)/l
                icostheta = cos(itheta)
                icos2theta = cos(2*itheta)
                ixisquare, ixjsquare, ixixj = ix[i]^2, ix[j]^2, ix[i]*ix[j]
                if i!=j
                    id = (ixisquare + ixjsquare - 2*ixixj*icostheta)^(1/2)
                    iF[i,i] += -(4*ixisquare + ixjsquare + ixjsquare*icos2theta)/id^5
                    iF[i,j] += (ixisquare + ixjsquare)*icostheta/id^5
                elseif i==j && k>0
                    iF[i,j] += -1/ix[i]^3/(1-icostheta)^(1/2)
                end
            end
        end
    end
    return iF
end

n=20
l=20

x = float(reshape(collect(1:n),n,1))
ix = convert(Array{Interval{Float64},2}, x)

@btime fun1(x,n,l)

@btime fun2(ix,n,l)

The code on which I noticed the time difference is a more elaborate version of this one, but the loops are the same (I just removed a lot of terms to make it easier to read).
With IntervalArithmetic, the runtime increases by a factor of almost 200.

At the risk of putting words into the core devs' mouths, I believe that is one of the core tenets of the design philosophy of Julia (this may or may not apply to IntervalArithmetic).

Hm, it looks like there is a performance issue with literal powers of intervals. For example:

julia> x = Interval(1.0, 1.0)
[1, 1]

julia> @btime $x * $x
  8.741 ns (0 allocations: 0 bytes)
[1, 1]

julia> @btime $x^2
  1.393 μs (51 allocations: 2.03 KiB)
[1, 1]

julia> @btime sqrt($x)
  56.531 ns (0 allocations: 0 bytes)
[1, 1]

julia> @btime $x^(1/2)
  1.940 μs (67 allocations: 2.78 KiB)
[1, 1]

It looks like the relevant issue, with some workarounds, is here: Power ^ allocates · Issue #103 · JuliaIntervals/IntervalArithmetic.jl · GitHub

Just by removing all the literal powers from your sample code, I was able to get a speedup of about 10x.
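
For reference, the workaround is mechanical: rewrite each literal power as explicit multiplications. A minimal sketch, with hypothetical helper names:

```julia
# Hypothetical helpers that replace literal powers with plain
# multiplications, sidestepping the slow generic `^` path.
square(x) = x * x
pow5(x)   = (x2 = x * x; x2 * x2 * x)   # x^5 via three multiplications

# Works for any type that defines `*`, including Interval{Float64}.
@assert square(3.0) == 3.0^2
@assert pow5(2.0) == 2.0^5
```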

sqrt(x) is almost always going to be faster than x^0.5 anyway, because the latter dispatches to generic code whereas the former is optimized for square roots.

There are probably a few other small optimizations, e.g. -1/ix[i]^3/(1-icostheta)^(1/2) could be replaced by -inv(ix[i]*ix[i]*ix[i] * sqrt(1-icostheta)), to do only one division (and with specialized inv code rather than a generic division of 1).
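
As a quick sanity check that the suggested rewrite is algebraically equivalent (shown with plain floats here; the term_orig/term_fast names are just for illustration, and the same identity holds for intervals):

```julia
# Original and rewritten forms of the diagonal-update term.
term_orig(x, c) = -1 / x^3 / (1 - c)^(1/2)
term_fast(x, c) = -inv(x * x * x * sqrt(1 - c))

@assert term_orig(2.0, 0.5) ≈ term_fast(2.0, 0.5)
```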

Hi,

As has been pointed out, there are several possible optimizations: use sqrt, avoid repeated calculations and divisions, move constants out of the loops, and note that powers are an issue.

function fun3(ix,n,l)
    sqrt2 = @interval(sqrt(2)) # this should be safe
    piI = @interval(pi)
    iF = Matrix{Interval{Float64}}(I, n, n)
    @inbounds for j in 1:n
       @inbounds @threads for i in 1:n
           @inbounds for k in 0:l-1
               itheta = k*2*piI/l
               icostheta = cos(itheta)
               icos2theta = cos(2*itheta)
               ixisquare, ixjsquare, ixixj = ix[i]^2, ix[j]^2, ix[i]*ix[j]
               if i!=j
                   id = sqrt(ixisquare + ixjsquare - 2*ixixj*icostheta)
                   id5 = id^5
                   iF[i,i] += -(4*ixisquare + ixjsquare + ixjsquare*icos2theta)/id5
                   iF[i,j] += (ixisquare + ixjsquare)*icostheta/id5
               elseif i==j && k>0
                   iF[i,j] += -1/(ix[i]^3 * sqrt(1-icostheta))
               end
           end
       end
    end
    return iF
end

On my machine with Julia 0.7, I get the following benchmarks:

@btime fun1(x,n,l)
  5.363 ms (238781 allocations: 3.65 MiB)

@btime fun2(ix,n,l)
  179.564 ms (2656832 allocations: 96.55 MiB)

@btime fun3(ix,n,l)
  104.914 ms (1632388 allocations: 57.50 MiB)

fun3 didn’t fix all the powers (the biggest issue). I think you can replace ix[i]^2 by pow(ix[i], 2) or write out the multiplication.

And using pow(x, n) instead of x^n, you get better timings:

julia> function fun4(ix,n,l)
    sqrt2 = @interval(sqrt(2)) # this should be safe
    piI = @interval(pi)
    iF = Matrix{Interval{Float64}}(I, n, n)
    @inbounds for j in 1:n
       @inbounds @threads for i in 1:n
           @inbounds for k in 0:l-1
               itheta = k*2*piI/l
               icostheta = cos(itheta)
               icos2theta = cos(2*itheta)
               ixisquare, ixjsquare, ixixj = pow(ix[i],2), pow(ix[j],2), ix[i]*ix[j]
               if i!=j
                   id = sqrt(ixisquare + ixjsquare - 2*ixixj*icostheta)
                   id5 = pow(id,5)
                   iF[i,i] += -(4*ixisquare + ixjsquare + ixjsquare*icos2theta)/id5
                   iF[i,j] += (ixisquare + ixjsquare)*icostheta/id5
               elseif i==j && k>0
                   iF[i,j] += -1/(pow(ix[i],3) * sqrt(1-icostheta))
               end
           end
       end
    end
    return iF
end

julia> @btime fun4(ix,n,l)
  17.375 ms (231206 allocations: 7.06 MiB)

Thank you all for your suggestions!
I have been able to correct my code with great results!

Can you post your benchmarks to compare with IntLab?

Also, if you see performance issues with @threads (for example, if removing it actually makes your code faster), then you’re probably running into performance of captured variables in closures · Issue #15276 · JuliaLang/julia · GitHub . There’s some discussion of the issue here: Parallelizing for loop in the computation of a gradient - #7 by tkoolen as well.
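
The usual workaround from that issue is to rebind the captured variable in a `let` block before `@threads` creates its closure. A minimal sketch, with made-up names:

```julia
using Base.Threads

# `s` is reassigned, so the closure that @threads builds would capture
# it as a boxed variable; the `let` gives the closure a fresh binding
# that is never reassigned, avoiding the box (issue #15276 workaround).
function scaled_sum(v, s)
    s = 2 * s
    partials = zeros(typeof(s), nthreads())
    let s = s
        @threads for i in eachindex(v)
            partials[threadid()] += s * v[i]
        end
    end
    return sum(partials)
end
```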

FWIW, only the outermost @inbounds is needed; it applies to the whole block, including the inner loops.
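
To illustrate: a single `@inbounds` on the outer loop already disables bounds checks for everything in its body, nested loops included (made-up example):

```julia
# One @inbounds on the outer loop covers the inner loop as well.
function col_sums(A)
    s = zeros(eltype(A), size(A, 2))
    @inbounds for j in axes(A, 2)   # applies to the whole block...
        for i in axes(A, 1)         # ...including this inner loop
            s[j] += A[i, j]
        end
    end
    return s
end
```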

I’m a bit confused about what your code is doing here. What is the purpose of

sqrt2 = sqrt(2)

? It’s not used. And why is this here

x = float(reshape(collect(1:n),n,1))

?

Why do you write two different functions for ordinary floats and intervals instead of a single one that handles both?

I would have expected that

all(fun1(x, n, l) .∈ fun2(ix, n, l))

to be true, but it’s not (indeed, fun2 is missing the factor of 4 that fun1 applies in the F[i,j] update).

If I write a single function, like this, I get something more understandable:

mypow(x, n::Integer) = x^n
mypow(x::Interval, n::Integer) = pow(x, n)

function fun0(x::AbstractVector, l::Integer)
    F = diagm(0 => x)
    @inbounds for j in eachindex(x)  # dangerous to use 1:n where n is an input, since you have @inbounds
        xj2 = x[j] * x[j]
        Threads.@threads for i in eachindex(x)
            xi2 = x[i] * x[i]
            xixj = x[i] * x[j]
            for k in 0:l-1
                θ = 2π * k / l  # I did not make this an interval. Why should it be an interval?
                cosθ = cos(θ)
                cos2θ = cos(2θ)
                if i != j
                    d = 1 / mypow(sqrt(xi2 + xj2 - 2xixj * cosθ), 5)
                    F[i,i] += -(4xi2 + xj2 + xj2 * cos2θ) * d
                    F[i,j] += 4 * (xi2 + xj2) * cosθ * d
                elseif k > 0
                    F[i, j] -= 1 / (sqrt(1 - cosθ) * mypow(x[i], 3))
                end
            end
        end
    end
    return F
end

Then call it with:

julia> x = float.(1:n);  # make vector not matrix!

julia> xi = interval.(x);  # much simpler notation (also, use vector)

julia> all(fun0(x, 20) .∈ fun0(xi, 20))  # this is what I would have expected
true

julia> @btime fun0(x, 20);
  4.941 ms (68758 allocations: 1.03 MiB)

julia> @btime fun0(xi, 20);  # pretty fast
  6.383 ms (80461 allocations: 2.33 MiB)

Edit: Forgot to interpolate x and xi into @btime, but it makes no difference.
Edit2: For comparison with the old implementation for non-intervals:

julia> @btime fun1($x, 20, 20);
  6.335 ms (80782 allocations: 1.22 MiB)

BTW, since the matrices are symmetric, you can make it faster by exploiting that.

Exploiting symmetry:

function funS(x::AbstractVector, l::Integer)
    F = diagm(0 => x)
    @inbounds for j in eachindex(x)
        xj2 = x[j] * x[j]
        Threads.@threads for i in 1:j
            xi2 = x[i] * x[i]
            xixj = x[i] * x[j]
            for k in 0:l-1
                θ = 2π * k / l
                cosθ = cos(θ)
                cos2θ = cos(2θ)
                if i != j
                    d = 1 / mypow(sqrt(xi2 + xj2 - 2xixj * cosθ), 5)
                    F[i,i] -= (4xi2 + xj2 + xj2 * cos2θ) * d
                    F[i,j] += 4 * (xi2 + xj2) * cosθ * d
                elseif k > 0
                    F[i, j] -= 1 / (sqrt(1 - cosθ) * mypow(x[i], 3))
                end
            end
        end
    end
    return Symmetric(F)
end

julia> all(funS(x, 20) .∈ funS(xi, 20))
true

julia> @btime funS($x, 20);
  2.498 ms (39458 allocations: 630.88 KiB)

julia> @btime funS($xi, 20);
  3.342 ms (46318 allocations: 1.34 MiB)

Just a note on using intervals: you can substitute x^2 with x*x as long as x does not (strictly) contain 0. Otherwise, the dependency effect kicks in and the result will be an overapproximation.
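
A hand-rolled sketch of why (plain (lo, hi) tuples standing in for intervals, with rounding ignored; this is not the package's implementation): multiplying x by itself treats the two factors as independent, while a dedicated square knows they are the same variable.

```julia
# Toy interval multiply: takes the extrema over all endpoint products.
function imul(a, b)
    ps = (a[1]*b[1], a[1]*b[2], a[2]*b[1], a[2]*b[2])
    return (minimum(ps), maximum(ps))
end

# Toy interval square: exploits that x^2 is never negative.
function isq(a)
    if a[1] <= 0 <= a[2]
        return (0.0, max(a[1]^2, a[2]^2))
    else
        return (min(a[1]^2, a[2]^2), max(a[1]^2, a[2]^2))
    end
end

x = (-1.0, 1.0)
imul(x, x)   # (-1.0, 1.0): overapproximation from the dependency effect
isq(x)       # (0.0, 1.0): tight
```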