How to speed up the Interpolation


#1

Hi all ~.
Currently, I want to make a function that returns a numerical integral via interpolation. But my array is large (~10^7), which takes an extreme number of allocations and a lot of memory.
Here’s my code.

unit_q = collect(0:0.0001:1)
#Function for numerical integration
function Kernel_Weight_Function(r::Float64)
    if r <1/2.
        return 8./(pi)*(1-6.*r^2.+6.*r^3.)
    elseif r<1.
        return 8./(pi)*2*(1-r)^3.
    else
        return 0
    end
end
#Numerical integration
function integr_ker(q_xy::Float64)
    dq = 0.0001
    F = 0
    qz = 0
    while qz < sqrt(1.-q_xy^2.)
        F += Kernel_Weight_Function(sqrt(qz^2.+q_xy^2.))*dq
        qz += dq
    end
    return 2*F
end

F_qxy = zeros(length(unit_q))
#Obtaining Value for interpolation
for i in 1:length(unit_q)
    F_qxy[i] = integr_ker(unit_q[i])
end
using Interpolations
knotsq = (unit_q,)
Numerical_Kernel = interpolate(knotsq, F_qxy, Gridded(Linear()))

#For time test
function f(Times::Int64)
  A = rand(0:2,Times)
  for AA in A
     F = Numerical_Kernel[A]
  end
end

@time f(1)
@time f(100)
@time f(1000)

The results are as follows (the first run includes compilation and can be ignored):

  1.395746 seconds (236.13 k allocations: 12.455 MiB, 0.96% gc time)
  0.001321 seconds (205 allocations: 176.031 KiB)
  0.127418 seconds (2.00 k allocations: 15.511 MiB, 12.43% gc time)

So the code takes ~100 times longer when calculating the interpolation 10 times more often. (My array is ~10^7, so it will take 10^8*0.1 sec…)
Are there any ways to speed up this interpolation?
Thanks.


#2

First thing to do is always to check for type instability:

julia> using Traceur
julia> @trace Kernel_Weight_Function(1.)
(Kernel_Weight_Function)(::Float64) at REPL[3]:2
  returns Union{Float64, Int64}

julia> @trace integr_ker(1.)
(integr_ker)(::Float64) at REPL[6]:2
  qz is assigned as Int64 at line 4
  qz is assigned as Float64 at line 7
  F is assigned as Int64 at line 3
  F is assigned as Float64 at line 6
  dynamic dispatch to qz < (Base.Math.sqrt_llvm)((Base.sub_float)((Base.sitofp)(Float64, 1), #temp#)) at line 5
  dynamic dispatch to (Base.literal_pow)(Main.^, qz, Val{2}) at line 6
  dynamic dispatch to (Base.broadcast)(Main.+, #temp#, #temp#) at line 6
  dynamic dispatch to $(Expr(:invoke, MethodInstance for Kernel_Weight_Function(::Float64), :(Main.Kernel_Weight_Function), :((Base.Math.sqrt_llvm)(#temp#)))) * 0.0001 at line 6
  dynamic dispatch to F + #temp# at line 6
  dynamic dispatch to qz + 0.0001 at line 7
  dynamic dispatch to 2F at line 9
  returns Union{Float64, Int64}

(or just use @code_warntype)


#3

Hi @tkoolen, and thanks for the reply.
But I can't understand the lines that follow the @trace macro…
Do you mean it took a long time due to a type mismatch?
Could you explain the lines a bit more?


#4

Have a look at https://docs.julialang.org/en/stable/manual/performance-tips/#Write-“type-stable”-functions-1. Applied to your case: Kernel_Weight_Function, for example, returns either 0 (an Int) or a Float64 value in the other two branches. Similarly, in integr_ker, F and qz are first set to 0 (again, an Int), but later Float64s are (conditionally) added to F and qz. So the type of these variables can’t be determined at compile time, which results in much slower code.

After you fix these issues, you’ll also want to look at https://docs.julialang.org/en/stable/manual/performance-tips/#Avoid-global-variables-1: knotsq, F_qxy, and Numerical_Kernel are non-const global variables, and so again their type cannot be determined at compile time.
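Concretely, a type-stable version might look like this (a sketch of the fix, not tested against your full setup; the only changes are float literals so every variable and every return branch has type Float64):

```julia
# Sketch of a type-stable rewrite: same algorithm, float literals throughout
function Kernel_Weight_Function(r::Float64)
    if r < 0.5
        return 8 / pi * (1 - 6 * r^2 + 6 * r^3)
    elseif r < 1.0
        return 8 / pi * 2 * (1 - r)^3
    else
        return 0.0              # Float64, so all three branches agree in type
    end
end

function integr_ker(q_xy::Float64)
    dq = 0.0001
    F = 0.0                     # Float64 from the start, not Int
    qz = 0.0
    while qz < sqrt(1.0 - q_xy^2)
        F += Kernel_Weight_Function(sqrt(qz^2 + q_xy^2)) * dq
        qz += dq
    end
    return 2 * F
end
```

With this version, @trace (or @code_warntype) should report a single concrete return type instead of Union{Float64, Int64}.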

Edit: while I’m at it:

  • consider using BenchmarkTools to avoid some of the pitfalls of benchmarking Julia code and to get more accurate results.
  • use r^3 instead of r^3. (and similar cases). There’s about an 18x speed difference on my machine (determined using BenchmarkTools’s @btime) for ^3 (smaller difference for ^2).
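To see why the exponent's type matters (a minimal sketch; the exact speedup varies by machine): an integer literal exponent is lowered to Base.literal_pow and becomes a few multiplications, while a float exponent goes through the generic floating-point power routine.

```julia
cube_int(x)   = x^3     # integer literal exponent: lowered to x * x * x
cube_float(x) = x^3.0   # float exponent: calls the generic pow routine

# Same value either way; only the code path (and speed) differs.
cube_int(2.0), cube_float(2.0)
```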

#5

Type stability is a fundamental concept when it comes to Julia performance and memory use. Here’s a minimal example of a type unstable function

function foo(x)
   if x > 0
      return 1
   else
      return 0.0
   end
end

The type of the output depends on the input value not just the input type. In this particular example the compiler is still able to compile fast code, though, probably because it’s so simple.
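If the goal is numeric output matching the input's type, a common type-stable fix is to use one and zero (a sketch; foo_stable is my name for it, not from the thread):

```julia
function foo_stable(x)
    if x > 0
        return one(x)     # same type as x
    else
        return zero(x)    # same type as x, so the return type is stable
    end
end
```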

I will also strongly advise you not to use the notation 3. when you mean 3.0. Lots of people do this for some reason, but, for one thing, it makes your code harder to read, and more importantly, it may behave differently from what you expect:

julia> 3.*4
12

julia> 3.0*4
12.0

julia> 3. *4
12.0

The notation .* means (loosely) “broadcast the multiplication operator over collections”. You use it to multiply arrays, for example rand(5) .* rand(5), but it will also accept scalars. Take a look here: https://docs.julialang.org/en/stable/manual/functions/#man-vectorized-1

It is not possible to tell which meaning you intend, and I believe that this notation will even become an error in the next version.

My advice, and personal opinion: always write 3.0 and 0.3, not 3. and .3. The latter is hard to read, easy to miss or misunderstand, ambiguous, and will probably become an error (and, IMHO, it’s ugly :wink: )

You can also resolve ambiguities by putting spaces between operators: 3 .* x, which is also recommended.


#6

Oh, and as @tkoolen says, in this case it is even better not to multiply by floats at all. Just multiply by ints:

8 / pi * (1 - 6 * r^2 + 6 * r^3)

Then type promotion (https://docs.julialang.org/en/stable/manual/conversion-and-promotion/#conversion-and-promotion-1) will work for you. If you want to use a different type for r, that type is preserved. For example, you may want to use Float32 if you’re on a gpu:

julia> x = rand()
0.9760970240572919

julia> y = Float32(x)
0.97609705f0

julia> typeof(3 * x)
Float64

julia> typeof(3 * y)
Float32

julia> typeof(3.0 * y)
Float64

For a fancy, maths-like appearance, you can do this:

8/π * (1 - 6r^2 + 6r^3)
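To make the promotion concrete (a sketch of the polynomial part only; note the 8/π prefactor is always Float64, so it would re-promote a Float32 argument):

```julia
# Integer coefficients let the element type of r flow through the arithmetic
poly(r) = 1 - 6r^2 + 6r^3

typeof(poly(0.5))    # Float64
typeof(poly(0.5f0))  # Float32: Int * Float32 promotes to Float32
```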

#7

Didn’t you mean to do:

function f(Times::Int64)
  A = rand(0:2,Times)
  for AA in A
     F = Numerical_Kernel[AA] # <- Note AA here, not A
  end
end

Otherwise, it is not strange that you get O(N^2) behavior: you are looping over A, which is O(N), and in each iteration computing the interpolation at all N points, which is also O(N), so O(N^2) in total.
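In that spirit, the fixed loop could be sketched like this (my sketch, not the gist's exact code: the interpolant is passed as an argument so it isn't a non-const global, and it is assumed to be callable at a single point; with Interpolations the single-point evaluation syntax is itp[x] or itp(x) depending on version):

```julia
# Sketch: evaluate the interpolant at one point per iteration — O(N) total
function f(itp, Times::Int)
    A = rand(Times)     # random query points in [0, 1), matching the knot range
    F = 0.0
    for AA in A
        F += itp(AA)    # note AA, not A: a single point, not the whole array
    end
    return F
end
```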

Here is the code slightly rewritten: https://gist.github.com/KristofferC/24db31b00b071e6e154983646055ab9a

that gives you:

julia> @time f(100)
  0.000018 seconds (5 allocations: 1.031 KiB)

julia> @time f(1000)
  0.000115 seconds (5 allocations: 8.094 KiB)

julia> @time f(10000)
  0.000967 seconds (6 allocations: 78.359 KiB)

julia> @time f(10^7)
  0.814747 seconds (7 allocations: 76.294 MiB, 10.54% gc time)

#8

Thanks for many useful suggestions.
What @kristoffer.carlsson said is right: I made a typo at that point, and it made the code O(N^2).
Actually, this was only part of my code, so I should check whether these changes make the full code fast :smile:.