Minimizing allocations from memoized functions

I have the following MWE using a memoized function:

using Memoization
using FFTW

struct Operators{F}
    K::F
    # More operators
end 

function run(N, α)
    M = 512
    ω = rand(M)

    # Cache the kinetic factor
    function K(α)
        fun = if α == 0 
            @memoize function K_cubic(dx::Real)
                ifftshift(cis.(dx*ω.^2/2))
            end
            K_cubic
        elseif α > 0
            @memoize function K_hirota(dx::Real)
                ifftshift(cis.(dx*(ω.^2/2 - α*ω.^3)))
            end
            K_hirota
        end
        return fun
    end

    ops = Operators(K(α))
    
    ψ = Array{Complex{Float64}}(undef, M, N)
    dx = 1/N
    myloop(ψ, dx, ops, N)

end

function myloop(ψ, dx, ops, N)
    for i = 1:N-1
        @time @views T2A!(ψ[:, i+1], ψ[:, i], dx, ops)
        # or 
        # @time @views T4A!(ψ[:, i+1], ψ[:, i], dx, ops)
    end
end

function T2A!(ψₒ, ψᵢ, dx, ops) # 3 allocs total 
    # Do some non allocating operations using stuff in ops
    ψₒ .= ops.K(dx) .* ψᵢ # 3 allocs from memoized K(dx)
    # Do some more non allocating operations using stuff in ops
end

function T4A!(ψₒ, ψᵢ, dx, ops) # 3 allocs total 
    T2A!(ψₒ, ψᵢ, 0.123*dx, ops) # 3 allocs from memoized K(dx)
    T2A!(ψₒ, ψₒ, -0.456*dx, ops) # 3 allocs from memoized K(dx)
    T2A!(ψₒ, ψₒ, 0.123*dx, ops) # 3 allocs from memoized K(dx)
end

run(1000, 0.1)

After the first run of T2A!, every iteration of the loop has 3 allocations coming from the line ψₒ .= ops.K(dx) .* ψᵢ. Is it possible to get rid of them? The allocations are quite small (a few tens or hundreds of bytes).

Note that I can’t just cache K at the given dx and pass it as an array to T2A! due to how T4A! works (my production code has a function T6A! that calls T4A! with different values of dx, functions that call T6A!, etc.). Without the “higher order” functions that change the dx, this approach worked fantastically, but now I’m stuck.
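For reference, the fixed-dx approach that worked looked roughly like this (a sketch; the names are illustrative, and `circshift` stands in for FFTW’s `ifftshift` so the snippet is dependency-free):

```julia
M = 512
N = 1000
ω = rand(M)
dx = 1/N

# Precompute the kinetic factor once for the single dx.
# In the real code this would be ifftshift(cis.(dx*ω.^2/2)) from FFTW;
# circshift by -(M ÷ 2) plays the same role here for even M.
Kdx = circshift(cis.(dx .* ω.^2 ./ 2), -(M ÷ 2))

function T2A_fixed!(ψₒ, ψᵢ, Kdx)
    ψₒ .= Kdx .* ψᵢ   # Kdx is just an array, so nothing to allocate
end

ψ = ones(ComplexF64, M, 2)
@views T2A_fixed!(ψ[:, 2], ψ[:, 1], Kdx)
```

This falls apart as soon as the higher-order integrators start varying dx, which is exactly the problem.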

I do not know if that is related, but why does your K function return a function instead of the value of the corresponding ifftshift? Could you memoise K directly?

1 Like

It’s to avoid writing something like this:

    function K(α, dx)
        if α == 0 
                ifftshift(cis.(dx*ω.^2/2))
        elseif α > 0
                ifftshift(cis.(dx*(ω.^2/2 - α*ω.^3)))
        end
    end

which would force me to unnecessarily carry α around when it’s not needed. It’s also 18 allocations per memoized call instead of 3. Not quite sure why; it probably has something to do with the branches inside the function.

I’m guessing the 3 allocations could be from the memoization machinery when memoizing a closure, as is the case in your code, e.g.

julia> foo = (x -> (@memoize closure(y) = x+y))(1)
(::var"#closure#23"{Int64}) (generic function with 1 method)

julia> @time foo(2) # second time
  0.000013 seconds (3 allocations: 80 bytes)

Not sure if those can be gotten rid of, but if you wanted to try I’d be happy to take a PR to Memoization.jl (I’m the author of that package). Note that memoizing top-level functions is a bit simpler, so there are no allocations there; a workaround could be to not use closures, assuming you really want to get rid of these allocations.

julia> @memoize bar(x) = x
bar (generic function with 1 method)

julia> @time bar(2) # second time
  0.000002 seconds
3 Likes

I have just learnt about this type of construct, and although it does not solve that specific allocation problem, it does solve one type instability you have in your test. I do not know if this will help you in general, but anyway (this is a function-like object):

struct Ks
  α :: Float64
  ω :: Vector{Float64}
end

@memoize function (K :: Ks)(dx)
  if K.α == 0
    ifftshift(cis.(dx*K.ω.^2/2))
  else
    ifftshift(cis.(dx*(K.ω.^2/2 - K.α*K.ω.^3)))
  end
end

function run2(N, α)
    M = 512
    ω = rand(M)
    # Cache the kinetic factor
    ops = Operators(Ks(α,ω))
    ψ = Array{Complex{Float64}}(undef, M, N)
    dx = 1/N
    myloop(ψ, dx, ops, N)
end

The rest of the code is identical to yours. I am not completely sure that this exact implementation gives you the same result, but it does solve a type instability you had:

Your code:

julia> @code_warntype run(5,0.1)
Variables
  #self#::Core.Compiler.Const(run, false)
  N::Int64
  α::Float64
  M::Int64
  ω::Array{Float64,1}
  K::var"#K#118"{Array{Float64,1}}
  ops::Operators{_A} where _A    <<<< RED
  ψ::Array{Complex{Float64},2}
  dx::Float64

Body::Nothing
1 ─       (M = 512)
│         (ω = Main.rand(M::Core.Compiler.Const(512, false)))
│   %3  = Main.:(var"#K#118")::Core.Compiler.Const(var"#K#118", false)
│   %4  = Core.typeof(ω)::Core.Compiler.Const(Array{Float64,1}, false)
│   %5  = Core.apply_type(%3, %4)::Core.Compiler.Const(var"#K#118"{Array{Float64,1}}, false)
│         (K = %new(%5, ω))
│   %7  = (K)(α)::Any   <<<<< RED
│         (ops = Main.Operators(%7))
│   %9  = Core.apply_type(Main.Complex, Main.Float64)::Core.Compiler.Const(Complex{Float64}, false)
│   %10 = Core.apply_type(Main.Array, %9)::Core.Compiler.Const(Array{Complex{Float64},N} where N, false)
│   %11 = M::Core.Compiler.Const(512, false)::Core.Compiler.Const(512, false)
│         (ψ = (%10)(Main.undef, %11, N))
│         (dx = 1 / N)
│   %14 = Main.myloop(ψ, dx, ops, N)::Core.Compiler.Const(nothing, false)
└──       return %14

With the function-like object you get:

julia> @code_warntype run2(5,0.1)
Variables
  #self#::Core.Compiler.Const(run2, false)
  N::Int64
  α::Float64
  M::Int64
  ω::Array{Float64,1}
  ops::Operators{Ks}
  ψ::Array{Complex{Float64},2}
  dx::Float64

Body::Nothing
1 ─       (M = 512)
│         (ω = Main.rand(512))
│   %3  = Main.Ks(α, ω)::Ks
│         (ops = Main.Operators(%3))
│   %5  = Core.apply_type(Main.Complex, Main.Float64)::Core.Compiler.Const(Complex{Float64}, false)
│   %6  = Core.apply_type(Main.Array, %5)::Core.Compiler.Const(Array{Complex{Float64},N} where N, false)
│   %7  = M::Core.Compiler.Const(512, false)::Core.Compiler.Const(512, false)
│         (ψ = (%6)(Main.undef, %7, N))
│         (dx = 1 / N)
│   %10 = Main.myloop(ψ, dx, ops, N)::Core.Compiler.Const(nothing, false)
└──       return %10

1 Like

Thank you! I was not aware of this construct and that type instability has been driving me crazy trying to fix it. I’ve tried a similar approach in my actual code and it works quite well. Is there some documentation I can refer to that explains this a little? I’m very new to Julia so I am a little confused by this construct.

Thank you for the explanation. The problem is that the loop can have a few billion iterations in long simulations, and when using higher-order algorithms similar to the T4A! I showed above, some of them call T2A! up to 40 times per loop iteration, so these allocations add up (and they just bug me in general).

I will take a look at the code and see if I can do something, but I’m brand new to Julia so I doubt I can contribute at the moment.

I have tried a simplified version of Leandro’s suggestion:

using Memoization
using FFTW

struct Ks
    α :: Float64
    ω :: Vector{Float64}
end
  
@memoize function (K::Ks)(dx)
    if K.α == 0
      ifftshift(cis.(dx*K.ω.^2/2))
    else
      ifftshift(cis.(dx*(K.ω.^2/2 - K.α*K.ω.^3)))
    end
end
  
function run(N, α)
    M = 512
    ω = rand(M)
    ψ = Array{Complex{Float64}}(undef, M, N)
    dx = 1/N
    myloop(ψ, dx, Ks(α,ω), N)
end

function myloop(ψ, dx, K, N)
    for i = 1:N-1
        @views T2A!(ψ[:, i+1], ψ[:, i], dx, K)
    end
end

function T2A!(ψₒ, ψᵢ, dx, K) # 3 allocs total 
    @time ψₒ .= K(dx) .* ψᵢ # 3 allocs from memoized K(dx)
end

run(4, 0.1)  

This still has 3 allocs per call of K(dx). Is K not considered a top-level function in this implementation, or is it because it returns a function “handle”? (not sure what they’re called in Julia)

P.S. nice coincidence seeing another Berkeley physicist on here!

1 Like

The docs are here: Methods · The Julia Language

But I have learnt this only a couple of days ago thanks to this post:

In this particular example of yours it seems to make the code more modular, and you can feed the ω and α parameters to the object as well.

I think it is a closure anyway (actually the other way around: closures are function-like objects). You won’t solve those memoise allocations using this.
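For anyone else landing here, a minimal example of the construct (just a sketch, unrelated to the FFT code):

```julia
# A struct whose instances can be called like functions.
struct Scale
    a::Float64
end

# Adding a method to the *type* makes every instance callable.
(s::Scale)(x) = s.a * x

double = Scale(2.0)
double(3.0)  # behaves like a closure that captured `a`
```

The captured state lives in the struct fields instead of in an anonymous closure type, which is why the field types can be declared concretely.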

Correct, it’s not “top-level” because in Memoization.jl each instance of a Ks object gets its own independent memoization cache, and it’s looking up the right cache for a given object that is causing the allocations (same as for a closure).

How about writing K_cubic and K_hirota as ordinary functions and K/Ks as a wrapper for them? For me this gets allocations from 3 to 1.

struct Ks
  α :: Float64
  ω :: Vector{Float64}
end

@memoize function K_cubic(dx::Real, ω)
    ifftshift(cis.(dx*ω.^2/2))
end
@memoize function K_hirota(dx::Real, ω, α)
    ifftshift(cis.(dx*(ω.^2/2 - α*ω.^3)))
end
function (K :: Ks)(dx)
  if K.α == 0
    K_cubic(dx, K.ω)
  else
    K_hirota(dx, K.ω, K.α)
  end
end
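If you want to try for 0 allocations, a hand-rolled cache stored in the struct itself might avoid the memoization machinery entirely. A sketch (the `KsCached` name, the `Dict` field, and the `circshift` stand-in for FFTW’s `ifftshift` are mine, and I haven’t benchmarked it against Memoization.jl):

```julia
struct KsCached
    α::Float64
    ω::Vector{Float64}
    cache::Dict{Float64, Vector{ComplexF64}}
end
KsCached(α, ω) = KsCached(α, ω, Dict{Float64, Vector{ComplexF64}}())

function (K::KsCached)(dx::Float64)
    # Explicit haskey/getindex rather than get! with a do-block,
    # since constructing the closure for get! can itself allocate.
    haskey(K.cache, dx) && return K.cache[dx]
    M = length(K.ω)
    val = if K.α == 0
        circshift(cis.(dx .* K.ω.^2 ./ 2), -(M ÷ 2))                     # stand-in for ifftshift
    else
        circshift(cis.(dx .* (K.ω.^2 ./ 2 .- K.α .* K.ω.^3)), -(M ÷ 2)) # stand-in for ifftshift
    end
    K.cache[dx] = val
    return val
end

K = KsCached(0.1, rand(8))
K(0.01)              # first call computes and stores
K(0.01) === K(0.01)  # later calls return the cached array
```

Since T4A!-style integrators only ever use a small fixed set of dx values, the Dict stays tiny; whether the hash lookup is truly allocation-free in your loop is something you’d want to confirm with `@time` or `@allocated`.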