Large memory footprint using type dispatch within a loop

phrmoy · March 25, 2018, 10:20pm

Hi there,
I am new to Julia and very excited about it. One of the things I have been excited about (other than performance) is the multiple dispatch. So I figured I would eliminate many of jammed zillion of if-else statements I have in my code for smaller functions using multiple dispatch. So I did a little rough performance test to see what kind of overhead I would incur, and not only it took quite a bit longer, but it generated a large memory footprint:

 0.074132 seconds (5 allocations: 176 bytes) #ifelse statement within loop
 5.737640 seconds (200.00 M allocations: 2.980 GiB, 1.54% gc time) #type dispatch 1 within loop
 6.612299 seconds (200.00 M allocations: 2.980 GiB, 1.39% gc time) #type dispatch 2 within loop

There is a good chance but I am probably doing something I am not supposed to, would certainly appreciate guidance. Thank you.

Here’s the code:

function test_if(N::Int,p::Int)
    j = 0
    for i=1:N
        if p == 0
            j += 1
            p = 1
        elseif p == 1
            j += 2
            p = 0
        end
    end
    return j
end
test_if(10,0) #15

abstract type atype end
struct s1 <: atype end
struct s2 <: atype end
abstract type a1 <: atype end
abstract type a2 <: atype end

@inline function sum_j(j::Int,::Type{s1})::Int
    return j+1
end
@inline function sum_j(j::Int,::Type{s2})::Int
    return j+2
end
@inline function sum_j(j::Int,::Type{a1})::Int
    return j+1
end
@inline function sum_j(j::Int,::Type{a2})::Int
    return j+2
end
@inline function update_s(::Type{s1})::Type{s2}
    return s2
end
@inline function update_s(::Type{s2})::Type{s1}
    return s1
end
@inline function update_s(::Type{a1})::Type{a2}
    return a2
end
@inline function update_s(::Type{a2})::Type{a1}
    return a1
end

function test_struct(N::Int,p::Type{T}) where {T<:atype}
    j = 0
    for i=1:N
        j = sum_j(j,p)
        p = update_s(p)
    end
    return j
end
test_struct(10,a1) #15

N = 100000000
@time test_if(N,0)
@time test_struct(N,s1)
@time test_struct(N,a1)

which generates

 0.074132 seconds (5 allocations: 176 bytes)
 5.737640 seconds (200.00 M allocations: 2.980 GiB, 1.54% gc time)
 6.612299 seconds (200.00 M allocations: 2.980 GiB, 1.39% gc time)

EDIT: Running Julia 0.6.0 on Windows.

platawiec · March 25, 2018, 11:08pm

You may have seen the performance tips. Running @code_warntype on your test_struct function, we see the following (I’ve truncated the output):

julia> @code_warntype test_struct(N, a1)
Variables:
  #self# <optimized out>
  N::Int64
  p@_3::Type{a1}
  i <optimized out>
  #temp#@_5::Int64
  j::Int64
  p@_7::Any
  #temp#@_8::Core.MethodInstance
  #temp#@_9::Int64
  #temp#@_10::Core.MethodInstance
  #temp#@_11::Any

The key here is the variables p@_7 and #temp#@_11, which the compiler infers as Any. This is because your dispatch function update_s is not type-stable, in other words, the output type of of update_s changes depending on the input. The compiler then has trouble optimizing this code.

phrmoy · March 25, 2018, 11:13pm

Thanks for the reminder. Here’s what I get:

Variables:
  #self#::#test_struct
  N::Int64
  p::Any
  i::Int64
  #temp#::Int64
  j::Int64
  sc::switch_context

Body:
  begin 
      j::Int64 = 0 # line 3:
      sc::switch_context = $(Expr(:new, :(Main.switch_context), s1)) # line 4:
      SSAValue(2) = (Base.select_value)((Base.sle_int)(1, N::Int64)::Bool, N::Int64, (Base.sub_int)(1, 1)::Int64)::Int64
      #temp#::Int64 = 1
      7: 
      unless (Base.not_int)((#temp#::Int64 === (Base.add_int)(SSAValue(2), 1)::Int64)::Bool)::Bool goto 18
      SSAValue(3) = #temp#::Int64
      SSAValue(4) = (Base.add_int)(#temp#::Int64, 1)::Int64
      #temp#::Int64 = SSAValue(4) # line 5:
      j::Int64 = (Main.sum_j)(j::Int64, (Core.getfield)(sc::switch_context, :t)::DataType)::Int64 # line 6:
      (Main.update_s)(sc::switch_context, (Core.getfield)(sc::switch_context, :t)::DataType)::Void
      16: 
      goto 7
      18:  # line 8:
      return j::Int64
  end::Int64

So p cannot be inferred. But this couldn’t possibly be the reason why the program is generating almost 3 GB of memory footprint compared to the if-else program that does not even generate 1 MB (or could it?).

EDIT: The above is a slightly different program, but the idea is the same.

EDIT 2: Here is the modified code with similar performance

abstract type atype end
struct s1 <: atype end
struct s2 <: atype end
mutable struct switch_context
    t::DataType
end
@inline function sum_j(j::Int,::Type{s1})::Int
    return j+1
end
@inline function sum_j(j::Int,::Type{s2})::Int
    return j+2
end
@inline function update_s(sc::switch_context,::Type{s1})
    sc.t = s2
    return nothing
end
@inline function update_s(sc::switch_context,::Type{s2})
    sc.t = s1
    return nothing
end
function test_struct(N::Int,p::Type{T}) where {T<:atype}
    j = 0
    sc = switch_context(p)
    for i=1:N
        j = sum_j(j,sc.t)
        update_s(sc,sc.t)
    end
    return j
end
test_struct(10,s1)

tested as

N = 100000000
@time test_if(N,0)
@time test_struct(N,s1)

generating

  0.062493 seconds (5 allocations: 176 bytes)
 10.500449 seconds (200.00 M allocations: 2.980 GiB, 0.84% gc time)

platawiec · March 25, 2018, 11:17pm

I believe it can, see: Overhead of dynamic dispatch? - #5 by mbauman

mbauman · March 25, 2018, 11:29pm

Yup. The 3GB that’s reported isn’t an instantaneous “footprint” at any given time. It’s the total number of allocations — including temporaries. Type-unstable code generates lots of temporaries because Julia can no longer make any assumptions about the unstable variables. It doesn’t even know how much space they’ll take up. So it must resort to allocating each one independently, processing it, and discarding it.

Julia 0.7 is much better at dealing with instabilities like this where inference can figure out that the result type isn’t completely unknown — it’s just one of a handful of known types. Then it can generate explicit branches that handle each case in an efficient manner:

# The code from the first post on 0.7
julia> @time test_if(N,0)
  0.068058 seconds (5 allocations: 176 bytes)
150000000

julia> @time test_struct(N,s1)
  3.237521 seconds (8.84 k allocations: 523.067 KiB)
150000000

julia> @time test_struct(N,a1)
  3.472727 seconds (5 allocations: 176 bytes)
150000000

phrmoy · March 25, 2018, 11:33pm

Thanks for the input, so is this code also type unstable?

abstract type atype end
struct s1 <: atype end
struct s2 <: atype end
mutable struct switch_context
    t::DataType
end
@inline function sum_j(j::Int,::Type{s1})::Int
    return j+1
end
@inline function sum_j(j::Int,::Type{s2})::Int
    return j+2
end
@inline function update_s(sc::switch_context,::Type{s1})
    sc.t = s2
    return nothing
end
@inline function update_s(sc::switch_context,::Type{s2})
    sc.t = s1
    return nothing
end
function test_struct(N::Int,p::s1)
    j = 0
    sc = switch_context(typeof(p))
    for i=1:N
        j = sum_j(j,sc.t)
        update_s(sc,sc.t)
    end
    return j
end
test_struct(10,s1)

where @code_warntype test_struct(N,s1()) gives

Variables:
  #self#::#test_struct
  N::Int64
  p::s1
  i::Int64
  #temp#::Int64
  j::Int64
  sc::switch_context

Body:
  begin 
      j::Int64 = 0 # line 23:
      sc::switch_context = $(Expr(:new, :(Main.switch_context), s1)) # line 24:
      SSAValue(2) = (Base.select_value)((Base.sle_int)(1, N::Int64)::Bool, N::Int64, (Base.sub_int)(1, 1)::Int64)::Int64
      #temp#::Int64 = 1
      7: 
      unless (Base.not_int)((#temp#::Int64 === (Base.add_int)(SSAValue(2), 1)::Int64)::Bool)::Bool goto 18
      SSAValue(3) = #temp#::Int64
      SSAValue(4) = (Base.add_int)(#temp#::Int64, 1)::Int64
      #temp#::Int64 = SSAValue(4) # line 25:
      j::Int64 = (Main.sum_j)(j::Int64, (Core.getfield)(sc::switch_context, :t)::DataType)::Int64 # line 26:
      (Main.update_s)(sc::switch_context, (Core.getfield)(sc::switch_context, :t)::DataType)::Void
      16: 
      goto 7
      18:  # line 28:
      return j::Int64
  end::Int64

There are no warnings above.

mbauman · March 25, 2018, 11:39pm

Yes, this is a special case of Avoid fields with an abstract type:

mutable struct switch_context
    t::DataType
end

Now, DataType isn’t technically abstract (hence no warnings), but switch_context doesn’t know precisely which DataType it’s holding onto… and this is crucially important since you later dispatch on the exact types. All switch_context knows is that it’s holding onto a type of some sort. To give it that information, you use a type parameter:

mutable struct switch_context{T}
    t::Type{T}
end

Do that, and you should see red flags.

phrmoy · March 25, 2018, 11:50pm

Thanks for the input!

Topic		Replies	Views
Dangers of abusing multiple dispatch: missing optimization? General Usage	1	608	April 14, 2017
Performance benefits of multiple dispatch Performance question , multidispatch	20	2463	December 19, 2021
Trying to understand optimization via multiple dispatch and type stability New to Julia type-stability , multidispatch	5	993	December 8, 2019
Any performance issues with multiple dispatch? New to Julia	5	1347	February 17, 2021
Efficiency of Multiple Dispatch Performance question	3	201	November 10, 2024

Large memory footprint using type dispatch within a loop

Related topics