Mutable ProtoStruct beats Base Mutable Struct in microbenchmark...and then it is much slower in real program

I developed a bit the ProtoStructs.jl package recently and I managed to improve the performance of the implementation, but there is something I don’t understand why is happening: microbenchmarks seems to lie with

julia> using ProtoStructs, BenchmarkTools

julia> @proto mutable struct A
           const x::Int
           y::Int
       end

julia> mutable struct B
           const x::Int
           y::Int
       end

julia> As = [A(x,x) for x in 1:1000];

julia> Bs = [B(x,x) for x in 1:1000];

julia> @benchmark sum(v.x for v in $As)
BenchmarkTools.Trial: 10000 samples with 401 evaluations.
 Range (min … max):  242.935 ns … 299.566 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     243.012 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   245.073 ns Β±   3.157 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–ˆ      β–†      β–ƒ       β–†      ▁                       β–„      β–‚ β–‚
  β–ˆβ–‡β–‡β–‡β–‡β–‡β–‡β–ˆβ–†β–†β–†β–†β–‡β–‡β–ˆβ–ˆβ–…β–†β–…β–†β–‡β–†β–ˆβ–ƒβ–ƒβ–…β–„β–„β–†β–ˆβ–…β–„β–†β–…β–‡β–…β–‡β–‡β–„β–…β–†β–…β–…β–…β–†β–ˆβ–„β–ƒβ–„β–β–…β–‡β–ˆβ–ˆβ–„β–„β–ƒβ–…β–„β–„β–ˆ β–ˆ
  243 ns        Histogram: log(frequency) by time        255 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sum(v.y for v in $As)
BenchmarkTools.Trial: 10000 samples with 383 evaluations.
 Range (min … max):  247.055 ns … 365.457 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     255.089 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   257.748 ns Β±   5.791 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

       ▁ β–β–‚β–‚β–‚β–ƒβ–ƒβ–ƒβ–„β–†β–ˆβ–‡β–…β–ƒβ–   ▁▃▃▅▅▅▄▅▅▅▅▄▂▁  ▁▂▁▁                  β–‚
  β–„β–…β–…β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–‡β–‡β–†β–†β–‡β–‡β–ˆβ–‡β–‡β–‡β–‡β–‡β–† β–ˆ
  247 ns        Histogram: log(frequency) by time        274 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sum(v.x for v in $Bs)
BenchmarkTools.Trial: 10000 samples with 212 evaluations.
 Range (min … max):  352.679 ns …  64.380 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     366.009 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   373.513 ns Β± 658.147 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

                             β–‚β–…β–ˆβ–†β–ƒβ–                              
  β–β–β–β–β–β–β–β–‚β–‚β–ƒβ–„β–‡β–‡β–ˆβ–ˆβ–†β–…β–„β–„β–ƒβ–ƒβ–ƒβ–ƒβ–„β–ƒβ–„β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–„β–ƒβ–‚β–‚β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–‚β–‚β–‚β–‚β–‚β–‚ β–ƒ
  353 ns           Histogram: frequency by time          382 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sum(v.y for v in $Bs)
BenchmarkTools.Trial: 10000 samples with 212 evaluations.
 Range (min … max):  350.792 ns … 524.656 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     366.061 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   364.981 ns Β±   6.284 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

                                β–β–…β–ˆβ–‡β–…β–ƒ                           
  β–‚β–β–‚β–β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–ƒβ–ƒβ–…β–†β–‡β–‡β–‡β–†β–…β–„β–ƒβ–ƒβ–ƒβ–ƒβ–„β–ƒβ–„β–„β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–…β–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚ β–ƒ
  351 ns           Histogram: frequency by time          380 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Here you can see that the Mutable ProtoStruct seems faster than the Base one!

But then I tried in a real program to add @proto to the mutable struct, to my surprise all the program ended up 10x slower.

How can it be? Why this microbenchmark tells me the opposite?

Behind the scenes:

julia> @macroexpand @proto mutable struct A
                       const x::Int
                       y::Int
                    end
quote
    #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:110 =#
    if !($(Expr(:isdefined, :A)))
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:111 =#
        struct A{P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12, P13, P14, P15, NT <: NamedTuple} <: Any
            #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:112 =#
            properties::NT
        end
    else
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:115 =#
        if Any != Any && Any != Base.supertype(A)
            #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:116 =#
            error("The supertype of a proto struct is not redefinable. Please restart your julia session.")
        end
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:118 =#
        the_methods = collect(methods(A))
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:119 =#
        Base.delete_method(the_methods[1])
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:120 =#
        Base.delete_method(the_methods[2])
    end
    #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:123 =#
    function $(Expr(:where, :(A(x::Int, y::Int))))
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:123 =#
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:124 =#
        v = NamedTuple{(:x, :y)}((x = x, y = Ref(y)))
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:125 =#
        return A{Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, typeof(v)}(v)
    end
    #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:128 =#
    function $(Expr(:where, :(A{}(x::Int, y::Int))))
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:128 =#
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:129 =#
        v = NamedTuple{(:x, :y)}((x = x, y = Ref(y)))
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:130 =#
        return A{Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, typeof(v)}(v)
    end
    #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:133 =#
    function A(; x, y)
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:133 =#
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:134 =#
        return A(x, y)
    end
    #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:137 =#
    function $(Expr(:where, :(A{}(; x, y))))
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:137 =#
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:138 =#
        A{}(x, y)
    end
    #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:141 =#
    function Base.getproperty(o::A, s::Symbol)
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:141 =#
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:142 =#
        p = getproperty(getfield(o, :properties), s)
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:143 =#
        if p isa Base.RefValue
            #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:144 =#
            p[]
        else
            #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:146 =#
            p
        end
    end
    #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:150 =#
    function Base.setproperty!(o::A, s::Symbol, v)
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:150 =#
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:151 =#
        p = getproperty(getfield(o, :properties), s)
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:152 =#
        if p isa Base.RefValue
            #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:153 =#
            p[] = v
        else
            #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:155 =#
            error("const field $(s) of type ", A, " cannot be changed")
        end
    end
    #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:159 =#
    function Base.propertynames(o::A)
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:159 =#
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:160 =#
        return propertynames(getfield(o, :properties))
    end
    #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:163 =#
    function Base.show(io::IO, o::A)
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:163 =#
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:164 =#
        vals = join([if x[] isa String
                    "\"$(x[])\""
                else
                    x[]
                end for x = getfield(o, :properties)], ", ")
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:165 =#
        params = (typeof(o)).parameters[1:(end - 15) - 1]
        #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:166 =#
        if isempty(params)
            #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:167 =#
            print(io, string(A), "($(vals))")
        else
            #= /home/bob/.julia/packages/ProtoStructs/uaUaY/src/ProtoStruct.jl:169 =#
            print(io, string(A, "{", join(params, ", "), "}"), "($(vals))")
        end
    end
end

You’re only benchmarking accessing a field of the instances and filling an array, there could be other things your program is doing that protostructs don’t handle well. As your macroexpand shows, a β€œmutable” protostruct is really an immutable struct with 15 parameters defaulting to Any and a 16th parameter for a concrete NamedTuple type; the mutability is implemented by the NameTuple holding a Ref field. My guess is you’re filling field or type parameters with ::A, which is abstract and makes inference poorer, or the compiler is otherwise having a harder time narrowing down the concrete subtype of A:

julia> typeof(B(1, 1)) === B, isconcretetype(B)
(true, true)

julia> typeof(A(1, 1)) === A, isconcretetype(A)
(false, false)

If you really need the redefinitions, then there’s no choice but to use the abstract A to accommodate all the possible redefined types. If you don’t, you run into type errors:

julia> @proto struct X val::Int32 end

julia> Xs = [X(Int32(1))]; # array stuck at specific X subtype

julia> @proto struct X val::Int64 end # X constructors switch to new subtype

julia> Xs[begin] = X(1)
ERROR: MethodError: Cannot `convert` an object of type 
  X{Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},NamedTuple{(:val,),Tuple{Int64}}} to an object of type 
  X{Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},Any{},NamedTuple{(:val,),Tuple{Int32}}}

As for why it’s faster for accessing fields, I’m guessing it’s the difference in memory layout. A is really an immutable struct, so its instances are stored inline in the array, with distantly allocated Ref{Int} for field y; B’s instances are allocated distantly from the array entirely, with a size of 2 Ints for both fields. I’m not very clear on how data is shuttled between CPU registers and the various levels of memory, but there is less work in retrieving A instances.

2 Likes

My guess is you’re filling field or type parameters with ::A, which is abstract and makes inference poorer

thanks, the performance hit seems unavoidable as you explained :frowning:

Still there is something I don’t get, why

julia> typeof(A(1,1))
A{Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, Any, NamedTuple{(:x, :y), Tuple{Int64, Base.RefValue{Int64}}}}

is not enough to ensure good performance? All the Any parameters are actually not used inside the NamedTuple field.

I think that is enough, actually, it’s A itself that is the problem. AFAIK the Any parameters do nothing, they’re just placeholders in case there are any type parameters to hold. If we pretend they don’t exist, A is still parameterized by NT for the NamedTuple that specifies the redefinable fields. That’s what makes A abstract, and things will be inefficient if you use that as a field type or type parameter. typeof(A(1, 1)) on the other hand will be concrete and efficient. If you check eltype(As) you’ll see that it was automatically used as the array’s element type, not A. Try As = A[A(x,x) for x in 1:1000];, it’ll more reflect the inefficiencies of using ::A in programs.

2 Likes

mmmh yes you are right but I’m actually asking myself if the perf problem doesn’t arise from this very strange behaviour:

julia> using ProtoStructs

julia> @proto mutable struct A
                  const x::Int
                  y::Int
              end

julia> @time As = [A(x,x) for x in 1:1000];
  0.081959 seconds (164.95 k allocations: 10.881 MiB, 93.58% compilation time)

julia> @time As = [A(x,x) for x in 1:1000];
  0.016126 seconds (27.90 k allocations: 1.839 MiB, 99.01% compilation time)

julia> @time As = [A(x,x) for x in 1:1000];
  0.016207 seconds (27.90 k allocations: 1.839 MiB, 99.03% compilation time)

julia> @time As = [A(x,x) for x in 1:1000];
  0.016245 seconds (27.90 k allocations: 1.840 MiB, 99.02% compilation time)

julia> @time As = A[A(x,x) for x in 1:1000];
  0.000012 seconds (2.00 k allocations: 54.812 KiB)

Seems like As = [A(x,x) for x in 1:1000]; never gets compiled, while As = A[A(x,x) for x in 1:1000]; does

There you are benchmarking the element type inference. I can’t recall the underlying method, but it’s definitely struggling in the comprehension (happens for both A and B) in the global scope. Specifying the element type manually gets around it:

julia> @time As = [A(x,x) for x in 1:1000];
  0.420453 seconds (268.09 k allocations: 14.391 MiB, 46.16% gc time, 98.56% compilation time)

julia> @time As = [A(x,x) for x in 1:1000];
  0.023672 seconds (47.74 k allocations: 2.454 MiB, 98.81% compilation time)

julia> @time As = A[A(x,x) for x in 1:1000];
  0.000022 seconds (2.00 k allocations: 54.812 KiB)

julia> @time As = typeof(A(1,1))[A(x,x) for x in 1:1000];
  0.000016 seconds (1.00 k allocations: 31.375 KiB)

Analogous to:

julia> @time Xs = [Integer(x) for x in 1:1000];
  0.024034 seconds (40.29 k allocations: 2.108 MiB, 97.39% compilation time)

julia> @time Xs = Integer[Integer(x) for x in 1:1000];
  0.000013 seconds (490 allocations: 15.578 KiB)

julia> @time Xs = Int[Integer(x) for x in 1:1000];
  0.000007 seconds (1 allocation: 7.938 KiB)

1 Like

mmh, I think this is really the problem: I don’t have the control on most of the operations done with the proto struct, probably in them it is assumed inference on concrete types, so that the first version is used

edit: actually I pass A somewhere, so maybe if I change it with typeof(A(1,1)) the performance will improve

Actually though it seems that if you create the vector in a function it works fine:

julia> using ProtoStructs

julia> @proto mutable struct A
             const x::Int
             y::Int
          end

julia> f(n) = [A(x,x) for x in 1:n];

julia> @time f(1000);
  0.000010 seconds (1.00 k allocations: 31.375 KiB)

julia> @time f(10000);
  0.000069 seconds (10.00 k allocations: 312.547 KiB)

julia> @time f(10000);
  0.000069 seconds (10.00 k allocations: 312.547 KiB)

julia> f(n) = A[A(x,x) for x in 1:n];

julia> @time f(1000);
  0.000035 seconds (2.00 k allocations: 54.812 KiB)

julia> @time f(10000);
  0.000187 seconds (20.00 k allocations: 546.922 KiB)

julia> mutable struct B
             const x::Int
             y::Int
          end

julia> f(n) = [B(x,x) for x in 1:n];

julia> @time f(1000);
  0.000021 seconds (1.00 k allocations: 39.188 KiB)

julia> @time f(10000);
  0.000177 seconds (10.00 k allocations: 390.672 KiB)

Nonetheless I think the fact that A is abstract explains the difference in the real program

Just to let you know: passing an analogous of typeof(A(1, 1)) made my real program even slightly faster than the version with normal structs, thank you a lot @Benny