Changing array inside mutable struct passed to function, how to improve performance?

I am creating a simulator. To avoid passing several different arrays (with different element types) to other functions that will append values to them, I was thinking about grouping the arrays inside a mutable struct, so I can pass the struct and work on the arrays inside the function.

So I tested the performance of modifying an array in different ways, both inside and outside a mutable struct, and came up with a bunch of questions.

using BenchmarkTools  # provides @btime

mutable struct Test_struct
    b::Array{Int64}
end

function test0()
    b = Array{Int64}(undef, 4, 1000000)
    x = Vector{Int64}(undef, 4)
    for i = 1:1000000
        x = [i; 2*i; i; 2*i]
        b[:,i] = x
    end
end

function test1()
    b = Array{Int64}(undef, 4, 1000000)
    x = Vector{Int64}(undef, 4)
    for i = 1:1000000
        x = [i; 2*i; i; 2*i]
        b[1,i], b[2,i], b[3,i], b[4,i] = x
    end
end

function test2()
    b = Array{Int64}(undef, 4, 1000000)
    a = Test_struct(b)
    x = Vector{Int64}(undef, 4)
    for i = 1:1000000
        x = [i; 2*i; i; 2*i]
        a.b[1,i], a.b[2,i], a.b[3,i], a.b[4,i] = x
    end
end

function test3()
    b = Array{Int64}(undef, 4, 1000000)
    a = Test_struct(b)
    x = Vector{Int64}(undef, 4)
    for i = 1:1000000
        x = [i; 2*i; i; 2*i]
        a.b[:,i] = x
    end
end

function aux!(a, x, i)
    a.b[1,i], a.b[2,i], a.b[3,i], a.b[4,i] = x
end

function test4()
    b = Array{Int64}(undef, 4, 1000000)
    a = Test_struct(b)
    x = Vector{Int64}(undef, 4)
    for i = 1:1000000
        x = [i; 2*i; i; 2*i]
        aux!(a, x, i)
    end
end

println("test0")
@btime test0()
println("test1")
@btime test1()
println("test2")
@btime test2()
println("test3")
@btime test3()
println("test4")
@btime test4()

This gives me the following performance information:

test0
  34.999 ms (1000002 allocations: 122.07 MiB)
test1
  27.547 ms (1000002 allocations: 122.07 MiB)
test2
  27.495 ms (1000002 allocations: 122.07 MiB)
test3
  269.969 ms (3998980 allocations: 167.83 MiB)
test4
  163.860 ms (8996427 allocations: 244.09 MiB)

Which led me to the questions:

  1. Why is assigning values individually to an array (test1) faster than assigning the whole column at once (test0)? (The difference is small, though.)

  2. Why, when the array is inside a mutable struct, is the individual assignment 10x faster than assigning the whole column (test2 and test3)?

  3. Why is it much slower when I pass this struct to another function that does the assignment (test4)? This doesn't happen if I'm working with an array that is not inside a mutable struct.

  4. Most importantly, these tests suggest that placing the arrays inside a struct is not the way to go. How can I group the arrays so that I pass a single object to the function without running into this problem? I don't want to just put everything inside a vector, because it will be hard to deal with the different arrays without their names.

(I'm kind of new to Julia, but I put this under performance because it seems to be more about performance than about the very basics of programming.)
Edit: Included mutable struct definition

this is allocating,

and this

here’s a quick fix

julia> function test0_fix()
           b = Array{Int64}(undef, 4, 1000000)
           x = Vector{Int64}(undef, 4)
           for i = 1:1000000
               x .= (i, 2*i, i, 2*i) # <--- small tuple doesn't heap allocate
               # ^------ this dot (broadcast) is important too
               b[:,i] = x
           end
       end
test0_fix (generic function with 1 method)

julia> @btime test0_fix()
  6.940 ms (3 allocations: 30.52 MiB)

There are several things tripping you up. The first is that in every iteration you create a small temporary vector, [i; 2*i; i; 2*i]. This allocates on the heap each time, which is quite costly and something to avoid. One way is to write a double loop instead; another is to use a tuple, x = (i, 2*i, i, 2*i), combined with the unrolled assignment.
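
A minimal sketch of the tuple-plus-unrolled-assignment idea. The function name fill_columns! is hypothetical and the matrix is kept small for illustration:

```julia
# Hypothetical helper: fills each column of a 4×n matrix without
# creating a temporary vector per iteration.
function fill_columns!(b::Matrix{Int64})
    for i in axes(b, 2)
        x = (i, 2i, i, 2i)                        # tuple, no heap allocation
        b[1, i], b[2, i], b[3, i], b[4, i] = x    # unrolled assignment
    end
    return b
end

b = fill_columns!(Matrix{Int64}(undef, 4, 10))
```

Since the tuple has a fixed, known length, the compiler can usually treat the four stores as plain scalar writes.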

Secondly, you should share your struct definition. Is it properly typed with concrete field types?

Also, it is better to use an immutable struct instead of a mutable type. You can mutate the contained arrays, even if the parent type is immutable.
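
As a rough illustration of that last point, here is a sketch with a made-up struct (the name Records and its fields are assumptions, not taken from the question) that groups arrays of different element types under named fields:

```julia
# Hypothetical immutable struct holding two arrays with concrete types.
struct Records
    ids::Vector{Int64}
    values::Matrix{Float64}
end

r = Records(Int64[], zeros(2, 3))
push!(r.ids, 7)          # growing the contained vector is allowed
r.values[1, 2] = 4.5     # so is overwriting elements in place
# r.ids = [1, 2]         # would throw: the field itself cannot be rebound
```

Only rebinding a field to a different array is forbidden; the arrays themselves stay fully mutable.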

  1. Not sure if the difference is significant, but if it is, it may be Julia optimizing away a loop (i.e., instead of looping to assign the 4 values, it assigns each one in a separate instruction). Search for "loop unrolling", which is the name of this technique.
  2. I think the extra level of indirection has blocked some compiler optimization, because you are allocating far more memory. I suspect that a.b[:,i] = x in test3 goes through dynamic dispatch. Do @code_typed or @code_native help?
  3. Probably the same as above: the extra layer of indirection has confused the compiler into not optimizing something that should be possible. Try using the macros above to check. The really strange thing here is that test4 allocates more than test3 but takes less time.
  4. I think that putting a bunch of Arrays inside a struct should not be a problem in general, but take note: if you can, use struct instead of mutable struct. You will still be able to resize the arrays or change the values inside them; you just cannot replace the Array object inside the struct with a whole new Array, which is not something you want to do anyway.
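
One way to run such a check is sketched below with @code_warntype, a related macro from InteractiveUtils that highlights non-concrete types. The names LooseHolder and store! are hypothetical stand-ins for the struct and aux! from the question:

```julia
using InteractiveUtils   # provides @code_warntype outside the REPL

# Hypothetical struct with the same field annotation as in the question.
struct LooseHolder
    b::Array{Int64}
end

# Hypothetical stand-in for aux!.
function store!(a, x, i)
    a.b[1, i], a.b[2, i], a.b[3, i], a.b[4, i] = x
end

a = LooseHolder(Array{Int64}(undef, 4, 10))
store!(a, [1, 2, 3, 4], 2)
# Inspect the inferred types; non-concrete ones are flagged in the output.
@code_warntype store!(a, [1, 2, 3, 4], 1)
```

If the type shown for a.b is not fully specified, that is a hint that the field annotation is the thing to look at.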

NOTE: what @jling and @DNF point out is the most important thing, but I assumed your example to not be representative of real code and that you want to understand the difference between those specific artificial cases, hence my answer.


The x seems like a redundant intermediate step. Can’t you write directly to b?

Thanks, this tuple broadcast helped me understand a lot of things I didn't know could be a problem in my code. I'm still trying to understand why passing the struct into a function creates a bunch of new allocations for the array.

I didn't know you could modify a mutable object inside an immutable one. That also helps a lot with my performance problems, thank you.

Thank you for the clear explanations, they helped me better understand some of the problems. The macros are definitely the way to understand the final problem of passing the struct to another function (test4), which happens even with the immutable struct. However, I still could not work out how to fix it, since I'm not a very proficient programmer. I'll try to look further into it if you don't have any other insights.

My problem now is with the following:

struct Test_struct
    b::Array{Int64}
end

function aux!(a, x, i)
    a.b[1,i], a.b[2,i], a.b[3,i], a.b[4,i] = x
end

function test4()
    b = Array{Int64}(undef, 4, 1000000)
    a = Test_struct(b)
    x = Vector{Int64}(undef, 4)
    for i = 1:1000000
        x .= (i, 2*i, i, 2*i)
        aux!(a, x, i)
    end
end

@btime test4()

This is much slower than any of the others after the suggested corrections, and it happens to be what I want to do in my code.

All the comments helped me a lot with things I didn't understand about improving Julia performance. My final problem is why test4 (passing the struct to a function that performs the operations) is still much slower than all the rest, even using an immutable struct and the broadcast suggestions.

using BenchmarkTools

struct Test_struct
    b::Array{Int64}
end

function aux!(a, x, i)
    a.b[1,i], a.b[2,i], a.b[3,i], a.b[4,i] = x
end

function test4()
    b = Array{Int64}(undef, 4, 1000000)
    a = Test_struct(b)
    x = Vector{Int64}(undef, 4)
    for i = 1:1000000
        x .= (i, 2*i, i, 2*i)
        aux!(a, x, i)
    end
end

function aux2!(b, x, i)
    b[1,i], b[2,i], b[3,i], b[4,i] = x
end

function test5()
    b = Array{Int64}(undef, 4, 1000000)
    a = Test_struct(b)
    x = Vector{Int64}(undef, 4)
    for i = 1:1000000
        x .= (i, 2*i, i, 2*i)
        aux2!(a.b, x, i)
    end
end
test4
  200.148 ms (7996427 allocations: 152.53 MiB)
test5
  7.970 ms (3 allocations: 30.52 MiB)

If someone could help me with this final issue it would be awesome, since this is exactly what I want to do in my code and it's still the only slow alternative.

Meanwhile I'm looking into the @code_typed and @code_native suggestion by @Henrique_Becker to see if I can find it myself.

In this example it is. I'm just trying to replicate what I intend to do in the simulation, which will require an intermediate step.

I wonder whether it makes a difference if `Array{Int64,2}` is used here…?

struct Test_struct
    # b::Array{Int64}
    b::Array{Int64,2}     # or b::Matrix{Int64}
end

or

struct Test_struct{T}
    b::T
end

But on my old MacMini (2012), test4() is still slower than test5() (by GC time?)
0.027447 seconds (3 allocations: 30.518 MiB, 58.96% gc time)
0.011408 seconds (3 allocations: 30.518 MiB)


Thank you!!!
You apparently cracked the final piece of the puzzle: with this modification they now run at the same speed on my PC.
I still don't understand why, with the wrong annotation b::Array{Int64}, passing the array from the struct worked but passing the struct itself didn't. But this seems to be enough for now.


Array{Int} is not a concrete type, because the number of dimensions is left unspecified, while Array{Int, 2} is concrete. This is what I meant when I asked whether all your field types were concrete.

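
This can be checked directly with isconcretetype from Base. A small sketch, written with Int64 as in the question:

```julia
# Array{Int64} leaves the dimension parameter open, so it is not concrete.
loose = isconcretetype(Array{Int64})        # false
tight = isconcretetype(Array{Int64, 2})     # true

# Matrix{T} and Vector{T} are aliases for Array{T, 2} and Array{T, 1}.
same  = Matrix{Int64} === Array{Int64, 2}   # true
vec   = isconcretetype(Vector{Int64})       # true
```

A field annotated with a non-concrete type can hold arrays of any dimensionality, so the compiler cannot specialize code that accesses it.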

That is an incredibly common problem for beginners. People often write the non-concrete Array{Int} when they intended the concrete Vector{Int}; many colleagues in my lab have tripped on that one.


Yes, I included the struct definition because of your comment. I couldn’t see Array{Int} was not a concrete type. Thank you!

Note that this is the general way to guarantee that the field has a concrete type:

This leaves the difficult reasoning about type parameters to the compiler, so you don’t have to be on top of all these details yourself.
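
The code this post points to is not preserved here. Assuming it refers to the parametric definition suggested earlier in the thread, a sketch would look like this (ParamHolder is a hypothetical name):

```julia
# Parametric struct: the field type is a type parameter, so every
# instance carries the concrete type of whatever array it was built with.
struct ParamHolder{T}
    b::T
end

a = ParamHolder(Array{Int64}(undef, 4, 10))
T = typeof(a)                                      # ParamHolder{Matrix{Int64}}
concrete = isconcretetype(fieldtype(T, :b))        # true
```

The trade-off is that ParamHolder accepts any value for b; a constraint such as T<:AbstractMatrix could be added if that is too permissive.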
