Mutable structs seem to cause way more allocations in multithreaded code

Hi everyone,
I have encountered a pretty annoying performance bottleneck when using StatsBase.Weights, which is a mutable struct (whatever may be the reason for that :P), in a multithreaded setting.

After going down the rabbit hole, I have seen that the issue is that allocations seem to explode whenever using a mutable struct in a multithreaded setting. For example:

mutable struct teststr
    i::Int
end

function test()
    for _ in 1:8000
        Threads.@threads for i in 1:100 # removing @threads will result in stable allocations
            a = @allocated teststr(i)
            if a > 32*200
                error("Allocation of $(Base.format_bytes(a))")
            end
        end
    end
end
test()

shows that in some cases, Julia reports over 10 kilobytes for a single allocation.

Is this somehow expected behaviour or a bug? I can imagine that @allocated somehow also counts allocations from the multithreading, but the VS Code profiler also shows a lot of time being spent in constructing teststr (or rather StatsBase.Weights in my original use case).

Otherwise, what is the best way to resolve my concrete issue? Is there some immutable version of StatsBase.Weights? Should I pre-allocate it and update the struct with the appropriate weights each time? This will probably work but it feels rather hacky and overcomplicated to me, as it should be only a simple wrapper type…

Your imagination is at least partially correct. @allocated effectively asks the GC how many bytes have ever been allocated, then executes the expression, and then asks again about the overall allocations. The difference is then returned.

This does what you think it does when only a single task is scheduled. As soon as you have multiple tasks, this can be completely misleading.
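To make this concrete, here is a minimal sketch of how @allocated can pick up allocations made by a different task. The function name noisy_measurement and the buffer sizes are made up for illustration. The measured expression, wait(t), does almost no allocating itself, yet the result should include what the spawned task allocated in the meantime, assuming the GC byte counter is shared across tasks and threads as described above.

```julia
# Hypothetical demo: the measured expression only waits, but another task allocates.
function noisy_measurement()
    # Background task that allocates 1000 buffers of 8192 bytes each.
    t = Threads.@spawn begin
        bufs = Vector{Vector{UInt8}}()
        for _ in 1:1000
            push!(bufs, zeros(UInt8, 8192))  # kept alive so the allocation really happens
        end
        length(bufs)
    end
    # Only `wait(t)` is measured here, yet the counter also moves
    # because of the work done inside the spawned task.
    return @allocated wait(t)
end

bytes = noisy_measurement()
println("@allocated around wait(t) reported ", Base.format_bytes(bytes))
```

The same effect is what inflates individual readings inside a Threads.@threads loop: whatever other tasks allocate between the two counter reads is attributed to the expression being measured.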

Note: I am not saying that this is not the problem. I am just pointing out that the method used is flawed in this context.
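One way to get a less misleading number is to measure the whole threaded loop from the outside, once all tasks have finished, and divide by the number of iterations. This is only a sketch: Probe is a hypothetical stand-in for teststr, fill_threaded! is a made-up helper, and the result necessarily includes the overhead of Threads.@threads itself (task creation and scheduling), so it is an upper bound rather than the cost of the struct alone.

```julia
# Hypothetical stand-in for the mutable struct from the question.
mutable struct Probe
    i::Int
end

# Construct one Probe per iteration and store it, so the objects stay alive
# and the allocations cannot be optimized away.
function fill_threaded!(out::Vector{Probe})
    Threads.@threads for i in eachindex(out)
        out[i] = Probe(i)
    end
    return out
end

out = Vector{Probe}(undef, 100)
fill_threaded!(out)                      # warm-up call so compilation is not measured
total = @allocated fill_threaded!(out)   # measured from outside, after all tasks are done
println("bytes per iteration, including @threads overhead: ", total / length(out))
```

Comparing this figure with the same loop without Threads.@threads gives a rough idea of how much comes from threading rather than from the struct.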

Your if branch will allocate as well. So as soon as one task measures more than 6400 bytes (because it was not scheduled for some time), the allocations will multiply.


Thanks! I did some more digging and discovered that my performance regression was due to spawning quite a lot more threads than I intended.
So I think in this case the issue is actually only the threading overhead, and there is no major issue with using StatsBase.Weights in multithreaded code in particular!
