Julia slowdown on long running programs with many allocations

Currently, I am benchmarking Flux code against Python code for different configurations of data and network sizes. However, I have noticed that execution slows down as more models are trained with different configurations. For example, the code below is a mock of the experiment I am running:

using Flux, BenchmarkTools   # @benchmark comes from BenchmarkTools

for data in [small_data, medium_data, big_data]
    for batch_size in [32, 64, 128, 256]
        for num_layers in [1, 2, 3]
            for width in [32, 64, 128, 256, 512]
                model = make_network(width, num_layers)
                res = @benchmark optimize_model!($model, $data, $batch_size) samples=10 evals=1 seconds=250
            end
        end
    end
end

The execution time for the slowest configuration (big_data, batch_size=32, num_layers=3, width=512) has an average of around 26 seconds and a minimum just above 24 seconds. However, if I restart the REPL and run only this configuration, the average time is 18.2 seconds and the maximum is just below 20 seconds. I checked other configurations that were run late in the benchmarking run and they all show a similar pattern. Is there a known slowdown in Julia when a program runs for a long time or allocates a lot? Or is there a way to keep performance consistent across all variations?

I remember having a similar problem in a different script that was due to many frequent allocations. When the allocations were nearly eliminated the problem went away, which is why I suspect memory management is a causal factor here as well.

Note that the data sets have sizes (8, 1000), (8, 5000), and (8, 20000), but are stored as vectors of vectors. Mini-batches are created as matrices for efficient model evaluation. The code runs entirely on the CPU and the Julia version is 1.6.1.
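For illustration, mini-batch construction looks roughly like this (a simplified sketch, not the actual code):

idx = rand(1:length(data), batch_size)   # indices of one mini-batch
batch = reduce(hcat, data[idx])          # 8 × batch_size matrix fed to the model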

Part of the problem here may be major memory growth from all these allocations, resulting in a lot of garbage collection. I have a large log-parsing program that behaves like that: with enough data it oversubscribes memory, does lots of garbage collection, and eventually starts demand paging, which slows it down drastically. Of course, it builds very large in-memory data structures. If we were not moving away from the tests that require it, I would refactor it again.

Monitor your memory usage.
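A lightweight way to do that from inside Julia is to log the process's peak resident set size and the system's free memory between configurations; a sketch using Sys.maxrss and Sys.free_memory from Base:

# Peak resident set size of this Julia process and free system memory, in MB.
# Call between configurations to see whether usage keeps climbing.
report_memory() = @info "memory" maxrss_MB = Sys.maxrss() / 2^20 free_MB = Sys.free_memory() / 2^20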

I agree that garbage collection time must be hurting you. Since you are already running benchmarks for all of the configurations, can you take a look at the percentage of time spent in garbage collection?

With the usual printing it will show up as (XX% GC). So I figure you are coming close to the memory limit of your machine. The way to go would be to reduce the allocations by reusing arrays, making sure your functions are type stable, etc. Basically, all of the performance tips in the manual.
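For example, the GC share can also be computed directly from the Trial object that @benchmark returns; a small sketch using the times and gctimes fields of BenchmarkTools' Trial (res is the result from the loop in the question):

using BenchmarkTools, Statistics

gc_fraction = mean(res.gctimes) / mean(res.times)   # fraction of run time spent in GC
println("GC time: ", round(100 * gc_fraction; digits=1), " %")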

Another thing to try, if you’re working with lots of strings, is ShortStrings.jl. Strings can currently create a lot of GC pressure, which ShortStrings helps with.

In your case, do you see the total memory usage of the Julia instance grow over time as well? By grow, I mean: does it grow more than would be expected from the increase in dataset size, as if some memory usage were accumulating?

For some three years now I’ve had similar problems, which have turned out to be very hard to pin down. For me, the problems usually appear after having read and pre-processed a dataset several times over (e.g., running the same script multiple times, where it reads data at the beginning). The data itself is not very large, say 100 MB. The memory usage of the Julia instance keeps growing with time, and after a few hours I have to restart Julia to reclaim the memory so as not to run out of RAM. I’ve tried many times to create a small, self-contained example that demonstrates the issue, but have always failed.

I’m missing tools to figure out where all the memory is spent. There is sizeof, but it is severely limited in what it can report; Base.summarysize is another utility with similar limitations. I would like to figure out whether there are, for instance, copies of the data stored in closures that are not being garbage collected.
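To illustrate the limitation on a vector-of-vectors like the data in the question (a small sketch):

v = [rand(8) for _ in 1:1000]   # 1000 length-8 vectors
sizeof(v)                       # 8000 bytes: only the outer vector's pointer storage
Base.summarysize(v)             # much larger: recursively includes the inner arrays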

Thanks for the responses. There are only a few places where I can reduce allocations further, because most of them happen inside the Flux model and gradient calls. I was not able to reliably replicate the exact behaviour, but the issue was resolved by adding the line

GC.gc(true)

after each optimize_model! call.
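In the mock loop above that looks like this (same placeholder make_network / optimize_model! as before):

                model = make_network(width, num_layers)
                res = @benchmark optimize_model!($model, $data, $batch_size) samples=10 evals=1 seconds=250
                GC.gc(true)   # force a full collection before the next configuration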

I was once tracking down a memory allocation / leak. I did it by calling GC.gc() several times and measuring the amount of free memory with cat /proc/meminfo. In the end, the problem was that I called hcat with a varying number of arguments, and Julia compiled many specialized versions.
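Roughly like this (a sketch; columns stands in for my actual data):

# Splatting a varying number of arguments compiles a new specialization
# of hcat for each distinct argument count:
batch = hcat(columns...)

# A length-agnostic alternative that avoids those extra specializations:
batch = reduce(hcat, columns)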

But needless to say, I have had a similar experience to baggepinnen’s.