Currently, I am benchmarking Flux code against Python code for different configurations of data and network sizes. However, I noticed that execution slows down as more models are trained with different configurations. For example, the code below is a mock of the experiment I am running:
```julia
for data in [small_data, medium_data, big_data]
    for batch_size in [32, 64, 128, 256]
        for num_layers in [1, 2, 3]
            for width in [32, 64, 128, 256, 512]
                model = make_network(width, num_layers)
                res = @benchmark optimize_model!($model, $data, $batch_size) samples=10 evals=1 seconds=250
            end
        end
    end
end
```
The slowest configuration (big_data, batch_size=32, num_layers=3, width=512) has an average runtime of around 26 seconds and a minimum just above 24 seconds. However, if I restart the REPL and run only this configuration, the average time is 18.2 seconds and the maximum is just below 20 seconds. I checked other configurations that ran late in the sweep and they all show a similar result. Is there a known slowdown in Julia when a program runs for a long time or allocates heavily? Or is there a way to keep performance consistent across all variations?
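One thing I could try to test this (a sketch, untested; `churn` is a made-up stand-in for `optimize_model!`) is forcing a full garbage collection before timing each configuration, so garbage left behind by earlier runs doesn't skew later measurements:

```julia
# Made-up stand-in for optimize_model!: allocates many small vectors,
# similar to building mini-batches from a vector of vectors.
churn(n) = sum(sum(rand(8)) for _ in 1:n)

for n in (10_000, 50_000)
    GC.gc()                 # full collection: reset GC pressure before timing
    t = @elapsed churn(n)
    println("n = $n: $(round(t * 1000, digits = 1)) ms")
end
```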
I remember hitting a similar problem in a different script, caused by many frequent allocations. When the allocations were nearly eliminated the problem went away, which is why I suspect memory management is a causal factor here as well.
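To quantify that suspicion here, one could measure the per-call allocations directly with `@allocated` (a sketch; the vector-of-vectors below is only illustrative, not my real data):

```julia
vov = [rand(8) for _ in 1:1000]   # illustrative data in vector-of-vectors form

# Concatenating into a matrix allocates a fresh 8×1000 array on every call.
f(v) = reduce(hcat, v)
f(vov)                            # warm up so compilation isn't counted
bytes = @allocated f(vov)
println("allocated per call: $bytes bytes")
```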
Note that the data sets have sizes (8, 1000), (8, 5000), and (8, 20000), but are stored as vectors of vectors. Mini-batches are created as matrices for efficient model evaluation. The code runs entirely on the CPU, and the Julia version is 1.6.1.
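For context, the mini-batch construction looks roughly like this (`make_batch` is my paraphrase, not the exact code); every call allocates a fresh `8×batch_size` matrix, which is where I would expect the GC pressure to come from:

```julia
data = [rand(8) for _ in 1:20_000]   # (8, 20000) stored as a vector of vectors

# Each call allocates a new 8×batch_size Matrix from the selected samples.
make_batch(data, idxs) = reduce(hcat, data[idxs])

batch = make_batch(data, 1:32)
println(size(batch))                 # (8, 32)
```

A preallocated buffer filled in place with `copyto!` per batch would avoid these per-step allocations, which is something I could test.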