I have written script that reads arrays from a .mat file and allocated new arrays based on the size of the arrays read from the .mat file.
The issue is that when the new arrays are allocated using the size of read arrays, operations that are applied to these arrays cause allocations. This is not the case if the size of the new arrays is determined independently of the read arrays.
I could repeat this behaviour in a MWE:
sing MAT
function GenerateDummyData()
a0 = zeros(400)
file = matopen(string(@__DIR__, "/DummyData.mat"), "w")
write(file, "a0", a0)
close(file)
end
function maintest(n)
#------------#
file = matopen( string(@__DIR__, "/DummyData.mat") )
a0 = Array(read(file, "a0"))
close(file)
#------------#
array_length = length(a0)
# array_length = 400
a1 = zeros(array_length)
a2 = zeros(array_length)
@time for i=1:n
for j in eachindex(a1)
a1[j] = a2[j]
end
end
end
GenerateDummyData()
maintest(1)
maintest(10)
maintest(100)
Here the size of the new arrays (array_length) is derived from the read arrays (array_length = length(a0)).
This causes allocations that scale with n, the number of sweeps through the new arrays:
You can use the @code_warntype macro to pin down type instabilities. When you run this in a REPL you get colored output showing you where Julia was unable to infer a concrete type in that call. Usually, you should aim to avoid all red lines in that output. Applying it to this example gave me the following output (unfortunately without colors here):
If you study the above (with colors) you will see that the first appearance of a type instability is from the line a0 = Array(read(file, "a0")). (I think) This is because read is type instable, because it cannot infer the type of its return value from its arguments alone.
Changing the line to a0 = read(file, "a0")::Vector{Float64} cures the problem in this case, because it promises to Julia that the returned value will be a Vector{Float64}, which is a concrete type. If read does not return a Vector{Float64}, then this line will throw an error, hence, it acts like an assertion.
However, to my own surprise, using a0 = Vector{Float64}(read(file, "a0")) does not help here, although a0 is then also of type Vector{Float64}. Perhaps somebody else can explain why that is the case…
Thanks a lot! Specifying type with ::Vector{Float64} does solve the issue, at least in the MWE. I will definitely use @code_warntype in the future, thanks for the hint!
What I find puzzling is that if you one checks typeof(a0) without additional type specification it already returns Vector{Float64}. So why is the additional type specification is needed? Also, what is weird is that the allocations appear when touching arrays that have the size of a0 (not directly touching a0). So it seems that just using the size of a “non-properly typed” array (instead of the array itself) is troublesome…
Type annotation only works if there is truly only ever one type coming out of the read operation. Imagine you could read Float64 or Float32 or Int vectors, then type annotation wouldn’t help you.
The trick in such situations is to just not let the type instability proliferate further than it needs to.
You have one function that does both a read and also does a computation with the read values. By splitting the computation out, you are making a “function barrier”. When that function is called, even though it has to be determined dynamically which method is needed (because the array type is unknown before runtime) the function will be compiled for that specific type the first time a new type appears.
As the function barrier is called only rarely (just once in your case) its dynamic lookup overhead doesn’t matter (it’s somewhere on the order of nano to microseconds).
So the rule of thumb is, split your functions up to avoid overly large impact of unavoidable type instabilities, and don’t mind type instabilities if the code they affect is short or only executed a couple of times.
function maintest(n)
#------------#
file = matopen( string(@__DIR__, "/DummyData.mat") )
a0 = Array(read(file, "a0"))
close(file)
# at this point in time, a0 has a concrete value, although it wasn't known to the compiler beforehand
# so now we call the rest of the logic as a separate function
rest(n, a0)
end
function rest(n, a0)
# for this function, n and a0 will always have concrete types
# so it doesn't suffer from the read type instability
array_length = length(a0)
# array_length = 400
a1 = zeros(array_length)
a2 = zeros(array_length)
@time for i=1:n
for j in eachindex(a1)
a1[j] = a2[j]
end
end
end