I have a bit of a puzzle here. I simply loop through arrays and assign either at multiple of one array (with permutation of the rows and columns), or just assign a multiple of one array to the other. Both versions allocate like crazy.
N = 1000;
multiplier = 2.0 + 1.0im;
temp = rand(Complex{Float64}, N, N);
H = rand(Complex{Float64}, N, N);
panel2intdof = vec(collect(1:N));
@time begin
for co = 1:size(temp, 2)
renumco = panel2intdof[co]
for ro = 1:size(temp, 1)
renumro = panel2intdof[ro]
H[ro, co] = multiplier * temp[renumro, renumco]
end
end
end
@time begin
for co = 1:size(temp, 2)
for ro = 1:size(temp, 1)
H[ro, co] = (multiplier * temp[ro, co])
end
end
end
Both double loops report around four million allocations (total of over 100 MB). This happens with the development version of Julia on Linux and 1.0.1 on Windows. I haven’t tested more.
I can’t figure out why the allocation occurs. There is allocation even when just plainly looping to overwrite one matrix with another
@time begin
for co = 1:N
for ro = 1:N
H[ro, co] = temp[ro, co]
end
end
end
True. Actually the allocating loops are inside a function in the code I’m trying to debug. It was allocating
inside the function. When I pulled it out, it still reproduced the allocating, so I haven’t noticed
the trip up with the global scope.
So it looks like my example is not equivalent to the code inside the function, and something else is going on.
This is what the code inside the function really looks like:
@show typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)
@time begin # DEBUG This loop really does nothing useful. It is there purely for comparison.
@inbounds for co = 1:size(temp, 2)
@inbounds for ro = 1:size(temp, 1)
H[ro, co] = temp[ro, co]
end
end
end # DEBUG
println("After the first loop")
@time begin # DEBUG This loop is the actual work I want to get done
@inbounds for co = 1:size(temp, 2)
renumco = panel2intdof[co]
@inbounds for ro = 1:size(temp, 1)
renumro = panel2intdof[ro]
H[ro, co] = multiplier * temp[renumro, renumco]
end
end
end # DEBUG
println("After the second loop")
I get the following printout
(typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)) = (Array{Int64,1}, Array{Complex{Float64},2}, Array{Complex{Float64},2}, Complex{Float64})
0.010650 seconds
After the first loop
1.486276 seconds (13.43 M allocations: 409.923 MiB, 26.14% gc time)
After the second loop
As you can see, there are many allocations in the second loop, yet the types of operands give me no clue as to why.
That still looks like a type instability of some sort. Call @code_warntype on the function and see what parts get flagged. Note that a type-instability cannot be diagnosed by the run-time types of the variables — the key is that you’ve written things in such a way that Julia isn’t able to predict those run-time types. That’s what @code_warntype will show you.
If you could post the a more complete example that we can run then you’ll get more conclusive answers here.
I cannot reproduce. Are you doing something weird like running on 32bit causing Int64 / Int / Int32 instability?
julia> using Random
julia> N=3000; temp = rand(Complex{Float64}, N,N); H=rand(Complex{Float64}, N,N); multiplier = rand(Complex{Float64}); panel2intdof=shuffle(1:N);
julia> function fff(panel2intdof, temp, H, multiplier)
@show typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)
@time begin # DEBUG This loop really does nothing useful. It is there purely for comparison.
@inbounds for co = 1:size(temp, 2)
@inbounds for ro = 1:size(temp, 1)
H[ro, co] = temp[ro, co]
end
end
end # DEBUG
println("After the first loop")
@time begin # DEBUG This loop is the actual work I want to get done
@inbounds for co = 1:size(temp, 2)
renumco = panel2intdof[co]
@inbounds for ro = 1:size(temp, 1)
renumro = panel2intdof[ro]
H[ro, co] = multiplier * temp[renumro, renumco]
end
end
end # DEBUG
println("After the second loop")
end
Post warm-up:
julia> @time fff(panel2intdof, temp, H, multiplier)
(typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)) = (Array{Int64,1}, Array{Complex{Float64},2}, Array{Complex{Float64},2}, Complex{Float64})
0.030214 seconds
After the first loop
0.040853 seconds
After the second loop
0.071491 seconds (141 allocations: 5.781 KiB)
Turns out I made an (unwarranted) assumption that multiplier was discoverable as complex.
That number was type-unstable.
Thanks for making me to figure it out!
(As a matter of fact I have used @code_warntype on this code, but I missed the important information in the thicket of the printouts.)