Allocation puzzler

I have a bit of a puzzle here. I simply loop through arrays and assign either at multiple of one array (with permutation of the rows and columns), or just assign a multiple of one array to the other. Both versions allocate like crazy.

N = 1000;
multiplier = 2.0 + 1.0im;
temp = rand(Complex{Float64}, N, N);
H = rand(Complex{Float64}, N, N);
panel2intdof = vec(collect(1:N));
@time begin
    for co = 1:size(temp, 2)
        renumco = panel2intdof[co]
        for ro = 1:size(temp, 1)
            renumro = panel2intdof[ro]
            H[ro, co] = multiplier * temp[renumro, renumco]
        end
    end
end
@time begin
    for co = 1:size(temp, 2)
        for ro = 1:size(temp, 1)
            H[ro, co] = (multiplier * temp[ro, co])
        end
    end
end

Both double loops report around four million allocations (total of over 100 MB). This happens with the development version of Julia on Linux and 1.0.1 on Windows. I haven’t tested more.

I can’t figure out why the allocation occurs. There is allocation even when just plainly looping to overwrite one matrix with another

@time begin
    for co = 1:N
        for ro = 1:N
            H[ro, co] = temp[ro, co]
        end
    end
end

This allocates 76 MB.

Don’t benchmark in global scope :slight_smile:

julia> f(N, H, temp) = @time begin
           for co = 1:N
               for ro = 1:N
                   H[ro, co] = temp[ro, co]
               end
           end
       end

julia> f(N, H, temp)
  0.005926 seconds
1 Like

True. Actually the allocating loops are inside a function in the code I’m trying to debug. It was allocating
inside the function. When I pulled it out, it still reproduced the allocating, so I haven’t noticed
the trip up with the global scope.

So it looks like my example is not equivalent to the code inside the function, and something else is going on.

I will have to try to come up with a better MWE.

This is what the code inside the function really looks like:

        @show typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)
        @time begin # DEBUG This loop really does nothing useful. It is there purely for comparison.
        @inbounds for co = 1:size(temp, 2)
            @inbounds for ro = 1:size(temp, 1)
                H[ro, co] = temp[ro, co]
            end
        end
        end # DEBUG
        println("After the first loop")
        @time begin # DEBUG This loop is the actual work I want to get done
        @inbounds for co = 1:size(temp, 2)
            renumco = panel2intdof[co]
            @inbounds for ro = 1:size(temp, 1)
                renumro = panel2intdof[ro]
                H[ro, co] = multiplier * temp[renumro, renumco]
            end
        end
        end # DEBUG
        println("After the second loop")

I get the following printout

(typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)) = (Array{Int64,1}, Array{Complex{Float64},2}, Array{Complex{Float64},2}, Complex{Float64})                                                                
  0.010650 seconds                                                        
After the first loop                                                      
  1.486276 seconds (13.43 M allocations: 409.923 MiB, 26.14% gc time)     
After the second loop    

As you can see, there are many allocations in the second loop, yet the types of operands give me no clue as to why.

That still looks like a type instability of some sort. Call @code_warntype on the function and see what parts get flagged. Note that a type-instability cannot be diagnosed by the run-time types of the variables — the key is that you’ve written things in such a way that Julia isn’t able to predict those run-time types. That’s what @code_warntype will show you.

If you could post the a more complete example that we can run then you’ll get more conclusive answers here.

1 Like

I cannot reproduce. Are you doing something weird like running on 32bit causing Int64 / Int / Int32 instability?

julia> using Random
julia> N=3000; temp = rand(Complex{Float64}, N,N); H=rand(Complex{Float64}, N,N); multiplier = rand(Complex{Float64}); panel2intdof=shuffle(1:N);
julia> function fff(panel2intdof, temp, H, multiplier)
               @show typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)
               @time begin # DEBUG This loop really does nothing useful. It is there purely for comparison.
               @inbounds for co = 1:size(temp, 2)
                   @inbounds for ro = 1:size(temp, 1)
                       H[ro, co] = temp[ro, co]
                   end
               end
               end # DEBUG
               println("After the first loop")
               @time begin # DEBUG This loop is the actual work I want to get done
               @inbounds for co = 1:size(temp, 2)
                   renumco = panel2intdof[co]
                   @inbounds for ro = 1:size(temp, 1)
                       renumro = panel2intdof[ro]
                       H[ro, co] = multiplier * temp[renumro, renumco]
                   end
               end
               end # DEBUG
               println("After the second loop")
               end

Post warm-up:

julia> @time fff(panel2intdof, temp, H, multiplier)
(typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)) = (Array{Int64,1}, Array{Complex{Float64},2}, Array{Complex{Float64},2}, Complex{Float64})
  0.030214 seconds
After the first loop
  0.040853 seconds
After the second loop
  0.071491 seconds (141 allocations: 5.781 KiB)
1 Like

Turns out I made an (unwarranted) assumption that multiplier was discoverable as complex.
That number was type-unstable.
Thanks for making me to figure it out!

(As a matter of fact I have used @code_warntype on this code, but I missed the important information in the thicket of the printouts.)