Allocation puzzler


#1

I have a bit of a puzzle here. I simply loop through arrays and assign either at multiple of one array (with permutation of the rows and columns), or just assign a multiple of one array to the other. Both versions allocate like crazy.

N = 1000;
multiplier = 2.0 + 1.0im;
temp = rand(Complex{Float64}, N, N);
H = rand(Complex{Float64}, N, N);
panel2intdof = vec(collect(1:N));
@time begin
    for co = 1:size(temp, 2)
        renumco = panel2intdof[co]
        for ro = 1:size(temp, 1)
            renumro = panel2intdof[ro]
            H[ro, co] = multiplier * temp[renumro, renumco]
        end
    end
end
@time begin
    for co = 1:size(temp, 2)
        for ro = 1:size(temp, 1)
            H[ro, co] = (multiplier * temp[ro, co])
        end
    end
end

Both double loops report around four million allocations (total of over 100 MB). This happens with the development version of Julia on Linux and 1.0.1 on Windows. I haven’t tested more.

I can’t figure out why the allocation occurs. There is allocation even when just plainly looping to overwrite one matrix with another

@time begin
    for co = 1:N
        for ro = 1:N
            H[ro, co] = temp[ro, co]
        end
    end
end

This allocates 76 MB.


#2

Don’t benchmark in global scope :slight_smile:

julia> f(N, H, temp) = @time begin
           for co = 1:N
               for ro = 1:N
                   H[ro, co] = temp[ro, co]
               end
           end
       end

julia> f(N, H, temp)
  0.005926 seconds

#3

True. Actually the allocating loops are inside a function in the code I’m trying to debug. It was allocating
inside the function. When I pulled it out, it still reproduced the allocating, so I haven’t noticed
the trip up with the global scope.

So it looks like my example is not equivalent to the code inside the function, and something else is going on.

I will have to try to come up with a better MWE.


#4

This is what the code inside the function really looks like:

        @show typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)
        @time begin # DEBUG This loop really does nothing useful. It is there purely for comparison.
        @inbounds for co = 1:size(temp, 2)
            @inbounds for ro = 1:size(temp, 1)
                H[ro, co] = temp[ro, co]
            end
        end
        end # DEBUG
        println("After the first loop")
        @time begin # DEBUG This loop is the actual work I want to get done
        @inbounds for co = 1:size(temp, 2)
            renumco = panel2intdof[co]
            @inbounds for ro = 1:size(temp, 1)
                renumro = panel2intdof[ro]
                H[ro, co] = multiplier * temp[renumro, renumco]
            end
        end
        end # DEBUG
        println("After the second loop")

I get the following printout

(typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)) = (Array{Int64,1}, Array{Complex{Float64},2}, Array{Complex{Float64},2}, Complex{Float64})                                                                
  0.010650 seconds                                                        
After the first loop                                                      
  1.486276 seconds (13.43 M allocations: 409.923 MiB, 26.14% gc time)     
After the second loop    

As you can see, there are many allocations in the second loop, yet the types of operands give me no clue as to why.


#5

That still looks like a type instability of some sort. Call @code_warntype on the function and see what parts get flagged. Note that a type-instability cannot be diagnosed by the run-time types of the variables — the key is that you’ve written things in such a way that Julia isn’t able to predict those run-time types. That’s what @code_warntype will show you.

If you could post the a more complete example that we can run then you’ll get more conclusive answers here.


#6

I cannot reproduce. Are you doing something weird like running on 32bit causing Int64 / Int / Int32 instability?

julia> using Random
julia> N=3000; temp = rand(Complex{Float64}, N,N); H=rand(Complex{Float64}, N,N); multiplier = rand(Complex{Float64}); panel2intdof=shuffle(1:N);
julia> function fff(panel2intdof, temp, H, multiplier)
               @show typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)
               @time begin # DEBUG This loop really does nothing useful. It is there purely for comparison.
               @inbounds for co = 1:size(temp, 2)
                   @inbounds for ro = 1:size(temp, 1)
                       H[ro, co] = temp[ro, co]
                   end
               end
               end # DEBUG
               println("After the first loop")
               @time begin # DEBUG This loop is the actual work I want to get done
               @inbounds for co = 1:size(temp, 2)
                   renumco = panel2intdof[co]
                   @inbounds for ro = 1:size(temp, 1)
                       renumro = panel2intdof[ro]
                       H[ro, co] = multiplier * temp[renumro, renumco]
                   end
               end
               end # DEBUG
               println("After the second loop")
               end

Post warm-up:

julia> @time fff(panel2intdof, temp, H, multiplier)
(typeof(panel2intdof), typeof(temp), typeof(H), typeof(multiplier)) = (Array{Int64,1}, Array{Complex{Float64},2}, Array{Complex{Float64},2}, Complex{Float64})
  0.030214 seconds
After the first loop
  0.040853 seconds
After the second loop
  0.071491 seconds (141 allocations: 5.781 KiB)

#7

Turns out I made an (unwarranted) assumption that multiplier was discoverable as complex.
That number was type-unstable.
Thanks for making me to figure it out!

(As a matter of fact I have used @code_warntype on this code, but I missed the important information in the thicket of the printouts.)