Incorrect memory allocation in Julia v0.5

Julia v0.5 has some serious issues regarding memory allocation. In an attempt to optimize my package, I noticed many inconsistencies:

        - ## Copyright (c) 2015, Júlio Hoffimann Mendes <juliohm@stanford.edu>
        - ##
        - ## Permission to use, copy, modify, and/or distribute this software for any
        - ## purpose with or without fee is hereby granted, provided that the above
        - ## copyright notice and this permission notice appear in all copies.
        - ##
        - ## THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
        - ## WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
        - ## MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
        - ## ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
        - ## WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
        - ## ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
        - ## OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
        - 
        - function boykov_kolmogorov_cut(A::AbstractArray, B::AbstractArray, dir::Symbol)
        -   # permute dimensions so that the algorithm is
        -   # the same for cuts in x, y and z directions
        -   if dir == :x
   149056     A = permutedims(A, [1,2,3])
   149120     B = permutedims(B, [1,2,3])
        0   elseif dir == :y
   140160     A = permutedims(A, [2,1,3])
   140160     B = permutedims(B, [2,1,3])
        0   elseif dir == :z
        0     A = permutedims(A, [3,2,1])
        0     B = permutedims(B, [3,2,1])
        -   end
        - 
   453760   E = abs(A - B)
        - 
        0   mx, my, mz = size(E)
        0   nvox = mx*my*mz
        - 
        -   # compute gradients
   679680   ∇xₐ = similar(E); ∇yₐ = similar(E); ∇zₐ = similar(E)
   679680   ∇xᵦ = similar(E); ∇yᵦ = similar(E); ∇zᵦ = similar(E)
        0   for i=1:mx-1, j=1:my, k=1:mz
   793600     ∇xₐ[i,j,k] = A[i+1,j,k] - A[i,j,k]
   793600     ∇xᵦ[i,j,k] = B[i+1,j,k] - B[i,j,k]
        -   end
        0   for i=1:mx, j=1:my-1, k=1:mz
   858880     ∇yₐ[i,j,k] = A[i,j+1,k] - A[i,j,k]
   858880     ∇yᵦ[i,j,k] = B[i,j+1,k] - B[i,j,k]
        -   end
        0   for i=1:mx, j=1:my, k=1:mz-1
        0     ∇zₐ[i,j,k] = A[i,j,k+1] - A[i,j,k]
        0     ∇zᵦ[i,j,k] = B[i,j,k+1] - B[i,j,k]
        -   end
        0   if mx > 1
    24320     ∇xₐ[mx,:,:] = ∇xₐ[mx-1,:,:]
    24320     ∇xᵦ[mx,:,:] = ∇xᵦ[mx-1,:,:]
        -   end
        0   if my > 1
     8320     ∇yₐ[:,my,:] = ∇yₐ[:,my-1,:]
     8320     ∇yᵦ[:,my,:] = ∇yᵦ[:,my-1,:]
        -   end
        0   if mz > 1
        0     ∇zₐ[:,:,mz] = ∇zₐ[:,:,mz-1]
        0     ∇zᵦ[:,:,mz] = ∇zᵦ[:,:,mz-1]
        -   end
        0   map!(abs, ∇xₐ); map!(abs, ∇yₐ); map!(abs, ∇zₐ)
        0   map!(abs, ∇xᵦ); map!(abs, ∇yᵦ); map!(abs, ∇zᵦ)
        - 
        -   # add source and sink terminals
        0   s = nvox + 1; t = nvox + 2
        - 
        -   # construct graph and capacity matrix
        0   G = DiGraph(nvox+2)
   282880   C = spzeros(nvox+2, nvox+2)
        0   for k=1:mz, j=1:my, i=1:mx-1
        0     c = sub2ind((mx,my,mz), i, j, k)
        0     d = sub2ind((mx,my,mz), i+1, j, k)
        0     add_edge!(G, c, d)
        0     add_edge!(G, d, c)
  2558592     C[c,d] = C[d,c] = (E[c] + E[d]) / (∇xₐ[c] + ∇xₐ[d] + ∇xᵦ[c] + ∇xᵦ[d])
        -   end
        0   for k=1:mz, j=1:my-1, i=1:mx
        0     c = sub2ind((mx,my,mz), i, j, k)
        0     r = sub2ind((mx,my,mz), i, j+1, k)
        0     add_edge!(G, c, r)
        0     add_edge!(G, r, c)
  2725248     C[c,r] = C[r,c] = (E[c] + E[r]) / (∇yₐ[c] + ∇yₐ[r] + ∇yᵦ[c] + ∇yᵦ[r])
        -   end
        0   for k=1:mz-1, j=1:my, i=1:mx
        0     c = sub2ind((mx,my,mz), i, j, k)
        0     o = sub2ind((mx,my,mz), i, j, k+1)
        0     add_edge!(G, c, o)
        0     add_edge!(G, o, c)
        0     C[c,o] = C[o,c] = (E[c] + E[o]) / (∇zₐ[c] + ∇zₐ[o] + ∇zᵦ[c] + ∇zᵦ[o])
        -   end
        0   for k=1:mz, j=1:my
        0     u = sub2ind((mx,my,mz), 1, j, k)
        0     v = sub2ind((mx,my,mz), mx, j, k)
        0     add_edge!(G, s, u)
        0     add_edge!(G, v, t)
        0     C[s,u] = C[v,t] = Inf
        -   end
        - 
        -   # Boykov-Kolmogorov max-flow/min-cut
     4192   _, __, labels = maximum_flow(G, s, t, C, algorithm=BoykovKolmogorovAlgorithm())
        - 
        -   # remove source and sink terminals
        0   labels = labels[1:end-2]
        - 
        -   # cut mask
    10240   M = falses(E)
   188160   M[labels .== 1] = true
   188160   M[labels .== 0] = true
        - 
        -   # permute back to original shape
        0   if dir == :x
    14720     M = permutedims(M, [1,2,3])
        0   elseif dir == :y
    14720     M = permutedims(M, [2,1,3])
        0   elseif dir == :z
        0     M = permutedims(M, [3,2,1])
        -   end
        - 
        0   M
        - end
        - 

As can be seen in the output, various lines of code that only index an existing array are allocating memory. This incorrect memory allocation also happens when I just set values of an array with a mask: `M[labels .== 1] = true`.

Could you please help me fix this issue?

Looks like you are creating the `[1,2,3]` arrays and the output arrays at every call? For `M[labels .== 1] = true`, you are allocating the `labels .== 1` array.
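To make that concrete, here is a minimal sketch (with a hypothetical `labels` vector, not the package code) showing that the broadcast comparison materializes a temporary array, while an explicit loop writes the mask in place without one:

```julia
# `labels .== 1` allocates a temporary BitArray before the masked
# assignment; looping over the indices avoids that temporary.
labels = [0, 1, 1, 0, 1]        # hypothetical label vector
M = falses(length(labels))      # cut mask, allocated once

for i in 1:length(labels)
    if labels[i] == 1
        M[i] = true             # in-place write, no temporary
    end
end
```

The loop does the same work as `M[labels .== 1] = true` but only touches memory that already exists.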

Hi @kristoffer.carlsson, I am more concerned about the loops in which I set the gradients. They are simple `A[i,j,k] = B[u,v,w]` operations and shouldn’t be consuming any memory.

Sorry, I didn’t see that there were more lines in the code and that I could scroll. Have you made sure you have no type instabilities with `@code_warntype`? I would also check on master, because there have been some fixes to the allocation counters and I am not sure they are on 0.5 yet.
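For reference, a quick sketch of what such a check looks like, on a toy function (not the package code) where the return type depends on a runtime value:

```julia
# A type-unstable function: it returns Int for x ≤ 0 and Float64
# otherwise, which forces boxing and hidden allocations in hot loops.
unstable(x) = x > 0 ? x : 0
# The stable variant returns the same type for every input.
stable(x)   = x > 0 ? x : zero(x)

# In the REPL, @code_warntype highlights the problem:
#   @code_warntype unstable(1.0)   # return type Union{Float64,Int64} flagged
#   @code_warntype stable(1.0)     # return type Float64, no warning
```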

Could you provide a self-contained reproducer? That would really help in fixing your problem.

@nalimilan below is the script.jl I am using:

    using ImageQuilting

    TI = randn(250,250,1)
    iqsim(TI, 62, 62, 1, size(TI)...)
    Profile.clear_malloc_data()
    iqsim(TI, 62, 62, 1, size(TI)...)

which I run with:

    julia --track-allocation=user --inline=yes script.jl

Self-contained typically means something where you can get an overview of the whole code being run and pinpoint where the problem is. Running a function defined in a large package sort of defeats that purpose. Basically, leave exactly as much code as is needed to highlight the problem, but nothing more.

Note that these options are not expected to work before 0.6.

@kristoffer.carlsson: self-contained means reproducible. There is no point in separating a single function from this package, because the issue is very clear: it is not my package, it is Julia v0.5 allocating memory incorrectly.

@yuyichao: which options do you mean? Correct memory allocation?

I suggest a bug report with a minimum reproducible example in that case.

Assuming you mean tracking allocation incorrectly: the inconsistency caused by inlining is fixed on master (it won’t be backported), and there’s already another issue about fixing the allocation count on master.

There’s nothing wrong with the allocation, AFAICT. The options to generate coverage and allocation data do not work with inlining on before 0.6.

@yuyichao I tried with `--inline=no` and the inconsistency is still there. Could you please confirm this is a major issue in Julia v0.5? Why won’t it be backported, given the relevance of the issue?

It is an issue on 0.5. I will not confirm or decide if it’s a major one.

The fix is too big to be backportable.

I believe the issue that @yuyichao is referring to is that allocation tracking doesn’t work in combination with inlining on 0.5, so one needs to turn inlining off when tracking allocations.

Apparently the issue that @juliohm is troubleshooting appears even with `--inline=no`, so it seems like a separate problem.

A few things to check:

  1. Make sure you’re running `Profile.clear_malloc_data()` after you run your test for the first time, so you’re not tracking JIT compilation.
  2. Run `@code_warntype` on the function showing allocations to make sure that all the values involved have inferrable types.
  3. Check for unnecessary type conversions. They can sometimes allocate small amounts of memory, and if you do them many times it can add up.

How many times does the loop run for that test? It might be useful to know the per-iteration allocation.

@ssfrr you are 100% correct. The issue I am pointing out here is unrelated to the fact that `--inline=yes` is buggy in Julia v0.5.

Note that you also need to turn off precompilation (with `--compilecache=no`, not by removing `__precompile__` from the code) for `--inline=no` to work.

  for i=1:mx-1, j=1:my, k=1:mz
    ∇xₐ[i,j,k] = A[i+1,j,k] - A[i,j,k]
    ∇xᵦ[i,j,k] = B[i+1,j,k] - B[i,j,k]
  end

If you are really trying to optimize, perhaps you should reconsider the order of your nested loops?
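Julia stores arrays in column-major order, so the first index should vary fastest, i.e. in the innermost loop. A minimal sketch of the reordering, with hypothetical small arrays standing in for the ones in the snippet above:

```julia
# Column-major order: put k (the last index) outermost and i innermost,
# so consecutive iterations touch contiguous memory.
mx, my, mz = 4, 4, 4                    # hypothetical sizes
A  = rand(mx, my, mz)
∇x = zeros(mx, my, mz)

for k in 1:mz, j in 1:my, i in 1:mx-1   # i now varies fastest
    ∇x[i,j,k] = A[i+1,j,k] - A[i,j,k]
end
```

In a multi-variable `for`, the last variable listed is the innermost loop, so this iterates the first array index fastest.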


@yuyichao the result is the same with `--compilecache=no --inline=no`: huge incorrect memory allocations.

That is a good point @Ralph_Smith, thank you; I will definitely reorder the loops to exploit the cache. However, this optimization trick is not related to the serious memory allocation issue I am having.