Memory allocations are not very understandable

I am having a hard time understanding how memory is managed by Julia.

Why do the following lines allocate memory at all?

        - function winsMilestone(topSide, bottomSide, justPlayedCard::Player)
827581056     if (countnz(topSide) < 3  ||  countnz(bottomSide) < 3)
        0          return none
        -     end
        - end

Note that topSide and bottomSide have the same type: both are views into a UInt8 array.


That looks like it might be allocation caused by compilation itself. Did you follow the instructions to do Profile.clear_malloc_data() from https://docs.julialang.org/en/stable/manual/profile/#Memory-allocation-analysis-1 ?
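For reference, a minimal sketch of the workflow those instructions describe. The function name work is hypothetical and only stands in for the code being measured; the script is assumed to be started with julia --track-allocation=user.

```julia
using Profile

# Hypothetical workload, used only for illustration.
work(n) = sum(abs2, rand(n))

work(10)                      # warm-up call, so compilation happens here
Profile.clear_malloc_data()   # discard the allocation counts recorded so far
work(1_000)                   # only allocations from here on should end up in the *.mem files
```

The *.mem files are written when Julia exits, so the counts they contain should then exclude whatever was allocated during the warm-up call.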


Yes, I did.

Hm, that’s really interesting. It will be easier to help if you can post a full reproducible example. Can you create a standalone demo of the issue?


Here is a minimal example:

function winsMilestone(topSide, bottomSide)
    a = 0

    if (countnz(topSide) < 3  ||  countnz(bottomSide) < 3)
         return 1
    end

    return 2
end

function test(n_iter)
    A = zeros(UInt8,(7, 9))
    for i=1:n_iter
        for j=1:9
            @views winsMilestone(A[1:3, j], A[4:7, j])
        end
    end
end

test(1)

Profile.clear_malloc_data()
test(1000)

And with memory annotations:

        - function winsMilestone(topSide, bottomSide)
        -     a = 0
        - 
  1152000     if (countnz(topSide) < 3  ||  countnz(bottomSide) < 3)
        0          return 1
        -     end
        - 
        0     return 2
        - end
        - 
        - function test(n_iter)
      304     A = zeros(UInt8,(7, 9))
        0     for i=1:n_iter
        0         for j=1:9
        0             @views winsMilestone(A[1:3, j], A[4:7, j])
        -         end
        -     end
        - end
        - 
        - test(1)
        - 
        - Profile.clear_malloc_data()
        - test(1000)

The line a = 0 is there to show that the allocation is not attributed to the first line of the function.
The problem is probably not related to countnz: if I replace the condition in the if with the corresponding disjunction over all the elements of the array, the result is the same.

Thanks.

I think it’s just the allocation tracker misattributing the allocation. If you create the views before the call, then winsMilestone shows no allocation:

function winsMilestone(topSide, bottomSide)
    a = 0

    if (countnz(topSide) < 3  ||  countnz(bottomSide) < 3)
         return 1
    end

    return 2
end

function test(n_iter)
    A = zeros(UInt8,(7, 9))
    for i=1:n_iter
        for j=1:9
            A1 = @view A[1:3, j]
            A2 = @view A[4:7, j]
            winsMilestone(A1, A2)
        end
    end
end

test(1)

Profile.clear_malloc_data()
test(1000)

With that change, the *.mem output is:

        - function winsMilestone(topSide, bottomSide)
        -     a = 0
        -
        0     if (countnz(topSide) < 3  ||  countnz(bottomSide) < 3)
        0          return 1
        -     end
        -
        0     return 2
        - end
        -
        - function test(n_iter)
      304     A = zeros(UInt8,(7, 9))
        0     for i=1:n_iter
        0         for j=1:9
   576000             A1 = @view A[1:3, j]
   576000             A2 = @view A[4:7, j]
        0             winsMilestone(A1, A2)
        -         end
        -     end
        - end
        -
        - test(1)
        -
        - Profile.clear_malloc_data()
        - test(1000)
        -
        -

Why would a view allocate so much memory?
Also, why is A allocated using 304 bytes when the actual data it holds is 63 bytes (plus size and type information)?

Thanks,

Why would a view allocate so much memory?

Because it’s not a zero-cost abstraction, and the view object needs space to describe what the view is covering.

julia> A = zeros(UInt8, 7, 9);

julia> a = view(A, 1:3, 1);

julia> sizeof(a)
48

julia> fieldnames(a)
4-element Array{Symbol,1}:
 :parent 
 :indexes
 :offset1
 :stride1

julia> typeof(a)
SubArray{UInt8,1,Array{UInt8,2},Tuple{UnitRange{Int64},Int64},true}

In your example, 576,000 bytes / 9,000 iterations = 64 bytes whereas sizeof returns 48 bytes, but the difference is probably a matter of the underlying memory manager aligning the objects to word boundaries.
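The arithmetic above can be checked with a short sketch. It assumes a recent Julia, where the field names have to be queried on the type (fieldnames(typeof(a))) rather than on the instance; the variable names are only illustrative, and the printed sizes depend on the platform.

```julia
# Numbers taken from the *.mem output above: 576000 bytes for 1000 * 9 views.
total_bytes = 576_000
n_views = 1_000 * 9
bytes_per_view = total_bytes ÷ n_views

A = zeros(UInt8, 7, 9)
a = view(A, 1:3, 1)

println("bytes allocated per view: ", bytes_per_view)
println("sizeof(view):             ", sizeof(a))
println("fields:                   ", fieldnames(typeof(a)))
```

Any gap between the two sizes would then be per-object overhead (for example a type tag and rounding up to an allocation size class), which is a guess about the allocator rather than something this sketch proves.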

Also, why is A allocated using 304 bytes when the actual data it holds is 63 bytes (plus size and type information)?

Probably type information plus memory alignment, but you’ll need to dig into the C internals (or have someone who already knows the answer comment) to verify whether that accounts for all of the difference.

FWIW, in 0.7 the allocation of views should be elided much more often than in 0.6.


@jmert, is the only possible optimization to inline winsMilestone by hand and do the disjunction directly on the A array?
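To make the question concrete, here is a rough sketch of what inlining by hand could look like. The name wins_milestone_direct is hypothetical, and the row ranges 1:3 and 4:7 are taken from the example above. It indexes A directly and counts the non-zero entries with plain loops, so no view object is created at all.

```julia
# Hypothetical hand-inlined variant: works on column j of A without building views.
function wins_milestone_direct(A::AbstractMatrix, j::Integer)
    top = 0
    for i in 1:3                 # rows of the top side
        top += A[i, j] != 0
    end
    bottom = 0
    for i in 4:7                 # rows of the bottom side
        bottom += A[i, j] != 0
    end
    return (top < 3 || bottom < 3) ? 1 : 2
end

A = zeros(UInt8, (7, 9))
result_empty = wins_milestone_direct(A, 1)   # every entry of column 1 is zero

B = ones(UInt8, (7, 9))
result_full = wins_milestone_direct(B, 1)    # every entry of column 1 is non-zero
```

This trades the readability of the view-based version for not depending on the compiler eliding the views.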

@kristoffer.carlsson, is it already implemented?

Yes it just got merged. You might need to @inline winsMilestone but I am not sure.

Maybe Julia is just not a match for my needs.

[Edit] But I hope that is not the case.

Testing Julia from master branch (easy compilation BTW), I have:

        - using Profile
        - 
        - @inline function winsMilestone(topSide, bottomSide)
        -     a = 0
        - 
        0     if (count(!iszero, topSide) < 3  ||  count(!iszero, bottomSide) < 3)
        0          return 1
        -     end
        - 
        0     return 2
        - end
        - 
        - function test(n_iter)
        0     A = zeros(UInt8,(7, 9))
        0     top_idx = collect(1:3)
        0     bottom_idx = collect(4:7)
        0     for i=1:n_iter
        0         for j=1:9
        0             A1 = @view A[top_idx, j]
        0             A2 = @view A[bottom_idx, j]
        0             winsMilestone(A1, A2)
        -         end
        -     end
        - end
        - 
        - test(1)
        - 
        - Profile.clear_malloc_data()
        - test(1000)

It seems a bit odd, doesn’t it?

@time also prints the number of allocations, so you can compare that with 0.6.

julia master:

0.000153 seconds (18.13 k allocations: 572.787 KiB)

julia 0.6:

0.000870 seconds (54.09 k allocations: 1.929 MiB)

It is better, indeed (I don’t know if the two versions were compiled with the same options).

Is memory allocation tracking broken with julia 0.7?

On today’s master, with the deprecation fixed and an @inline added, the allocations are gone:

julia> function test(n_iter)
           A = zeros(UInt8,(7, 9))
           for i=1:n_iter
               for j=1:9
                   A1 = @view A[1:3, j]
                   A2 = @view A[4:7, j]
                   winsMilestone(A1, A2)
               end
           end
       end
test (generic function with 1 method)

julia> @inline function winsMilestone(topSide, bottomSide)
           a = 0

           if (count(!iszero, topSide) < 3  ||  count(!iszero, bottomSide) < 3)
                return 1
           end

           return 2
       end
winsMilestone (generic function with 1 method)

julia> @time test(1000)        # warm-up
  0.085711 seconds (132.28 k allocations: 7.346 MiB, 7.89% gc time)

julia> @time test(1000)
  0.000033 seconds (5 allocations: 304 bytes)

I can reproduce it. I got tricked by the @time macro, whose first use also includes compilation.
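A small sketch of the pattern that avoids this trap: call the function once so it is compiled, and only then measure. It uses @allocated from Base; the function test_count is a hypothetical variant of test that returns the counts, so the loop has a visible result. No expected byte count is given because it depends on the Julia version.

```julia
# Hypothetical variant of the example above that returns the accumulated count.
function test_count(n_iter)
    A = zeros(UInt8, (7, 9))
    total = 0
    for i in 1:n_iter, j in 1:9
        total += count(!iszero, view(A, 1:3, j))
    end
    return total
end

test_count(1)                           # warm-up: compilation happens here
bytes = @allocated test_count(1000)     # measures only the second, compiled call
println("bytes allocated: ", bytes)
```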

Anyway, the generated mem file is now wrong, since the 304 bytes are not attributed to the line where A is bound:

        - using Profile
        - 
        - @inline function winsMilestone(topSide, bottomSide)
        -     a = 0
        - 
        0     if (count(!iszero, topSide) < 3  ||  count(!iszero, bottomSide) < 3)
        0          return 1
        -     end
        - 
        0     return 2
        - end
        - 
        - function test(n_iter)
        0     A = zeros(UInt8,(7, 9))
        -     #top_idx = collect(1:3)
        -     #bottom_idx = collect(4:7)
        0     for i=1:n_iter
        0         for j=1:9
        -             #A1 = @view A[top_idx, j]
        -             #A2 = @view A[bottom_idx, j]
        0             @views winsMilestone(A[1:3, j], A[4:7, j])
        -         end
        -     end
        - end
        - 
        - @time test(1)
        - 
        - Profile.clear_malloc_data()
        - @time test(1000)

It will make code harder to debug.

I think these 5 allocations come from the code that @time expands to, which will be in global scope.

Try

f() = @time(1000)
f()
f()
julia> f() = @time(1000)
f (generic function with 1 method)

julia> f()
  0.000000 seconds
1000

julia> f()
  0.000000 seconds
1000

Don’t they come from the call to zeros?

Since you don’t return anything, everything is probably optimized away.

Edit: Or wait, you will return something from the last evaluation in the for loop, I guess.