Using `view` in 0.6 on a ragged array

question

#1

I have noticed that a particular calculation in the MixedModels package unexpectedly allocates a lot of storage. The method in question is around lines 340 to 350 of modelterms.jl. The calculation involves the columns of a k by n matrix where k << n and a PooledDataArray which indicates which group the column is part of. There are g groups. In the code, the n-vector A.f.refs will contain the integers 1:g where g < n.

The output value is a g-vector of k by k matrices called d in the code. These are zeroed out then each column is used in a rank-1 update of the corresponding element of d. Right now the loop is written

    for i in eachindex(refs)
        ## FIXME: A lot of allocation going on here
        BLAS.syr!('L', one(T), view(Az, :, i), d[refs[i]])
    end

Checking with --track-allocation shows considerable allocation in this loop (which is called many times on a large problem).

        0     for i in eachindex(refs)
        -         ## FIXME: A lot of allocation going on here
1690606080         BLAS.syr!('L', one(T), view(Az, :, i), d[refs[i]])
        -     end

I don’t understand where the allocation is coming from. I had thought that a view would be lightweight, especially a view of a column in a matrix.

Is there a better way of writing this so as to avoid all the allocation?


#2

I thought I might be able to avoid the allocation problem by using an unsafe_wrap of an array even though it is, well, unsafe. But it actually made the problem worse

        0     refs = A.f.refs
        0     Apt = pointer(Az)
        0     @inbounds for i in eachindex(refs)
        -         ## FIXME: A lot of allocation going on here
2817676800         BLAS.syr!('L', one(T), unsafe_wrap(Array, Apt, l), d[refs[i]])
        0         Apt += l * sizeof(T)
        -     end

#3

Actually, I think the “twice as bad” may be because I ran the fit twice and forgot to Profile.clear_malloc_data() after the first fit.


#4

Ideally view will be allocation-free some day, but for now the wrapper gets allocated in many cases (https://github.com/JuliaLang/julia/issues/10899). You can try the old ArrayViews package, where the unsafe contiguous view should manage to get optimized away… but it’s not gotten much love lately.


#5

I think the problem was the calls to BLAS.syr! although I am not sure exactly why that would allocate. In any case, the calculation is so simple that I just replaced the high-level calls with explicit loops and that doesn’t allocate.

The reason that I don’t think it was the view call that was a problem is because I still had the problem with allocation after switching to unsafe_wrap(Array, ...)


#6

Now it looks like

    refs = A.f.refs
    @inbounds for i in eachindex(refs)
        dri = d[refs[i]]
        for j in 1:l
            Aji = Az[j, i]
            for k in 1:l
                dri[k, j] += Aji * Az[k, i]
            end
        end
    end

#7

Instead of view , try using unsafe_aview from the ArrayViews package. As long as your code works with Abstract Arrays, this should be non allocating. I am using this until we get non allocating views in the base.


#8

I’ve noticed that the new inliner in Julia 0.7 seems more likely optimize out the construction of the SubArray, you could give that a try.


#9

My current opinion is that it is not the call to view that is causing the allocations but rather syr!. Because of the Fortran calling sequence used by the BLAS, I think that all the integers passed to the BLAS subroutine need to be allocated. The ccall is

 ccall((@blasfunc($fname), $lib), Void,
     (Ptr{UInt8}, Ptr{BlasInt}, Ptr{$elty}, Ptr{$elty},
      Ptr{BlasInt}, Ptr{$elty}, Ptr{BlasInt}),
      &uplo, &n, &α, x,
      &stride(x, 1), A, &max(1,stride(A, 2)))

The point is that this is called many, many times with n = 2. Even trivial amounts of allocation within the call will add up.


#10

Pretty sure that passing integers to Fortran with & will not allocate.


#11

I haven’t been able to find another explanation as the allocations persist even when replacing view(... by unsafe_wrap(Array, ...

As it is, this problem motivated me to rewrite the code to avoid the calls to view and syr! so I guess all is good in the end. The tools to check allocations and to profile execution have proven their worth again.


#12

unsafe_wrap allocate the array.


#13

@dmbates did you try unsafe_aview ?