Using `view` in 0.6 on a ragged array

I have noticed that a particular calculation in the MixedModels package unexpectedly allocates a lot of storage. The method in question is around lines 340 to 350 of modelterms.jl. The calculation involves the columns of a k by n matrix where k << n and a PooledDataArray which indicates which group the column is part of. There are g groups. In the code, the n-vector A.f.refs will contain the integers 1:g where g < n.

The output value is a g-vector of k by k matrices called d in the code. These are zeroed out then each column is used in a rank-1 update of the corresponding element of d. Right now the loop is written

    for i in eachindex(refs)
        ## FIXME: A lot of allocation going on here
        BLAS.syr!('L', one(T), view(Az, :, i), d[refs[i]])
    end

Checking with --track-allocation shows considerable allocation in this loop (which is called many times on a large problem).

        0     for i in eachindex(refs)
        -         ## FIXME: A lot of allocation going on here
1690606080         BLAS.syr!('L', one(T), view(Az, :, i), d[refs[i]])
        -     end

I don’t understand where the allocation is coming from. I had thought that a view would be lightweight, especially a view of a column in a matrix.

Is there a better way of writing this so as to avoid all the allocation?

I thought I might be able to avoid the allocation problem by using an unsafe_wrap of an array even though it is, well, unsafe. But it actually made the problem worse

        0     refs = A.f.refs
        0     Apt = pointer(Az)
        0     @inbounds for i in eachindex(refs)
        -         ## FIXME: A lot of allocation going on here
2817676800         BLAS.syr!('L', one(T), unsafe_wrap(Array, Apt, l), d[refs[i]])
        0         Apt += l * sizeof(T)
        -     end

Actually, I think the “twice as bad” may be because I ran the fit twice and forgot to Profile.clear_malloc_data() after the first fit.

Ideally view will be allocation-free some day, but for now the wrapper gets allocated in many cases (https://github.com/JuliaLang/julia/issues/10899). You can try the old ArrayViews package, where the unsafe contiguous view should manage to get optimized away… but it’s not gotten much love lately.

I think the problem was the calls to BLAS.syr! although I am not sure exactly why that would allocate. In any case, the calculation is so simple that I just replaced the high-level calls with explicit loops and that doesn’t allocate.

The reason that I don’t think it was the view call that was a problem is because I still had the problem with allocation after switching to unsafe_wrap(Array, ...)

Now it looks like

    refs = A.f.refs
    @inbounds for i in eachindex(refs)
        dri = d[refs[i]]
        for j in 1:l
            Aji = Az[j, i]
            for k in 1:l
                dri[k, j] += Aji * Az[k, i]
            end
        end
    end

Instead of view , try using unsafe_aview from the ArrayViews package. As long as your code works with Abstract Arrays, this should be non allocating. I am using this until we get non allocating views in the base.

I’ve noticed that the new inliner in Julia 0.7 seems more likely optimize out the construction of the SubArray, you could give that a try.

My current opinion is that it is not the call to view that is causing the allocations but rather syr!. Because of the Fortran calling sequence used by the BLAS, I think that all the integers passed to the BLAS subroutine need to be allocated. The ccall is

 ccall((@blasfunc($fname), $lib), Void,
     (Ptr{UInt8}, Ptr{BlasInt}, Ptr{$elty}, Ptr{$elty},
      Ptr{BlasInt}, Ptr{$elty}, Ptr{BlasInt}),
      &uplo, &n, &α, x,
      &stride(x, 1), A, &max(1,stride(A, 2)))

The point is that this is called many, many times with n = 2. Even trivial amounts of allocation within the call will add up.

Pretty sure that passing integers to Fortran with & will not allocate.

I haven’t been able to find another explanation as the allocations persist even when replacing view(... by unsafe_wrap(Array, ...

As it is, this problem motivated me to rewrite the code to avoid the calls to view and syr! so I guess all is good in the end. The tools to check allocations and to profile execution have proven their worth again.

unsafe_wrap allocate the array.

@dmbates did you try unsafe_aview ?