I have noticed that a particular calculation in the MixedModels package unexpectedly allocates a lot of storage. The method in question is around lines 340 to 350 of `modelterms.jl`. The calculation involves the columns of a `k` by `n` matrix, where `k << n`, and a `PooledDataArray` that indicates which group each column belongs to. There are `g` groups. In the code, the `n`-vector `A.f.refs` will contain the integers `1:g`, where `g < n`.
The output value is a `g`-vector of `k` by `k` matrices, called `d` in the code. These are zeroed out, then each column is used in a rank-1 update of the corresponding element of `d`. Right now the loop is written

```julia
for i in eachindex(refs)
    ## FIXME: A lot of allocation going on here
    BLAS.syr!('L', one(T), view(Az, :, i), d[refs[i]])
end
```
Checking with `--track-allocation` shows considerable allocation in this loop (which is called many times on a large problem):

```
         0 for i in eachindex(refs)
         -     ## FIXME: A lot of allocation going on here
1690606080     BLAS.syr!('L', one(T), view(Az, :, i), d[refs[i]])
         - end
```
I don't understand where the allocation is coming from. I had thought that a `view` would be lightweight, especially a view of a single column of a matrix. Is there a better way of writing this that avoids all the allocation?
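For anyone who wants to poke at this without the package, here is a minimal standalone sketch of the same pattern. The sizes and the `refs` vector are made up; `update!` is a hypothetical name, and `@allocated` stands in for the `--track-allocation` run:

```julia
using LinearAlgebra
using LinearAlgebra: BLAS

# Made-up sizes; the real problem has k << n and g < n.
k, n, g = 2, 10_000, 10
Az   = randn(k, n)
refs = rand(1:g, n)                  # stand-in for A.f.refs
d    = [zeros(k, k) for _ in 1:g]

# Same shape as the loop in modelterms.jl: a rank-1 update of
# d[refs[i]] from column i of Az, lower triangle only.
function update!(d, Az, refs)
    for i in eachindex(refs)
        BLAS.syr!('L', 1.0, view(Az, :, i), d[refs[i]])
    end
    return d
end

update!(d, Az, refs)                      # warm up / compile first
println(@allocated update!(d, Az, refs))  # bytes allocated per sweep
```

Whether this reports a large number depends on the Julia version, since it hinges on whether the `SubArray` wrapper gets stack-allocated.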
I thought I might be able to avoid the allocation problem by using an `unsafe_wrap` of an array, even though it is, well, unsafe. But it actually made the problem worse:

```
         0 refs = A.f.refs
         0 Apt = pointer(Az)
         0 @inbounds for i in eachindex(refs)
         -     ## FIXME: A lot of allocation going on here
2817676800     BLAS.syr!('L', one(T), unsafe_wrap(Array, Apt, l), d[refs[i]])
         0     Apt += l * sizeof(T)
         - end
```
Actually, I think the "twice as bad" may be because I ran the fit twice and forgot to call `Profile.clear_malloc_data()` after the first fit.
Ideally `view` will be allocation-free some day, but for now the wrapper gets allocated in many cases (https://github.com/JuliaLang/julia/issues/10899). You can try the old ArrayViews package, where the unsafe contiguous view should manage to get optimized away… but it hasn't gotten much love lately.
I think the problem was the calls to `BLAS.syr!`, although I am not sure exactly why that would allocate. In any case, the calculation is simple enough that I just replaced the high-level calls with explicit loops, and that doesn't allocate.
The reason I don't think it was the `view` call that was the problem is that I still had the allocation problem after switching to `unsafe_wrap(Array, ...)`.
Now it looks like

```julia
refs = A.f.refs
@inbounds for i in eachindex(refs)
    dri = d[refs[i]]
    for j in 1:l
        Aji = Az[j, i]
        for k in 1:l
            dri[k, j] += Aji * Az[k, i]
        end
    end
end
```
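One small observation on the rewrite: `syr!('L', …)` only updated the lower triangle, while the explicit loop above fills the full matrix. If only the lower triangle is needed downstream, the inner loop can start at the diagonal. A sketch (the function name `update_lower!` is made up; `Az`, `refs`, `d`, and `l` follow the post):

```julia
# Fill only the lower triangle of each d[refs[i]], matching what
# BLAS.syr!('L', ...) did. Assumes Az is l-by-n and each d element
# is an l-by-l matrix.
function update_lower!(d, Az, refs)
    l = size(Az, 1)
    @inbounds for i in eachindex(refs)
        dri = d[refs[i]]
        for j in 1:l
            Aji = Az[j, i]
            for k in j:l               # k ≥ j: lower triangle only
                dri[k, j] += Aji * Az[k, i]
            end
        end
    end
    return d
end
```

This roughly halves the inner-loop work for larger `l`; for `l = 2` the difference will be negligible.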
Instead of `view`, try using `unsafe_aview` from the ArrayViews package. As long as your code works with `AbstractArray`s, this should be non-allocating. I am using this until we get non-allocating views in Base.
I've noticed that the new inliner in Julia 0.7 seems more likely to optimize out the construction of the `SubArray`; you could give that a try.
My current opinion is that it is not the call to `view` that is causing the allocations but rather `syr!`. Because of the Fortran calling sequence used by the BLAS, I think that all the integers passed to the BLAS subroutine need to be allocated. The `ccall` is

```julia
ccall((@blasfunc($fname), $lib), Void,
      (Ptr{UInt8}, Ptr{BlasInt}, Ptr{$elty}, Ptr{$elty},
       Ptr{BlasInt}, Ptr{$elty}, Ptr{BlasInt}),
      &uplo, &n, &α, x,
      &stride(x, 1), A, &max(1, stride(A, 2)))
```
The point is that this is called many, many times with `n = 2`. Even trivial amounts of allocation within the call will add up.
Pretty sure that passing integers to Fortran with `&` will not allocate.
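One way to separate the two hypotheses is to measure a single `syr!` call on a plain `Vector`, taking `view` out of the picture entirely. A sketch (whether the printed number is zero depends on the Julia version):

```julia
using LinearAlgebra
using LinearAlgebra: BLAS

x = randn(4)
A = zeros(4, 4)

BLAS.syr!('L', 1.0, x, A)                      # warm up / compile
println(@allocated BLAS.syr!('L', 1.0, x, A))  # bytes for the call alone
```

If this reports zero but the loop with `view` does not, the wrapper is the culprit; if it reports nonzero, the allocation is inside the `syr!` wrapper itself.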
I haven't been able to find another explanation, as the allocations persist even when replacing `view(...` with `unsafe_wrap(Array, ...`.
As it is, this problem motivated me to rewrite the code to avoid the calls to `view` and `syr!`, so I guess all is good in the end. The tools for checking allocations and profiling execution have proven their worth again.
`unsafe_wrap` does allocate the array wrapper.
@dmbates did you try `unsafe_aview`?