Avoiding memory allocation for vector operations


#1

Hi everyone,

I expected this to be a trivial optimization, but I am struggling to write vectorized code that doesn’t allocate memory. Consider the following code:

function gram_schmidt1!(U::Matrix{T}) where {T <: Real}
    m = size(U, 2)
    @inbounds for k = 1:m
        uk = view(U,:,k)
        @inbounds for j = 1:k-1
            uk .-= (U[:,j] ⋅ uk) .* U[:,j]
        end
        uk ./= norm(uk)
    end
end

Then I get

using BenchmarkTools
N = 100
A = randn(N, N)
@btime gram_schmidt1!($A)
  6.859 ms (19900 allocations: 8.62 MiB)

which results in a surprisingly high number of allocations.

If I instead use a de-vectorized version,

function gram_schmidt2!(U::Matrix{T}) where {T <: Real}
    n, m = size(U)
    @assert n ≥ 1
    @inbounds for k = 1:m
        for j = 1:k-1
            dotjk = U[1,j] * U[1,k]
            @simd for i = 2:n
                dotjk += U[i,j] * U[i,k]
            end
            @simd for i = 1:n
                U[i,k] -= dotjk * U[i,j]
            end
        end
        uknorm = abs2(U[1,k])
        @simd for i = 2:n
            uknorm += abs2(U[i,k])
        end
        uknorm = √(uknorm)
        @simd for i = 1:n
            U[i,k] /= uknorm
        end
    end
end

I get

@btime gram_schmidt2!($A)
  287.653 μs (0 allocations: 0 bytes)

both on Julia v0.6.3.

I played around with benchmarking and memory allocation analysis, and even the dot product seems to allocate memory. Am I doing something fundamentally wrong here?


#2

The U[:,j] should be a view instead, but other than that your original code is ok. Doing that on 0.7 results in much more acceptable allocations (~200kB). I haven’t looked at the remaining allocations, but I suspect it’s just that the compiler is currently not capable of stack allocating wrappers of objects containing heap references (such as views) - at least across an inlining boundary. If you’re up for it, you may want to add this benchmark (the view version) to https://github.com/JuliaCI/BaseBenchmarks.jl to make sure it gets tracked as the compiler improves.


#3

OK, great! I was worried that I was doing something wrong. I’m not yet ready to move my research code to 0.7/1.0, so I might stick with the de-vectorized version for now.

Just to clarify: by making U[:,j] a view, do you mean the following?

function gram_schmidt1!(U::Matrix{T}) where {T <: Real}
    m = size(U, 2)
    @inbounds for k = 1:m
        uk = view(U,:,k)
        @inbounds for j = 1:k-1
            uj = view(U,:,j)
            uk .-= (uj ⋅ uk) .* uj
        end
        uk ./= norm(uk)
    end
end

This one actually significantly reduces memory allocations, even on 0.6.3:

using BenchmarkTools
N = 100
A = randn(N, N)
@btime gram_schmidt1!($A)
  6.069 ms (5050 allocations: 236.72 KiB)

Nonetheless, it isn’t anywhere close to the performance of the de-vectorized version.

Either way, I’d be happy to add it to BaseBenchmarks. Which benchmark group would be the adequate one?


#4

For what it’s worth, you are executing m*(m-1)/2 inner-loop iterations plus m outer-loop iterations = 5050 (for m = 100), each of which creates a view, which accounts for the 5050 allocations.
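To make the count explicit, here is a quick sanity check of that arithmetic (a sketch; `inner` and `outer` are just names for the two contributions):

```julia
# Counting the views created by gram_schmidt1! for m = 100:
# each inner-loop iteration creates one view (uj), and each
# outer-loop iteration creates one more (uk).
m = 100
inner = m * (m - 1) ÷ 2   # uj views: sum of (k-1) over k = 1:m
outer = m                 # uk views: one per column
println(inner + outer)    # 5050, matching the allocation count above
```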


#5

Indeed. I didn’t expect matrix slices to cause allocations. Knowing that they do (at least in v0.6.3) clarifies the above:

julia> using BenchmarkTools
julia> a = rand(100);
julia> b = rand(100);
julia> A = rand(100,100);
julia> @btime dot($a, $b)
  45.971 ns (0 allocations: 0 bytes)
julia> @btime dot($A[:,1], $b)
  118.432 ns (2 allocations: 912 bytes)

#6

There exist “unsafe views” defined by packages that are guaranteed not to allocate but are unsafe in the sense that they don’t keep the parent array alive.

One such package is https://github.com/oschulz/UnsafeArrays.jl. Using it (on 0.7), and working around a slow fallback that still needs to be fixed in the package (https://github.com/oschulz/UnsafeArrays.jl/issues/3), we get

julia> using LinearAlgebra, UnsafeArrays, BenchmarkTools

julia> Base.mightalias(A::UnsafeArray, B::UnsafeArray) = false

julia> function gram_schmidt1!(U::Matrix{T}) where {T <: Real}
           m = size(U, 2)
           @inbounds for k = 1:m
               uk = uview(U,:,k)
               @inbounds for j = 1:k-1
                   uj = uview(U,:,j)
                   uk .-= (uj ⋅ uk) .* uj
               end
               uk ./= norm(uk)
           end
       end
gram_schmidt1! (generic function with 1 method)

julia> N = 100;

julia> A = randn(N, N);

julia> @btime gram_schmidt1!($A);
  294.633 μs (0 allocations: 0 bytes)

compared to

julia> @btime gram_schmidt2!($A);
  276.199 μs (0 allocations: 0 bytes)

#7

Looks good. Since in my case the views are guaranteed to be safe, I might try this instead.

Given that a view is only safe if it keeps the parent object alive, does this mean that we can’t expect to get (safe) views that don’t allocate memory?


#8

It is worth pointing out that you can protect the arrays. UnsafeArrays provides a convenient macro; from its README:

using UnsafeArrays

@uviews A begin
    colnorms!(dest, A)
end

@uviews protects the original array A from GC, so the above is safe as long as the original array is not reallocated (via resize!, etc.) while the scope of @uviews is active.

The macro also replaces regular views with uviews:

julia> function colnorms!(dest::AbstractVector, A::AbstractMatrix)
           for i in axes(A, 2)
               dest[i] = norm(view(A, :, i))
           end
           dest
       end

julia> A = rand(50, 100000);

julia> dest = similar(A, size(A, 2));

julia> @benchmark colnorms!($dest, $A)
BenchmarkTools.Trial: 
  memory estimate:  4.58 MiB
  allocs estimate:  100000
  --------------
  minimum time:     4.436 ms (0.00% GC)
  median time:      4.527 ms (0.00% GC)
  mean time:        4.848 ms (6.03% GC)
  maximum time:     7.441 ms (20.70% GC)
  --------------
  samples:          1030
  evals/sample:     1

julia> function colnormsu!(dest, A)
           @uviews A begin
               colnorms!(dest, A)
           end
       end
colnormsu! (generic function with 1 method)

julia> @benchmark colnormsu!($dest, $A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.037 ms (0.00% GC)
  median time:      4.050 ms (0.00% GC)
  mean time:        4.056 ms (0.00% GC)
  maximum time:     4.326 ms (0.00% GC)
  --------------
  samples:          1230
  evals/sample:     1

#9

If @uviews protects the original array, is there any reason to ever not use @uviews (as opposed to @views)?


#10

The view might survive the scope (e.g. you put it in a struct) and the parent array can get reallocated (via resize!).
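A minimal sketch, using only Base, of why that reallocation is the dangerous part: growing an array far beyond its capacity typically moves the underlying buffer, so anything still referencing the old buffer (a raw pointer, or an unsafe view taken earlier) would dangle.

```julia
# resize! can move an array's underlying buffer.
A = zeros(Float64, 16)
p1 = pointer(A)
resize!(A, 10^7)    # growing far beyond the original capacity
p2 = pointer(A)
# If p1 != p2, the old buffer is gone; a raw pointer or unsafe
# view taken before the resize! would now point at freed memory.
println(p1 != p2)   # typically true for a growth this large
```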


#11

If @uviews protects the original array, is there any reason to ever not use @uviews (as opposed to @views)?

Sorry for joining in so late. Yes, I recommend always using @uviews. The uview() function exists only because I came up with the idea for the macro later on during the development of UnsafeArrays.jl. @uviews will protect the original array from GC, both with Julia v0.6 and v0.7.


#12

The view might survive the scope (e.g. you put it in a struct) and the parent array can get reallocated (via resize!).

Yes, please keep that in mind - even @uviews can’t protect you against that.

Because @uviews rebinds the original array’s name to the unsafe view, it is (intentionally) hard to access (and resize, etc.) the original array within the scope of the macro.

But it’s true, the original array may be accessible in some other way. UnsafeArrays.jl should only be used if you can keep access to the array in question fully under control.

And @kristoffer.carlsson is right: never, ever let the unsafe array escape the scope of @uviews!