Avoiding allocations in `view`s

Is there a way to avoid allocations when using `view` on arrays with missing values? If I change the inner loop to `sum`, there are allocations in both tests.

using BenchmarkTools

X = randn(10, 1000)
Xm = convert(Array{Union{Float64, Missing}}, X)


function colsums(x::Array{T}) where T
    y = Vector{T}(undef, size(x, 2))
    @inbounds for i in 1:size(x, 2)
        xx = view(x, :, i)
        x0 = 0
        for j in eachindex(xx)
            x0 += xx[j]
        end
        y[i] = x0
    end
    return y
end

@btime colsums(X);
@btime colsums(Xm);
julia> @btime colsums(X);
  9.974 μs (1 allocation: 7.94 KiB)

julia> @btime colsums(Xm);
  247.901 μs (20490 allocations: 329.08 KiB)

One option is UnsafeArrays; see e.g. this post: Array views becoming dominant source of memory allocation. The current rule of thumb is that creating an array view will allocate if the view is used as an argument to a non-inlined function or if it is returned from a function.

Your x0 is initialized as an Int and then changes type once a Float64 is added to it, so the loop is type-unstable. This is definitely causing a lot of extra allocations, though I’m not quite sure why it goes so much more badly wrong with missing.

function colsums(x::Array)
    y = Vector{eltype(x)}(undef, size(x, 2))
    @inbounds for i ∈ 1:size(x, 2)
        xx = view(x, :, i)
        x0 = zero(eltype(x))
        for j ∈ eachindex(xx)
            x0 += xx[j]
        end
        y[i] = x0
    end
    y
end
julia> @btime colsums(Xm);
  11.640 μs (1 allocation: 8.94 KiB)

with your version I get

julia> @btime colsums(Xm);
  340.882 μs (20490 allocations: 329.08 KiB)

This somehow beats vec(sum(Xm, dims=1)) which seems odd. I wonder if it’s worth opening an issue.

julia> @btime vec(sum(X, dims=1));
  4.937 μs (3 allocations: 8.02 KiB)

julia> @btime vec(sum(Xm, dims=1));
  15.836 μs (3 allocations: 8.98 KiB)
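
The type change of x0 described above can be checked without benchmarking, by asking the compiler what it infers for each variant. The sketch below is only an illustration: `acc_int` and `acc_zero` are hypothetical helper names that isolate the inner loop of the two `colsums` versions, and the exact inferred types printed may vary between Julia versions.

```julia
# Inner loop with the accumulator starting as an Int, as in the original MWE.
function acc_int(v)
    x0 = 0
    for a in v
        x0 += a
    end
    return x0
end

# Inner loop with the accumulator starting as zero(eltype(v)).
function acc_zero(v)
    x0 = zero(eltype(v))
    for a in v
        x0 += a
    end
    return x0
end

v  = randn(10)
vm = convert(Vector{Union{Float64, Missing}}, v)

# Print the inferred return type of each variant for each element type.
# A wider Union here means x0 can take more types inside the loop.
for f in (acc_int, acc_zero), T in (Vector{Float64}, Vector{Union{Float64, Missing}})
    println(f, " on ", T, " => ", Base.return_types(f, (T,)))
end
```

Running `@code_warntype acc_int(vm)` in the REPL gives a more detailed view of the same information.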

Oh, thanks for spotting this. That mistake was only in the MWE, so fixing it didn’t solve my actual problem. What solved a big part of the problem was removing an unneeded `where T` from some functions.

sum is probably slower because it does pairwise aggregation for better accuracy and therefore has some overhead.

sum(xx) is slower than a loop because it isn’t inlined, so the compiler heap-allocates your views, and because 10 rows is too short for vectorization to pay off.
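
If the explicit inner loop is kept anyway, one way to sidestep the question of whether the view gets heap-allocated is to not create a view at all and index the parent array directly. The following is a minimal sketch under that assumption; `colsums_noview` is a hypothetical name, not something from this thread, and it relies on `zero(eltype(x))` being defined for the element type (it is for `Float64` and `Union{Float64, Missing}`).

```julia
# Column sums without creating any views: index the matrix directly.
function colsums_noview(x::AbstractMatrix)
    y = Vector{eltype(x)}(undef, size(x, 2))
    @inbounds for i in axes(x, 2)
        x0 = zero(eltype(x))      # accumulator has the element type from the start
        for j in axes(x, 1)
            x0 += x[j, i]         # column-major access, same order as the view loop
        end
        y[i] = x0
    end
    return y
end

X  = randn(10, 1000)
Xm = convert(Array{Union{Float64, Missing}}, X)

colsums_noview(X)
colsums_noview(Xm)
```

The only allocation expected here is the output vector `y`; timing it with `@btime colsums_noview(Xm);` on your own Julia version is the way to confirm that.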

What solved big part of the problem was removing an unneeded where T from some functions.

I keep getting bitten by that, and would love to hear if anyone has a good solution.
Once upon a time this triggered an error. I wish that were still the case.

A while ago I opened an issue asking for LanguageServer.jl to emit a warning.

The code in which I am encountering this problem has some other really strange issues, too (e.g. it reaches the unreachable in Julia 1.0, 1.1, 1.2, and master due to some type inference bug, and the only thing that helps is manually inlining everything), and I am really struggling to create a reproducible example.

Cool, following the issue. I should try getting LanguageServer.jl working in my emacs again.
I didn’t try Spacemacs because I wasn’t familiar with the vim keybindings either (although I’ve been using ergoemacs, which has only slightly different movement keys, so maybe I should have gone with evil/Spacemacs instead), and I otherwise have things configured in a way I like.
That mostly means treemacs (which is also integrated into Spacemacs), so I’ll probably give it a try sometime.

Having to manually inline everything is worse than having to @inline problem functions, like I do to work around this issue.

How far does the “everything” go in “manually inlining everything”? I’d hope you don’t have to inline getindex calls, for example.

Luckily not getindex. I have to manually inline all the imported functions from a package which I am extending (which is ~3 layers deep). The weird/interesting/annoying part is that it works fine in one case, but not in a very similar one.