Alternative to mapslices that does not allocate slices

I’m trying to understand whether there is a non-allocating equivalent of mapslices. I often end up needing to reduce sliced views, for example:

mapslices(sum, m, dims = n)

And I would like to figure out a way that does not allocate all the intermediate slices. This is somewhat trivial for sum as one can do sum(m, dims = ...) directly, but there are many cases where mapslices is needed. For example, if we want to sum and skip missing values, I think one would do:

mapslices(sum∘skipmissing, m, dims = n)

But again, I think this allocates much more than is necessary for this use case.

In the case where the reduction is pairwise, for example:

mapslices(v -> reduce(+, v), m, dims = n)

one can obviate this with reduce(+, m, dims = n), but that again doesn’t work as soon as we need to filter, for example:

mapslices(v -> reduce(+, skipmissing(v)), m, dims = n)

Is there some function like mapslices that avoids these problems?

Bump?

Sorry to bump this post, but in the meantime I couldn’t find an easy solution to the problem above and am really curious whether there are good alternatives to mapslices that I’m missing.

I can’t help you, but I was surprised about this.

I thought that

sum(skipmissing(m), dims=n)

would work, and without allocations, since sum(skipmissing(m)) works, is fast and creates no allocations. Unfortunately,

julia> sum(skipmissing(m), dims=2)
ERROR: MethodError: no method matching sum(::Base.SkipMissing{Array{Union{Missing, Float64},2}}; dims=2)
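One Base-only workaround for this specific case, as a sketch rather than a general answer: sum accepts a function as its first argument together with dims, so missing values can be replaced by zero on the fly with coalesce. The small matrix below is made up for illustration, and the trick only applies to reductions that have a neutral element to substitute.

```julia
# Illustrative data (not from the thread): a 2×3 matrix with missing entries.
m = Union{Missing, Float64}[1.0 missing 2.0; missing 3.0 4.0]

# Replace each missing by 0.0 while summing along dimension 2.
# No intermediate slices are built; only the result array is created.
row_sums = sum(x -> coalesce(x, 0.0), m, dims = 2)
# row_sums is a 2×1 matrix holding 3.0 and 7.0
```

Unlike sum∘skipmissing, this cannot tell an all-missing slice apart from a slice that sums to zero.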

I don’t know if you are aware of this thread

and the resulting package

should provide what you need: non-allocating array slices. Maybe @bramtayl can clarify?

There is an issue for this which, after 3 years and multiple proposals and PRs, is still open

I like MLDataUtils.jl for this
It exposes eachobs and obsview for iterators over different dimensions.

obsview is a lazy view and allocates nothing (and is unsafe to run, e.g., collect on),
while eachobs allocates a buffer that it reuses.

You can see below that both allocate far less than mapslices, and that the number of allocations stays roughly the same across the operations.

julia> using MLDataUtils
julia> using BenchmarkTools


julia> m = [ rand() > 0.5 ?  missing : rand()  for ii in 1:10, jj in 1:20, kk in 1:30];

############ sum

julia> @btime sum($m, dims = 2);
  58.299 μs (13 allocations: 3.17 KiB)

julia> @btime mapslices(sum, $m, dims = 2);
  1.070 ms (3044 allocations: 166.47 KiB)
  
julia> @btime map(sum, eachobs($m, ObsDim.Constant{2}()));
  136.505 μs (166 allocations: 8.64 KiB)

julia> @btime map(sum, obsview($m, ObsDim.Constant{2}()));
  70.155 μs (101 allocations: 3.22 KiB)
  
############ sum∘skipmissing

julia> @btime mapslices(sum∘skipmissing, $m, dims = 2);
  1.262 ms (4543 allocations: 108.23 KiB)

julia> @btime map(sum∘skipmissing, eachobs($m, ObsDim.Constant{2}()));
  137.606 μs (246 allocations: 10.05 KiB)

julia> @btime map(sum∘skipmissing, obsview($m, ObsDim.Constant{2}()));
  67.717 μs (43 allocations: 1.83 KiB)

############ v -> reduce(+, skipmissing(v))

julia> @btime mapslices(v -> reduce(+, skipmissing(v)), $m, dims = 2);
  1.285 ms (4543 allocations: 108.23 KiB)

julia> @btime map(v -> reduce(+, skipmissing(v)), eachobs($m, ObsDim.Constant{2}()));
  139.516 μs (246 allocations: 10.05 KiB)

julia> @btime map(v -> reduce(+, skipmissing(v)), obsview($m, ObsDim.Constant{2}()));
  67.685 μs (43 allocations: 1.83 KiB)

I believe JuliennedArrays.jl,
mentioned above;
and possibly SplitApplyCombine.jl
can also do this.
However, I am less familiar with them.
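For a plain matrix, Base alone may be enough. A minimal sketch, assuming Julia 1.1 or later (where eachrow and eachcol are available): both iterate over views of the parent array, so the slices themselves are not copied. The matrix here is invented for illustration.

```julia
# Illustrative data (not from the thread).
m = Union{Missing, Float64}[1.0 missing 2.0; missing 3.0 4.0]

# Each row/column is handed to the function as a view, not a copy.
row_sums = map(sum ∘ skipmissing, eachrow(m))   # one value per row
col_sums = map(sum ∘ skipmissing, eachcol(m))   # one value per column

# Same numbers as the mapslices version, but returned as a plain vector.
row_sums == vec(mapslices(sum ∘ skipmissing, m, dims = 2))
```

The result is a Vector rather than an array with a singleton dimension, so a reshape is needed if the mapslices output shape matters.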

For the specific case of summing while skipping missing values, see my PR:

Regarding the need for a view-based mapslices, I totally agree, since for most operations you don’t want to mutate the inputs. I suggest filing an issue, since the existing ones are not exactly about that.
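To make the idea concrete, here is a rough sketch of what a view-based variant could look like. The name mapslices_view is hypothetical and not part of Base; the sketch only handles a single integer dims and functions that return one value per slice, and it makes no attempt at type stability.

```julia
# Hypothetical helper (not a Base function): apply `f` to views along `dims`.
function mapslices_view(f, A::AbstractArray; dims::Integer)
    # Outer index ranges: the sliced dimension collapses to a single index,
    # so the result has the same shape as mapslices(f, A, dims = dims).
    outer = ntuple(d -> d == dims ? (1:1) : axes(A, d), ndims(A))
    map(CartesianIndices(outer)) do I
        # Take everything along `dims`, and fix all other indices.
        idx = ntuple(d -> d == dims ? Colon() : I[d], ndims(A))
        f(view(A, idx...))   # the slice is a view, so the input is not copied
    end
end

# Usage on made-up data: a 2×3×4 array, reduced along dimension 2.
A = reshape(collect(1.0:24.0), 2, 3, 4)
B = mapslices_view(sum, A; dims = 2)   # size (2, 1, 4)
```

Because f receives a view, any mutation inside f would write through to the input, which is the trade-off that the copying mapslices avoids.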

You could try the SplitApplyCombine package - which was designed for pulling apart data and doing operations on subsets.

In this case you might be interested in splitdimsview?


I like MLDataUtils.jl for this

That definitely looks interesting, I’ll go over the package to see what other “goodies” I can take from there. The downside of keeping this there is that the package has “low discoverability” (I’d never think of looking in what seems to be a low-level package for machine learning as I’m not a machine learning practitioner).

I believe JuliennedArrays.jl ,
mentioned above;
and possibly SplitApplyCombine.jl

JuliennedArrays is probably the closest to what I’m after. As suggested above, I’ve opened an issue in Julia Base proposing to port some of that functionality to Base, to go hand in hand with the existing allocating mapslices.
