Alternative to mapslices that does not allocate slices

piever · August 21, 2018, 10:28am

I’m trying to understand whether there is a non-allocating equivalent of mapslices. I often end up needing to reduce sliced views, for example:

mapslices(sum, m, dims = n)

And I would like to figure out a way that does not allocate all the intermediate slices. This is somewhat trivial for sum as one can do sum(m, dims = ...) directly, but there are many cases where mapslices is needed. For example, if we want to sum and skip missing values, I think one would do:

mapslices(sum∘skipmissing, m, dims = n)

But again, I think this allocates much more than is necessary for this use case.

In the case where the reduction is pairwise, for example:

mapslices(v -> reduce(+, v), m, dims = n)

one can obviate this with reduce(+, m, dims = n), but that again doesn’t work as soon as we need to filter, for example:

mapslices(v -> reduce(+, skipmissing(v)), m, dims = n)

Is there some function like mapslices that avoids these problems?

piever · September 5, 2018, 12:59pm

Bump?

Sorry to bump this post, but in the meantime I couldn’t find an easy solution to the problem above and am really curious whether there are good alternatives to mapslices that I’m missing.

DNF · September 5, 2018, 1:30pm

I can’t help you, but I was surprised about this.

I thought that

sum(skipmissing(m), dims=n)

would work, and without allocations, since sum(skipmissing(m)) works, is fast and creates no allocations. Unfortunately,

julia> sum(skipmissing(m), dims=2)
ERROR: MethodError: no method matching sum(::Base.SkipMissing{Array{Union{Missing, Float64},2}}; dims=2)
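
One possible workaround for this particular reduction, offered as a sketch rather than a general answer: sum accepts a function as its first argument together with dims, so missing entries can be mapped to zero on the fly with coalesce. This assumes that zero is an acceptable stand-in for a missing value, so it only helps for sums and similar reductions with a neutral element, not for arbitrary functions of a slice. The array m below is made up for illustration.

```julia
# Illustrative data: roughly half of the entries are missing
m = [rand() > 0.5 ? missing : rand() for i in 1:10, j in 1:20, k in 1:30]

# Replace each missing value by zero while reducing along dimension 2;
# no per-slice copies are made
s = sum(x -> coalesce(x, 0.0), m, dims = 2)

# Reference result from the slice-allocating version, for comparison
ref = mapslices(sum ∘ skipmissing, m, dims = 2)
```

Both results have the same shape, with dimension 2 reduced to length 1, and should agree up to floating-point rounding.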

fabiangans · September 5, 2018, 2:17pm

I don’t know if you are aware of this thread

and the resulting package

should provide what you need: non-allocating array slices. Maybe @bramtayl can clarify?

Azamat · September 11, 2018, 10:25pm

There is an issue for this which, after 3 years and multiple proposals and PRs, is still open:
https://github.com/JuliaLang/julia/issues/14491

oxinabox · September 12, 2018, 6:06am

I like MLDataUtils.jl for this.
It exposes eachobs and obsview for iterators over different dimensions.

obsview is a lazy view and allocates nothing (and is unsafe to run, e.g. collect on),
while eachobs allocates a buffer that it reuses.

You can see below that it gives basically the same number of allocations for all the operations.

julia> using MLDataUtils
julia> using BenchmarkTools


julia> m = [ rand() > 0.5 ?  missing : rand()  for ii in 1:10, jj in 1:20, kk in 1:30];

############ sum

julia> @btime sum($m, dims = 2);
  58.299 μs (13 allocations: 3.17 KiB)

julia> @btime mapslices(sum, $m, dims = 2);
  1.070 ms (3044 allocations: 166.47 KiB)
  
julia> @btime map(sum, eachobs($m, ObsDim.Constant{2}()));
  136.505 μs (166 allocations: 8.64 KiB)

julia> @btime map(sum, obsview($m, ObsDim.Constant{2}()));
  70.155 μs (101 allocations: 3.22 KiB)
  
############ sum∘skipmissing

julia> @btime mapslices(sum∘skipmissing, $m, dims = 2);
  1.262 ms (4543 allocations: 108.23 KiB)

julia> @btime map(sum∘skipmissing, eachobs($m, ObsDim.Constant{2}()));
  137.606 μs (246 allocations: 10.05 KiB)

julia> @btime map(sum∘skipmissing, obsview($m, ObsDim.Constant{2}()));
  67.717 μs (43 allocations: 1.83 KiB)

############ v -> reduce(+, skipmissing(v))

julia> @btime mapslices(v -> reduce(+, skipmissing(v)), $m, dims = 2);
  1.285 ms (4543 allocations: 108.23 KiB)

julia> @btime map(v -> reduce(+, skipmissing(v)), eachobs($m, ObsDim.Constant{2}()));
  139.516 μs (246 allocations: 10.05 KiB)

julia> @btime map(v -> reduce(+, skipmissing(v)), obsview($m, ObsDim.Constant{2}()));
  67.685 μs (43 allocations: 1.83 KiB)

I believe JuliennedArrays.jl, mentioned above, and possibly SplitApplyCombine.jl can also do this. However, I am less familiar with them.

nalimilan · September 12, 2018, 7:43am

For the specific case of summing while skipping missing values, see my PR:
https://github.com/JuliaLang/julia/pull/28027

Regarding the need for a view-based mapslices, I totally agree, since for most operations you don’t want to mutate the inputs. I suggest filing an issue, since the existing ones are not exactly about that.
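
To make the idea of a view-based mapslices concrete, here is a minimal sketch of what such a function could look like for the single-dimension, scalar-result case. The name mapslices_view is hypothetical and not part of Base or any package mentioned in this thread; it has not been tuned for performance and only illustrates passing views instead of copies.

```julia
# Hypothetical helper (not part of Base): apply `f` to a view of each slice of `A`
# taken along dimension `dim`, instead of copying the slice first.
# Assumes `f` returns a single value per slice.
function mapslices_view(f, A::AbstractArray, dim::Integer)
    N = ndims(A)
    # Index ranges of the other dimensions; `dim` itself is collapsed to length 1
    outer = ntuple(d -> d == dim ? (1:1) : axes(A, d), N)
    map(CartesianIndices(outer)) do I
        # Build (i, :, k)-style indices with a Colon in position `dim`
        idx = ntuple(d -> d == dim ? Colon() : I[d], N)
        f(view(A, idx...))
    end
end

# Illustrative data and usage
m = [rand() > 0.5 ? missing : rand() for i in 1:10, j in 1:20, k in 1:30]
r = mapslices_view(sum ∘ skipmissing, m, 2)
```

The result keeps the reduced dimension with length 1, as mapslices does, so the two can be compared directly. Because f receives a view, it must not mutate its argument unless changing the original array is intended.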

andyferris · September 12, 2018, 10:37am

You could try the SplitApplyCombine package - which was designed for pulling apart data and doing operations on subsets.

In this case you might be interested in splitdimsview?

piever · September 12, 2018, 1:28pm

I like MLDataUtils.jl for this

That definitely looks interesting, I’ll go over the package to see what other “goodies” I can take from there. The downside of keeping this there is that the package has “low discoverability” (I’d never think of looking in what seems to be a low-level package for machine learning as I’m not a machine learning practitioner).

I believe JuliennedArrays.jl, mentioned above, and possibly SplitApplyCombine.jl

JuliennedArrays is probably the closest to what I’m after. As suggested above, I’ve opened an issue in Julia Base proposing to port some of that functionality to Base, to go hand in hand with the existing allocating mapslices.
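
For readers on a more recent Julia, Base now provides eachslice, which iterates over views of slices. The sketch below assumes Julia 1.9 or later, where dims may be a tuple; the array m is again made up for illustration. Note that dims here names the dimensions being iterated over, not the dimension of each slice as in mapslices.

```julia
# Illustrative data: roughly half of the entries are missing
m = [rand() > 0.5 ? missing : rand() for i in 1:10, j in 1:20, k in 1:30]

# Slices along dimension 2 are indexed by dimensions 1 and 3; each one is a view
slices = eachslice(m, dims = (1, 3))

# One value per slice; the reduced dimension is dropped from the result
r = map(sum ∘ skipmissing, slices)
```

Unlike mapslices(sum∘skipmissing, m, dims = 2), the result here has no singleton dimension, so dropdims is needed on the mapslices side when comparing the two.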
