Column-wise operations on matrices, allocating array views

Hi,
I find myself needing to do column-wise operations on matrices a lot (i.e. on A[:,j]). Put differently, I would like to use my matrix almost as if it were a Vector{SVector}, accessing columns as A[j], which has an identical data layout.
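
To make the layout claim concrete, here is a minimal sketch (using newer-Julia reinterpret syntax; the variable names are just illustrative):

julia> using StaticArrays

julia> A = rand(3, 4);   # one 3-dim point per column

julia> cols = reinterpret(SVector{3,Float64}, vec(A));   # same memory, no copy

julia> cols[2] == A[:, 2]
true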

OK, so what do I need to do with these columns? All the things one would like to do with vectors: arithmetic, swapping, passing them to functions.

Broadcasting and array views have nice syntax but don’t work for me (they allocate). Vector{SVector} also has nice syntax (though I need to specialize on the number of rows and remember the changed order of dimensions), but crucially it does not give me pass-by-reference to functions.

So I wanted to ask what other people are using. Of course I could write helper functions that try to behave like broadcast/array views but keep the array reference and the column index separate; then I get no allocations, at the price of code duplication (functions need to be able to accept both plain arrays and these pseudo-views), as sketched below.
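
Roughly, what I mean by such helpers is the following (a sketch; the names are illustrative):

julia> function scale!(v::AbstractVector, c)   # the vector version
           @inbounds for i in eachindex(v)
               v[i] *= c
           end
           v
       end

julia> function scale!(A::AbstractMatrix, j::Integer, c)   # duplicated logic for column j
           @inbounds for i in 1:size(A, 1)
               A[i, j] *= c
           end
           A
       end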

I could also write explicit loops everywhere, but that somewhat defeats the point of using a high-level language (you wouldn’t even do this in C).

I could also go for a horrible “C-style” view (a raw pointer), which does not protect the underlying matrix from the GC. Sprinkling GC.@preserve around might make this marginally better.
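
For completeness, a sketch of that pointer approach (0.7 spelling of the macro; the function name is illustrative, and this is exactly the kind of code I would rather not write):

julia> function unsafe_colsum(A::Matrix{Float64}, j::Integer)
           # NOT recommended: a raw-pointer "view" of column j, rooted with GC.@preserve
           m = size(A, 1)
           GC.@preserve A begin
               p = pointer(A, (j - 1) * m + 1)   # first element of column j
               s = 0.0
               for i in 1:m
                   s += unsafe_load(p, i)
               end
               s
           end
       end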

So: how are you all dealing with this problem? Is there some kind of Julia array type that already solves this?

PS.
Aggressive elimination of allocations probably won’t solve this. It will be solved once immutables containing reference fields can behave like bitstypes, as far as code_native is concerned (allocation, argument passing, array storage).

See also https://github.com/JuliaStats/Distances.jl/issues/83.


Use Vector{Vector} or Vector{MVector}?
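
For instance, something along these lines (illustrative only):

julia> using StaticArrays

julia> cols = [rand(MVector{3,Float64}) for _ in 1:4];   # four mutable 3-vectors

julia> cols[2] .*= 2.0;   # column-wise arithmetic, in place

julia> cols[1], cols[2] = cols[2], cols[1];   # swap columns by swapping references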

I could also go for a horrible “C-style” view (a raw pointer), which does not protect the underlying matrix from the GC

Not recommended, but certainly possible. I used that approach in NNLS.jl where it was helpful in porting some existing Fortran code.

But in general I would agree with @stevengj: is there a compelling reason to have a Matrix{T} rather than a Vector{Vector{T}}?

Also, what exactly do you mean by this? Everything in Julia has pass-by-reference (or, I suppose, pass-by-pointer) semantics, including SVectors. They’re just immutable, so they might be copied if the compiler thinks that’s helpful, but whether a copy is made is irrelevant to the semantics.

Cache, indirection, storage overhead. Matrix{T} is layout-equivalent to Vector{SVector}. If the inner vector is very large, then Vector{Vector{T}} or array views are both fine (because I pay a constant price, amortized over a very large column). If the inner vector is very small, then SVector is fine. But it would be nice to use the same code for both.

Cases where you want to store a million 10-dim datapoints are terrible for Vector{Vector}.

really?

As in: I have A = Vector{SVector} and want to evaluate f(A[j]). Now the compiler either needs to make a copy, or it needs to (1) defend against me changing A[j] (by overwriting it with a new SVector) and (2) keep A alive (because A[j] lives somewhere inside A’s array data).

If f is inlined, then LLVM should be able to avoid making this copy; otherwise this is not so nice.
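
Concretely, the kind of overwrite I mean in (1) looks like this (a sketch; names illustrative):

julia> using StaticArrays

julia> A = [rand(SVector{3,Float64}) for _ in 1:4];

julia> A[2] = 2 .* A[2];   # "mutating" column 2 = replacing it wholesale with a new SVector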

It’s a little better on 0.7, because the compiler is smarter about eliding things and inlining “cheap” functions. For example,

julia> function sumcols!(dest, A::AbstractMatrix)
           _, indc = indices(A)
           @assert indices(dest) == (indc,)
           for i in indc
               @inbounds dest[i] = mysum(view(A, :, i))
           end
           dest
       end
sumcols! (generic function with 1 method)

julia> function mysum(v)
           s = 0.0
           @inbounds for x in v
               s += x
           end
           s
       end
mysum (generic function with 1 method)

julia> A = rand(1000,1000);

julia> dest = Vector{Float64}(1000);

# After warmup
julia> @time sumcols!(dest, A);
  0.001325 seconds (4 allocations: 160 bytes)

on 0.7 (but on 0.6 it has 1k allocations). However, if you replace mysum with sum then it allocates, because sum can’t inline.

because sum can’t inline

Could you explain why that is? I dug around in the implementation of sum through mapreduce, and I don’t see any intentional @noinline, so I assume it’s something more subtle that prevents sum() from inlining?

Most likely just size. We don’t inline functions if we estimate their runtime cost to be significantly higher than that of a function call, and sum is more complex than mysum so probably gets penalized more heavily. More detail here.

I see. Thank you!