Column-wise operations on matrices, allocating array views

Hi,
I find myself needing to do column-wise operations on matrices a lot (i.e. on A[:,j]). Put differently, I would like to use my matrix almost as if it were a Vector{SVector}, accessing columns as A[j], which has an identical data layout.
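
To make the layout claim concrete, here is a minimal sketch (using newer-Julia reinterpret syntax; the variable names are just illustrative):

julia> using StaticArrays

julia> A = rand(3, 4);   # one 3-dim point per column

julia> cols = reinterpret(SVector{3,Float64}, vec(A));   # same memory, no copy

julia> cols[2] == A[:, 2]
true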

OK, so what do I need to do with these columns? All the things one would like to do with vectors: arithmetic, swapping, passing them to functions.

Broadcasting and array views have nice syntax but don’t work for me (they allocate). Vector{SVector} also has nice syntax (though I need to specialize on the number of rows and remember the changed order of dimensions), but crucially it does not give me pass-by-reference to functions.

So I wanted to ask what other people are using. Of course I could write helper functions that try to behave like broadcast/array views but keep the array reference and the column index separate; then I get no allocations, at the price of code duplication (functions need to be able to accept both plain arrays and these pseudo-views), as sketched below.
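
Roughly, what I mean by such helpers is the following (a sketch; the names are illustrative):

julia> function scale!(v::AbstractVector, c)   # the vector version
           @inbounds for i in eachindex(v)
               v[i] *= c
           end
           v
       end

julia> function scale!(A::AbstractMatrix, j::Integer, c)   # duplicated logic for column j
           @inbounds for i in 1:size(A, 1)
               A[i, j] *= c
           end
           A
       end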

I could also write explicit loops everywhere, but that somewhat defeats the point of using a high-level language (you wouldn’t even do this in C).

I could also go for a horrible “C-style” view (a raw pointer), which does not protect the underlying matrix from the GC. Sprinkling GC.@preserve around might make this marginally better.
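
For completeness, a sketch of that pointer approach (0.7 spelling of the macro; the function name is illustrative, and this is exactly the kind of code I would rather not write):

julia> function unsafe_colsum(A::Matrix{Float64}, j::Integer)
           # NOT recommended: a raw-pointer "view" of column j, rooted with GC.@preserve
           m = size(A, 1)
           GC.@preserve A begin
               p = pointer(A, (j - 1) * m + 1)   # first element of column j
               s = 0.0
               for i in 1:m
                   s += unsafe_load(p, i)
               end
               s
           end
       end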

So: how are you all dealing with this problem? Is there some kind of Julia array type that already solves this?

PS.
Aggressive elimination of allocations probably won’t solve this. It will be solved once immutables containing reference fields can behave like bitstypes, as far as code_native is concerned (allocation, argument passing, array storage).

See also https://github.com/JuliaStats/Distances.jl/issues/83.


Use Vector{Vector} or Vector{MVector}?
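
For instance, something along these lines (illustrative only):

julia> using StaticArrays

julia> cols = [rand(MVector{3,Float64}) for _ in 1:4];   # four mutable 3-vectors

julia> cols[2] .*= 2.0;   # column-wise arithmetic, in place

julia> cols[1], cols[2] = cols[2], cols[1];   # swap columns by swapping references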

I could also go for a horrible “C-style” view (a raw pointer), which does not protect the underlying matrix from the GC

Not recommended, but certainly possible. I used that approach in NNLS.jl where it was helpful in porting some existing Fortran code.

But in general I would agree with @stevengj: is there a compelling reason to have a Matrix{T} rather than a Vector{Vector{T}}?

Also, what exactly do you mean by this? Everything in Julia has pass-by-reference (or, I suppose, pass-by-pointer) semantics, including SVectors. They’re just immutable, so they might be copied if the compiler thinks that’s helpful, but whether a copy is made is irrelevant to the semantics.

Cache, indirection, storage overhead. Matrix{T} is layout-equivalent to Vector{SVector}. If the inner vector is very large, then Vector{Vector{T}} or array views are both fine (because I pay a constant price, amortized over a very large column). If the inner vector is very small, then SVector is fine. But it would be nice to use the same code for both.

Cases where you want to store a million 10-dim datapoints are terrible for Vector{Vector}.

really?

As in: I have A = Vector{SVector} and want to evaluate f(A[j]). Now the compiler either needs to make a copy, or it needs to (1) defend against me changing A[j] (by overwriting it with a new SVector) and (2) keep A alive (because A[j] lives somewhere inside A’s array data).

If f is inlined, then LLVM should be able to avoid making this copy; otherwise this is not so nice.
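
Concretely, the kind of overwrite I mean in (1) looks like this (a sketch; names illustrative):

julia> using StaticArrays

julia> A = [rand(SVector{3,Float64}) for _ in 1:4];

julia> A[2] = 2 .* A[2];   # "mutating" column 2 = replacing it wholesale with a new SVector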

It’s a little better on 0.7, because the compiler is smarter about eliding things and inlining “cheap” functions. For example,

julia> function sumcols!(dest, A::AbstractMatrix)
           _, indc = indices(A)
           @assert indices(dest) == (indc,)
           for i in indc
               @inbounds dest[i] = mysum(view(A, :, i))
           end
           dest
       end
sumcols! (generic function with 1 method)

julia> function mysum(v)
           s = 0.0
           @inbounds for x in v
               s += x
           end
           s
       end
mysum (generic function with 1 method)

julia> A = rand(1000,1000);

julia> dest = Vector{Float64}(1000);

# After warmup
julia> @time sumcols!(dest, A);
  0.001325 seconds (4 allocations: 160 bytes)

on 0.7 (but on 0.6 it has 1k allocations). However, if you replace mysum with sum then it allocates, because sum can’t inline.

because sum can’t inline

Could you explain why that is? I dug around in the implementation of sum through mapreduce, and I don’t see any intentional @noinline, so I assume it’s something more subtle that prevents sum() from inlining?

Most likely just size. We don’t inline functions if we estimate their runtime cost to be significantly higher than that of a function call, and sum is more complex than mysum so probably gets penalized more heavily. More detail here.

I see. Thank you!