Multiplication and increment between elements whose index is stored in vectors without allocation: possible?

Hello, I have 3 arrays (x,y,w) where I want elementwise y += x * w but the specific element indexes are stored in 3 separate vectors of SArrays, x_ids, y_ids and w_ids.
The problem is that the operation hugely allocates, and I wander if there are better ways without allocation.

Here is a MWE… I also added a version where the indexes are SVector of SVector, but (1) the computation is even slower, (2) I don’t know the length of the indexes a priori (but when the positional indexed are computed they don’t change anymore), (3) creating the object with the index is very slow and I believe I risk out of memory errors with slightly larger index vectors .

using StaticArrays, BenchmarkTools

mutable struct ConvLayer
    x_ids::Array{SVector{3,Int32}}
    y_ids::Array{SVector{3,Int32}}
    w_ids::Array{SVector{4,Int32}} 
end
mutable struct ConvLayerSV # an all-static alternative
    x_ids::SVector{3000, SVector{3, Int32}}
    y_ids::SVector{3000, SVector{3, Int32}}
    w_ids::SVector{3000, SVector{4, Int32}}
end

# Data generation for the MWE...
x = rand(64,64,3)
y = zeros(32,32,5)
w = rand(4,4,3,5)
N = 3000
x_ids = [SVector{3,Int32}([rand(1:id) for id in size(x)]...) for n in 1:N]
y_ids = [SVector{3,Int32}([rand(1:id) for id in size(y)]...) for n in 1:N]  
w_ids = [SVector{4,Int32}([rand(1:id) for id in size(w)]...) for n in 1:N]  
layer = ConvLayer(x_ids, y_ids, w_ids)
layerSV = ConvLayerSV(x_ids, y_ids, w_ids) # long time here!

function compute!(y,l,x)
    for i in 1:length(l.y_ids)
        y[l.y_ids[i][1],l.y_ids[i][2],l.y_ids[i][3]] += 
           x[l.x_ids[i][1],l.x_ids[i][2],l.x_ids[i][3]] * 
           w[l.w_ids[i][1],l.w_ids[i][2],l.w_ids[i][3],l.w_ids[i][4]]
    end
    return nothing
end

# The computation that I care...
@btime compute!($y,$layer,$x)   #  7.542 ms (117890 allocations: 2.71 MiB)
@btime compute!($y,$layerSV,$x) # 42.124 ms ( 63000 allocations: 1.42 MiB)

Obviously, the context is of a convolution, and I am aware of packages like FourierTools.jl or Stencils.jl, but I was wandering if one could optimize this approach without allocation.
The idea is to store information about boundaries, padding, stride, etc. in a “precompile” step where the indexes are first created and then “just” loop over these indexes…

EDIT:
Maybe I have found it… by passing the vectors directly instead of the struct:

function compute!(y,x,w,y_ids,x_ids,w_ids)
    for i in 1:length(y_ids)
        y[y_ids[i][1],y_ids[i][2],y_ids[i][3]] += 
           x[x_ids[i][1],x_ids[i][2],x_ids[i][3]] * 
           w[w_ids[i][1],w_ids[i][2],w_ids[i][3],w_ids[i][4]]
    end
    return nothing
end

@btime compute!($y,$x,$w,$y_ids,$x_ids,$w_ids) #   24.684 μs (0 allocations: 0 bytes)

But why is that ? Because ConvLayer is mutable and hence the compiler doesn’t know it can safely skip checking its state on each operation ?

I think your ConvLayer contains a type instability because you don’t specify the order of the array. What happens if you replace Array with Vector in the struct definition?

2 Likes

Yup, right… thanks…

However there is still a huge difference with the version where I pass the array directly:

mutable struct ConvLayer
    x_ids::Vector{SVector{3,Int32}}
    y_ids::Vector{SVector{3,Int32}}
    w_ids::Vector{SVector{4,Int32}} 
end
@btime compute!($y,$layer,$x) #  1.323 ms (21000 allocations: 468.75 KiB)

So, using Vector instead of Array in the struct I move from 7.5 to 1.3 ms, but still far from 24 μs of the version passing the arrays…

I think the rest of the performance hit is because w is a global variable

4 Likes

Why is it mutable? You can mutate the fields even if the parent object is immutable.

2 Likes