Kernel methods inside of Flux

I’m experimenting with setting up a classical RBF kernel method within Flux, i.e., I want to train the model:

\sum_{i=1}^n a_i k(x, X_i),

where, for simplicity, the kernel k is taken to be Gaussian. Here n is fixed, and the X_i are fixed; this is really just a regression problem. But thinking of it as a network architecture, what we want to have happen is:

x \mapsto (k(x, X_i))_{i=1}^n \mapsto \sum_{i=1}^n a_i k(x, X_i)

which is to say we go d => n => 1, where d is the dimension of the input argument.

What I am struggling with is getting the kernel layer to properly vectorize its output (i.e., if I input a d × 10 array, I should get an n × 10 output array), and if I push it all the way through I should end up with a 1 × 10 output array. This is what I’ve been trying:

using Flux, Random

Random.seed!(100)
n_pts = 100;
X_pts = [randn(Float32, 2,1) for _ in 1:n_pts];  # each point stored as a 2×1 matrix
X_train = randn(Float32, 2, 1000)
h = 1.0f0;
kernel(x, y) = exp(-sum(abs2, x - y) / (2 * h));

function k_layer(x)
    return [kernel(x, X_) for X_ in X_pts]
end

f_kernel = Chain(k_layer, Dense(n_pts => 1, bias=false));

and

f_kernel(rand(Float32,2))

works fine, but

f_kernel(rand(Float32,2,10))

fails with

DimensionMismatch: a has size (2, 10), b has size (2, 1), mismatch at dim 2

since the x - y inside kernel is plain array subtraction, which has no way to pair a 2×10 batch with a 2×1 point.

EDIT: The following, using MapCols from SliceMap.jl, works, but I have no sense of how clever/efficient an implementation it is:

f_kernel = Chain(x -> MapCols{2}(k_layer, x), Dense(n_pts => 1, bias=false))
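
For reference, here is a self-contained version of that EDIT; this is just a sketch assuming SliceMap.jl (which provides MapCols), with k_layer and n_pts as defined above:

using SliceMap  # assumed: provides MapCols, which maps a function over columns

f_kernel = Chain(x -> MapCols{2}(k_layer, x), Dense(n_pts => 1, bias=false));
f_kernel(rand(Float32, 2, 10))  # 1×10 Matrix, as desired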

Often the efficient way is not to make slices, but to write everything to deal with solid arrays, keeping batch etc. dimensions the whole way through:

n_pts = 3;
X_pts = [randn(Float32, 2) for p in 1:n_pts];
X_fixed = hcat(X_pts...)  # 2×3 Matrix, call it X[v,p] vector and point

X_train = randn(Float32, 2, 7)  # x[v,b]: vector and batch indices
X_1 = X_train[:,1]

h = 1.0f0;  # non-const global, not ideal
kernel(x, y) = exp(-sum(abs2, x - y) / (2 * h));  # z = exp( sum_v (x[v] - y[v])^2 / (-2h) )
kernel(X_pts[1], X_1)  # Vector, Vector -> Float32

function k_layer(x)    # z[p] = exp( sum_v (x[v] - X[v,p])^2 / (-2h) )
    return [kernel(x, X_) for X_ in X_pts]
end

k_layer(X_1)  # 3-element Vector, z[p]
# k_layer(X_train)  # error, but you want this:
# z[p,b] = exp( sum_v (x[v,b] - X[v,p])^2 / (-2h) )
#        = exp( (sum_v (x[v,b])^2 - 2 sum_v x[v,b] * X[v,p] + sum_v (X[v,p])^2) / (-2h) )

using TensorCast
function k_layer2(x, X=X_fixed)
    @reduce tmp[p,b] := sum(v) (x[v,b] - X[v,p])^2 / (-2h)
    exp.(tmp)
end

k_layer2(X_train)  # 3×7 Matrix, first col matches k_layer(X_1)

using LinearAlgebra: diag

function k_layer3(x, X=X_fixed)
    term1 = diag(x'*x)'  # t[b] = sum_v (x[v,b])^2
    term2 = X'*x         # t[p,b] = sum_v x[v,b] * X[v,p]
    term3 = diag(X'*X)   # t[p] = sum_v (X[v,p])^2
    @. exp((term1 - 2*term2 + term3) / (-2h))
end

k_layer3(X_train)  # 3×7 Matrix, first col matches k_layer(X_1)
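
Since k_layer2 and k_layer3 compute the same z[p,b], just expanded differently, they should agree up to floating-point rounding; a quick check with the arrays above:

k_layer2(X_train) ≈ k_layer3(X_train)  # true, the expansion only differs by rounding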

using Flux
f_kernel = Chain(k_layer3, Dense(n_pts => 1, bias=false));

f_kernel(X_train)  # 1×7 Matrix
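
From here, training only has to fit the Dense weights, which play the role of the coefficients a_i. A minimal sketch, assuming hypothetical targets y_train (a stand-in for real data) and Flux's explicit-gradient API:

y_train = randn(Float32, 1, 7)  # hypothetical targets, replace with real data

opt_state = Flux.setup(Adam(0.01), f_kernel)
for epoch in 1:200
    # gradient only reaches the Dense weights; k_layer3 and X_fixed are constant
    grads = Flux.gradient(m -> Flux.Losses.mse(m(X_train), y_train), f_kernel)
    Flux.update!(opt_state, f_kernel, grads[1])
end

f_kernel(X_train)  # still 1×7, now fitted to y_train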