Convex Neural Network Using Skip Layers in Flux.jl

I’m trying to code up the following net, where y is the input, f(y) is the output, W^z_k and W^y_k are matrices, and b_k are vectors:

  • f(y) = z_k
  • z_k = softplus(W^z_k z_{k-1} + W^y_k y + b_k)
  • z_1 = softplus(W^y_0 y + b_0)
As you can see, it’s basically a fully connected net where each layer has access to the input. Furthermore, the W^z_k need to be positive (elementwise), and I was thinking of enforcing this by applying, say, an exponential: defining W^z_k = \exp(W_k), where W_k is an unconstrained matrix.

Somehow, though, I cannot find a way for each layer in a chain to keep access to the chain’s input. Any idea how to code this chain in Flux? I’m sure I could define a single layer that does the whole thing, but that wouldn’t be very practical.
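For the positivity constraint alone, the exponential reparameterization mentioned above can be sketched as a standalone layer. This is only an illustration, not Flux API: `PosLayer` is a hypothetical name, and `softplus` is defined inline so the snippet runs without Flux (in practice you’d use Flux’s `softplus` and mark the struct with `Flux.@functor` so `Wk`, `Wy`, and `b` become trainable):

```julia
softplus(x) = log1p(exp(x))  # stand-in for Flux's softplus

# Hypothetical layer computing z_k = softplus(W^z_k z + W^y_k y + b_k),
# where W^z_k = exp.(Wk), so the effective z-weights are elementwise positive:
struct PosLayer{M,V}
  Wk::M   # unconstrained matrix; exp.(Wk) is the positive W^z_k
  Wy::M
  b::V
end

(l::PosLayer)(z, y) = softplus.(exp.(l.Wk) * z .+ l.Wy * y .+ l.b)

l = PosLayer(randn(2, 2), randn(2, 2), zeros(2))
out = l(ones(2), ones(2))  # 2-element vector, strictly positive (softplus > 0)
```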

If you want to use built-in layers for this, see SkipConnection and Parallel: Model Reference · Flux

I’ve seen these, but how would you use them to create the architecture I mentioned? It’s not clear to me how to do it, since after one skip layer the original input is lost.

I think this does what you write:

using Flux

struct Adder{T<:Tuple}
  layers::T
end
Adder(layers...) = Adder(layers)
Flux.@functor Adder

function (a::Adder)(y)
  d1 = a.layers[1]
  z = d1(y)
  for d in a.layers[2:end]
    z = d(z + y)   # every later layer sees z_{k-1} plus the original input y
  end
  return z
end

m = Adder(Dense(2 => 2, softplus), Dense(2 => 2, softplus))
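As a quick sanity check of the recursion, the same forward pass can be exercised with plain functions in place of Dense (renamed `Adder2` here so it doesn’t clash with the definition above, and needing no Flux at all):

```julia
struct Adder2{T<:Tuple}   # same pattern as Adder above, Flux-free for illustration
  layers::T
end
Adder2(layers...) = Adder2(layers)

function (a::Adder2)(y)
  z = a.layers[1](y)      # z_1 = layer_1(y)
  for d in a.layers[2:end]
    z = d(z + y)          # z_k = layer_k(z_{k-1} + y): every step sees y again
  end
  return z
end

m2 = Adder2(y -> 2y, x -> 3x)
m2(1)  # z1 = 2*1 = 2, then 3*(2 + 1) = 9
```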
demo_layer1(k) = y -> begin
  @info "just y" k y
  k * y
end

demo_layer2(k) = (z, y) -> begin
  @info "y & z" k y z
  k * (y + z)
end
# Easier to see the structure if we build up the model iteratively:
model = demo_layer1(1) # z1
model = SkipConnection(model, demo_layer2(2)) # feeding into z2
model = SkipConnection(model, demo_layer2(3)) # etc...

julia> model(1) # y = 1
┌ Info: just y
│   k = 1
└   y = 1
┌ Info: y & z
│   k = 2
│   y = 1
└   z = 1
┌ Info: y & z
│   k = 3
│   y = 1
└   z = 4
15

Note how the calculations for z_{2+} are done in the SkipConnection’s “connection” and not as the (non-identity) branch, while z_1 is computed as part of that branch. There are also ways of formulating your layer’s forward pass so that a single layer type covers both equations, but that’s an aside.
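One way that aside could look, sketched with hypothetical names (`ICNNLayer` is not a Flux type) and an inline `softplus` so it runs without Flux: give the layer two call methods, one for z_1 taking only y, and one for later z_k taking (z, y), so a single struct covers both equations and slots straight into the SkipConnection pattern above. The exponential trick from the question is folded in to keep W^z_k positive:

```julia
softplus(x) = log1p(exp(x))  # stand-in for Flux's softplus

struct ICNNLayer          # hypothetical name; fields left untyped for the sketch
  Wz   # unconstrained matrix, or `nothing` for the first layer (no z input yet)
  Wy
  b
end

# z_1 = softplus(W^y_0 y + b_0): called with y alone
(l::ICNNLayer)(y) = softplus.(l.Wy * y .+ l.b)

# z_k = softplus(exp(W_k) z + W^y_k y + b_k): called with (z, y), as in a
# SkipConnection's `connection`; exp.() keeps the z-weights elementwise positive
(l::ICNNLayer)(z, y) = softplus.(exp.(l.Wz) * z .+ l.Wy * y .+ l.b)

l1 = ICNNLayer(nothing, randn(2, 2), zeros(2))
l2 = ICNNLayer(randn(2, 2), randn(2, 2), zeros(2))
y = ones(2)
z2 = l2(l1(y), y)  # one skip step: z_2 computed from (z_1, y)
```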


Thanks, I’ll give both solutions a try 🙂