For example, I have an input which is a length-10 vector. I want the first neural network to output a length-3 vector based on the first 5 elements of the input, and the second neural network to output a length-4 vector based on the last 5 elements of the input. The overall output should be a length-7 vector which combines the outputs of the two neural networks. What’s the easiest way to do this? Though this sounds strange, the external package I use only accepts a “single” neural network in its API, so I cannot just use the two networks separately.
One way is to use `Parallel`, with `vcat` to combine the outputs of the two paths, preceded by a function that splits the input:
julia> using Flux

julia> mysplit(x::AbstractVector) = (x[1:5], x[6:end]);
julia> model1 = Dense(5 => 3);
julia> model2 = Dense(5 => 4, x->x+100);
julia> model = Chain(mysplit, Parallel(vcat, model1, model2))
Chain(
  mysplit,
  Parallel(
    vcat,
    Dense(5 => 3),                      # 18 parameters
    Dense(5 => 4, #5),                  # 24 parameters
  ),
)                   # Total: 4 arrays, 42 parameters, 376 bytes.
julia> model(ones32(10))
7-element Vector{Float32}:
  -1.6361262
   1.4683602
  -0.70261115
 100.29568
 100.38945
 100.91372
 100.26898
julia> mysplit(x::AbstractMatrix) = (x[1:5, :], x[6:end, :]);
julia> model(ones32(10, 3)) # now accepts a batch of inputs
7×3 Matrix{Float32}:
  -1.63613    -1.63613    -1.63613
   1.46836     1.46836     1.46836
  -0.702611   -0.702611   -0.702611
 100.296     100.296     100.296
 100.389     100.389     100.389
 100.914     100.914     100.914
 100.269     100.269     100.269
The only concern would be the amount of memory allocation involved. Can the input vector be split into two allocation-free views to be fed into the two NNs? Can I pre-allocate a single output vector and ask the two NNs to each write to a specific part of the output?
Such optimisations are possible, but any nontrivial `model1` and `model2` will usually allocate far more than these splitting and concatenation steps anyway.
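For what it's worth, here is a rough sketch of what those two ideas look like in plain Julia (no Flux required). `f1` and `f2` are hypothetical stand-ins for the two networks; note that each branch's result is still allocated before being copied into the output, since Flux layers don't offer an in-place (`mul!`-style) API:

```julia
# Hypothetical stand-ins for model1 (5 -> 3) and model2 (5 -> 4):
f1(x) = Float32[sum(x), 2sum(x), 3sum(x)]
f2(x) = x[1:4] .+ 100f0

x = ones(Float32, 10)

# Two allocation-free views of the input:
xa, xb = view(x, 1:5), view(x, 6:10)

# One pre-allocated length-7 output, written slice by slice:
out = Vector{Float32}(undef, 7)
out[1:3] .= f1(xa)   # f1's result is still allocated before the copy
out[4:7] .= f2(xb)
```

Whether this pays off depends on the sizes involved; for layers of this size the matrix multiplications dominate.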
Not in Flux, but in the NN module of BetaML you can use `ReplicatorLayer` and `GroupedLayer` to build a multi-branch neural network.
Perhaps you can use the same approach in Flux…
