For example, I have an input which is a length-10 vector. I want the first neural network to output a length-3 vector based on the first 5 elements of the input, and the second neural network to output a length-4 vector based on the last 5 elements of the input. The overall output should be a length-7 vector which combines the outputs of the two neural networks. What’s the easiest way to do this? Though this sounds strange, the external package I use only accepts a “single” neural network in its API, so I cannot just use the two networks separately.
One way is to use `Parallel`, with `vcat` to combine the outputs of the two paths, preceded by a function that splits the input:
julia> using Flux

julia> mysplit(x::AbstractVector) = (x[1:5], x[6:end]);
julia> model1 = Dense(5 => 3);
julia> model2 = Dense(5 => 4, x->x+100);
julia> model = Chain(mysplit, Parallel(vcat, model1, model2))
Chain(
  mysplit,
  Parallel(
    vcat,
    Dense(5 => 3),                      # 18 parameters
    Dense(5 => 4, #5),                  # 24 parameters
  ),
)                   # Total: 4 arrays, 42 parameters, 376 bytes.
julia> model(ones32(10))
7-element Vector{Float32}:
  -1.6361262
   1.4683602
  -0.70261115
 100.29568
 100.38945
 100.91372
 100.26898
julia> mysplit(x::AbstractMatrix) = (x[1:5, :], x[6:end, :]);
julia> model(ones32(10, 3)) # now accepts a batch of inputs
7×3 Matrix{Float32}:
  -1.63613    -1.63613    -1.63613
   1.46836     1.46836     1.46836
  -0.702611   -0.702611   -0.702611
 100.296     100.296     100.296
 100.389     100.389     100.389
 100.914     100.914     100.914
 100.269     100.269     100.269
The only concern would be the amount of memory allocation involved. Can the input vector be split into two allocation-free views to be fed into the two NNs? Can I pre-allocate a single output vector and ask the two NNs to each write to a specific part of the output?
Such optimisations are possible, but any nontrivial `model1` and `model2` will usually allocate far more than these splitting and concatenation steps anyway.
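For what it's worth, here is a rough sketch of what those two ideas look like in plain Julia (no Flux required). `f1` and `f2` are hypothetical stand-ins for the two networks; note that each branch's result is still allocated before being copied into the output, since Flux layers don't offer an in-place (`mul!`-style) API:

```julia
# Hypothetical stand-ins for model1 (5 -> 3) and model2 (5 -> 4):
f1(x) = Float32[sum(x), 2sum(x), 3sum(x)]
f2(x) = x[1:4] .+ 100f0

x = ones(Float32, 10)

# Two allocation-free views of the input:
xa, xb = view(x, 1:5), view(x, 6:10)

# One pre-allocated length-7 output, written slice by slice:
out = Vector{Float32}(undef, 7)
out[1:3] .= f1(xa)   # f1's result is still allocated before the copy
out[4:7] .= f2(xb)
```

Whether this pays off depends on the sizes involved; for layers of this size the matrix multiplications dominate.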
Not in Flux, but in the NN module of BetaML you can use `ReplicatorLayer` and `GroupedLayer` to build a multi-branch neural network.
Perhaps you can use the same approach in Flux…
