Siamese/Twin network in Flux (working with two image inputs)

I’m trying Flux for the first time by porting over a project I did in PyTorch earlier this year. The project classifies plant leaves as healthy or diseased. I already have a baseline transfer learning model working in Flux, but I’m stuck on my Siamese/twin network model.

Here is what I have so far:

"Custom Flux NN layer which will create twin network from `path` with shared parameters and combine their output with `combine`."
struct Twin{T,F}
    combine::F
    path::T
end

# define the forward pass of the Twin layer
# feeds both inputs, X, through the same path (i.e., shared parameters)
# and combines their outputs
Flux.@functor Twin
(m::Twin)(Xs::Tuple) = m.combine(map(X -> m.path(X), Xs)...)

# this is the architecture that forms the path of the twin network
CNN_path = Chain(
    # layer 1
    Conv((5,5), 3 => 18, relu),
    MaxPool((3,3), stride=3),
    # layer 2
    Conv((5,5), 18 => 36, relu),
    MaxPool((2,2), stride=2),
    # layer 3
    Conv((3,3), 36 => 72, relu),
    MaxPool((2,2), stride=2),
    # flatten the conv features to a matrix (features × batch) for the Dense layers
    Flux.flatten,
    # layer 4
    Dense(19 * 19 * 72 => 64, relu),
    # output layer
    Dense(64 => 32, relu)
)

# this layer combines the outputs of the twin CNNs
bilinear = Flux.Bilinear((32,32) => 1)

twin_model = Twin(bilinear, CNN_path)
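
For reference, the `19 * 19 * 72` input size of the first Dense layer is what the conv/pool arithmetic gives if the images are 256×256. That input size is an assumption (it is not stated in the post), and `out_size` is a hypothetical helper, not a Flux function. A rough sketch of the arithmetic, assuming the default padding of 0 and a conv stride of 1:

```julia
# Output length along one spatial axis for an unpadded conv or pooling layer.
out_size(n, k; stride=1) = (n - k) ÷ stride + 1

n = 256                          # ASSUMPTION: 256×256 input images
n = out_size(n, 5)               # Conv((5,5), 3 => 18)
n = out_size(n, 3; stride=3)     # MaxPool((3,3), stride=3)
n = out_size(n, 5)               # Conv((5,5), 18 => 36)
n = out_size(n, 2; stride=2)     # MaxPool((2,2), stride=2)
n = out_size(n, 3)               # Conv((3,3), 36 => 72)
n = out_size(n, 2; stride=2)     # MaxPool((2,2), stride=2)

flat_features = n * n * 72       # features entering the first Dense layer
```

With a different image size the same steps give a different `flat_features`, and the first Dense layer would have to change to match.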

Essentially, I created a custom Flux layer (similar to Parallel, but using only one path because I want parameter sharing), which takes in a tuple of arrays as input. When I run twin_model(...) on a batch of images from my DataLoader, it works correctly and gives me the output I expect.
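
To make the parameter sharing concrete, here is a minimal sketch of the same idea in plain Julia, without Flux. The names `TwinSketch`, `toy_path`, and `toy_combine` are made up for illustration, and the toy path is a single matrix multiply rather than a CNN:

```julia
# Stand-in for the Twin layer in plain Julia (no Flux), to show the weight sharing.
struct TwinSketch{T,F}
    combine::F
    path::T
end

# Both inputs go through the same `path` object, so there is only one set of weights.
(m::TwinSketch)(Xs::Tuple) = m.combine(map(m.path, Xs)...)

# Hypothetical toy path: one weight matrix mapping 3 features to 2.
W = [1.0 0.0 -1.0;
     0.5 0.5  0.5]
toy_path(X) = W * X

# Hypothetical toy combiner: per-sample dot product of the two embeddings.
toy_combine(a, b) = sum(a .* b; dims=1)

toy_twin = TwinSketch(toy_combine, toy_path)

X₁ = [1.0 2.0; 0.0 1.0; 1.0 0.0]   # 3 features × 2 samples
X₂ = [0.0 1.0; 1.0 1.0; 2.0 0.0]

scores = toy_twin((X₁, X₂))        # 1 × 2 matrix, one score per input pair
```

Because there is only one `path`, anything that updates its weights affects both branches at once, which is the difference from a `Parallel` layer built from two separate copies.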

But when I run it in my training loop and try to get the gradient, it gives an error. Here is the relevant snippet from my training loop:

loss(Xs, y) = logitbinarycrossentropy(model(Xs), y)

@info "Beginning training loop..."
for epoch_idx ∈ 1:n_epochs
    @info "Training epoch $(epoch_idx)..."
    # train 1 epoch, record performance
    @withprogress for (batch_idx, ((imgs₁, labels₁), (imgs₂, labels₂))) ∈ enumerate(zip(train_loader₁, train_loader₂))
        X₁ = @pipe imgs₁ |> gpu |> float32.(_)
        y₁ = @pipe labels₁ |> gpu |> float32.(_)

        X₂ = @pipe imgs₂ |> gpu |> float32.(_)
        y₂ = @pipe labels₂ |> gpu |> float32.(_)

        Xs = (X₁, X₂)
        y = Float32.(y₁ .== y₂) # elementwise: 1 if both images in a pair have the same label, else 0

        gradients = gradient(() -> loss(Xs, y), params)
        Flux.Optimise.update!(optimizer, params, gradients)

        @logprogress batch_idx / length(train_loader₁)
    end
    # other stuff...
end
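
One detail worth checking in the label line: `==` on two arrays returns a single `Bool` for the arrays as a whole, while the broadcast form `.==` compares them element by element, which is what a per-pair target needs. A small sketch with made-up labels:

```julia
y₁ = Float32[1, 0, 1, 0]   # hypothetical labels for the first batch
y₂ = Float32[1, 1, 0, 0]   # hypothetical labels for the second batch

whole_array = (y₁ == y₂)            # one Bool: are the two arrays identical?
per_pair    = Float32.(y₁ .== y₂)   # one Float32 target per image pair
```

Here `whole_array` is a scalar, so a loss computed against it would use the same target for every pair in the batch.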

And here is the error I get:

ERROR: InvalidIRError: compiling kernel rand!(CuDeviceMatrix{Float32, 1}, UInt32, UInt32) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to CUDA.Philox2x32{R}() where R in CUDA at C:\Users\Wombat\.julia\packages\CUDA\tTK8Y\src\device\random.jl:46)

The stack trace says the error occurs on the line `gradients = gradient(() -> loss(Xs, y), params)`, and further in, inside `loss(Xs, y) = logitbinarycrossentropy(model(Xs), y)`. Because calling `twin_model(Xs)` and `loss(Xs, y)` directly both work, I’ve narrowed it down to the error occurring only when taking the gradients.

Because it’s a CUDA error, it should probably work on CPU, but my dataset is way too big, so CPU is not an option.

I also tested with a regular Parallel layer instead of my custom Twin layer, and it still gives the same issue.

Does anyone have any experience working with Siamese/twin networks in Flux, or multiple inputs, and know how to resolve this issue?

Can you try without Dropout layers? This looks like an instance of FluxML/Flux.jl issue #2019, "`Dropout` layer not working with CUDA".
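
If it helps, one way to test this without editing the model definition is to rebuild the chain with its Dropout layers filtered out. This is only a sketch: `path_with_dropout` is a hypothetical stand-in (the code posted above does not show where the Dropout layers are), and only top-level layers of the `Chain` are checked.

```julia
using Flux

# Hypothetical path containing a Dropout layer, standing in for the real model.
path_with_dropout = Chain(Dense(8 => 4, relu), Dropout(0.25), Dense(4 => 2))

# Rebuild the chain from every top-level layer that is not a Dropout.
# The remaining layers are the same objects, so their parameters are kept.
path_without_dropout = Chain(filter(l -> !(l isa Dropout), path_with_dropout.layers)...)
```

If the gradient call succeeds with the filtered chain, that would point at the Dropout issue rather than at the Twin layer.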

Thanks a ton! It seems to be working now.