I’m trying Flux for the first time, porting over a project I wrote in PyTorch earlier this year. The project classifies plant leaves as healthy or diseased. I already have a baseline transfer-learning model working in Flux, but I’m stuck on my Siamese/twin network model.
Here is what I have so far:
"Custom Flux NN layer which will create twin network from `path` with shared parameters and combine their output with `combine`."
struct Twin{T,F}
    combine::F
    path::T
end

Flux.@functor Twin

# define the forward pass of the Twin layer:
# feed both inputs, Xs, through the same path (i.e., shared parameters)
# and combine their outputs
(m::Twin)(Xs::Tuple) = m.combine(map(X -> m.path(X), Xs)...)
# this is the architecture that forms the path of the twin network
CNN_path = Chain(
    # layer 1
    Conv((5, 5), 3 => 18, relu),
    MaxPool((3, 3), stride=3),
    # layer 2
    Conv((5, 5), 18 => 36, relu),
    MaxPool((2, 2), stride=2),
    # layer 3
    Conv((3, 3), 36 => 72, relu),
    MaxPool((2, 2), stride=2),
    Flux.flatten,
    # layer 4
    Dense(19 * 19 * 72 => 64, relu),
    Dropout(0.1),
    # output layer
    Dense(64 => 32, relu),
)
# this layer combines the outputs of the twin CNNs
bilinear = Flux.Bilinear((32,32) => 1)
twin_model = Twin(bilinear, CNN_path)
Essentially, I created a custom Flux layer (similar to Parallel, but with only one path, because I want parameter sharing) which takes a tuple of arrays as input. When I run twin_model(...) on a batch of images from my DataLoader, it works correctly and gives me the output I expect.
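For reference, a call shaped like the following is what works, with dummy arrays standing in for a DataLoader batch (the 256×256 spatial size is my inference from the 19 * 19 * 72 input to the Dense layer):

# dummy batch of 4 image pairs, each image 256×256 RGB (size inferred, see above)
X₁ = rand(Float32, 256, 256, 3, 4)
X₂ = rand(Float32, 256, 256, 3, 4)
twin_model((X₁, X₂))  # → 1×4 output from the Bilinear combiner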
But when I run it in my training loop and try to get the gradient, it gives an error. Here is the relevant snippet from my training loop:
loss(Xs, y) = logitbinarycrossentropy(model(Xs), y)
@info "Beginning training loop..."
for epoch_idx ∈ 1:n_epochs
    @info "Training epoch $(epoch_idx)..."
    # train one epoch, record performance
    @withprogress for (batch_idx, ((imgs₁, labels₁), (imgs₂, labels₂))) ∈ enumerate(zip(train_loader₁, train_loader₂))
        X₁ = @pipe imgs₁ |> gpu |> Float32.(_)
        y₁ = @pipe labels₁ |> gpu |> Float32.(_)
        X₂ = @pipe imgs₂ |> gpu |> Float32.(_)
        y₂ = @pipe labels₂ |> gpu |> Float32.(_)
        Xs = (X₁, X₂)
        y = Float32.(y₁ .== y₂)  # y is 1 where the two images share a label, 0 otherwise
        gradients = gradient(() -> loss(Xs, y), params)
        Flux.Optimise.update!(optimizer, params, gradients)
        @logprogress batch_idx / length(train_loader₁)
    end
    # other stuff...
end
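For context, the definitions the snippet relies on look roughly like this (abbreviated; the exact optimiser and learning rate aren’t the point here):

# abbreviated setup assumed by the loop above
model = twin_model |> gpu      # the Twin model from before, moved to the GPU
params = Flux.params(model)    # implicit-style parameters for gradient(...)
optimizer = ADAM()             # placeholder; actual optimiser/learning rate omitted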
And here is the error I get:
ERROR: InvalidIRError: compiling kernel rand!(CuDeviceMatrix{Float32, 1}, UInt32, UInt32) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to CUDA.Philox2x32{R}() where R in CUDA at C:\Users\Wombat\.julia\packages\CUDA\tTK8Y\src\device\random.jl:46)
The stack trace says it occurs on the line gradients = gradient(() -> loss(Xs, y), params), and further inside loss(Xs, y) = logitbinarycrossentropy(model(Xs), y). Since calling twin_model(Xs) and loss(Xs, y) directly both work, I’ve narrowed it down to the error occurring only when taking the gradients.
Because it’s a CUDA error, the code would presumably run on the CPU, but my dataset is far too large for CPU training to be an option.
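In case it helps to reproduce, the failing call boils down to something like this on dummy GPU data (sizes are my assumption: 256×256 RGB to match the Dense(19 * 19 * 72 => 64) layer; the real batches come from the DataLoaders):

# dummy stand-ins for one batch from each DataLoader (sizes assumed)
Xs = (CUDA.rand(Float32, 256, 256, 3, 8), CUDA.rand(Float32, 256, 256, 3, 8))
y  = Float32.(CUDA.rand(1, 8) .> 0.5f0)
# the forward pass and the loss are fine...
loss(Xs, y)
# ...but asking for the gradient is what triggers the error
gradient(() -> loss(Xs, y), params)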
I also tested with a regular Parallel layer instead of my custom Twin layer, and it gives the same error.
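Concretely, the Parallel version I tried was along these lines, passing the same CNN_path object to both branches so the weights stay shared:

parallel_model = Parallel(
    Flux.Bilinear((32, 32) => 1),  # same combiner as before
    CNN_path,  # same Chain object in both branches → shared parameters
    CNN_path,
)
parallel_model(Xs)  # Parallel feeds each element of the tuple to its own branch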
Does anyone have experience with Siamese/twin networks (or multi-input models generally) in Flux and know how to resolve this issue?