# Flux: 1D convolutions (on genomic data)

I am trying to recreate this tutorial from Python.

This basically deals with how to apply deep learning to genomic data. As such, the input consists of sequences, where each sequence is based on a combination of any of the four bases: A, C, G, T. An example of such sequence could be e.g. `CCGAGGGCTATGGTTTGGAAGTTAGAACCCTGGGGCTTCTCGCGGA`.

To make the sequences into something numerical, we first have to one-hot-encode them. For the one-hot encoding, I do:

``````sequence = "CCGAGGGCTATGGTTTGGAAGTTAGAACCCTGGGGCTTCTCGCGGA"
alphabet = ['A', 'C', 'G', 'T']
using Flux: onehot
tokenise(s) = [onehot(c, alphabet) for c in s]
tokenise(sequence)
``````

which gives a 46-element array of 4-dimensional vectors.

The tutorial then continues using a convolutional neural network (CNN) using a 1D convolutional layer. However, I was having trouble finding anything similar in Flux.

The Python code in question is:

``````model = Sequential()

metrics=['binary_accuracy'])
``````

I have tried looking at CNNs implemented in Flux running on e.g. MNIST data, but I couldnβt find anything related to the 1D-case.

Can anything help with a Flux implementation?

2 Likes

You can just use the Conv layer, it handles up to 3d conv

1 Like

So you do not need a Flux version of Kerasβ `Conv1D`? How come they have a separate version in Keras/Tensorflow then?

Also, do you have a MWE of this?

Okay, I think I managed to get it working! First I load in the data and one-hot-encode it (pretty similar to my first shot above):

``````using MLJ
using Flux
import MLJFlux
import MLJIteration
using CairoMakie
import Base.@kwdef

url_seqs = "https://raw.githubusercontent.com/abidlabs/deep-learning-genomics-primer/master/sequences.txt"

url_labels = "https://raw.githubusercontent.com/abidlabs/deep-learning-genomics-primer/master/labels.txt"

#%%

bases_dna = ['A', 'C', 'G', 'T']

function ohe_row(sequence)
return collect(sequence) .== permutedims(bases_dna)
end

function onehot(sequences)
L = 50
N_bases = 4
out = BitArray(undef, L, N_bases, size(sequences, 1))
for (i, sequence) in enumerate(sequences)
out[:, :, i] = reshape(ohe_row(sequence), (L, N_bases, 1))
end
return out
end

#%%

sequences = onehot(sequences_raw);

images = coerce(sequences, GrayImage);
labels = coerce(labels_raw, Binary);

N_images = length(images)
idxs = shuffle(1:N_images);
train = idxs[1:1500];
test = idxs[1501:end];
``````

One slightly weird feature is that it seems to be easiest to convert the genomic data to an `L` times 4 dimensional gray image (with `L` being the sequence length, 50 in this case).

The convolutional network was a bit tricky to get working, at least regarding using the correct dimensions for the convolutional filter. However, I think I got it by using the following:

``````@kwdef struct MyDNABuilder
kernel_size::Int
filters::Int
dense::Int
end

make2d(x::AbstractArray) = reshape(x, :, size(x)[end])

function MLJFlux.build(b::MyDNABuilder, rng, n_in, n_out, n_channels)
kernel_size, filters, dense = b.kernel_size, b.filters, b.dense
init = Flux.glorot_uniform(rng)
front = Chain(
Conv((kernel_size, 4), 1 => filters, pad = (0, 0), bias = true, init = init),
MaxPool((4, 1)),
make2d,
)
d = Flux.outputsize(front, (n_in..., n_channels, 1)) |> first
return Chain(front, Dense(d, dense, init = init), Dense(dense, n_out, init = init))
end

clf = MLJFlux.ImageClassifier(
builder = MyDNABuilder(kernel_size = 12, filters = 32, dense = 16),
loss = Flux.crossentropy,
epochs = 50,
batch_size = 32,
rng = 123,
finaliser = softmax,
)
``````

This network specification should be similar to the one used in Python (see my original post).

To capture the training losses as a function of epoch, I followed the MNIST example from MLJFLux:

``````# To store the traces:
losses = Float32[]
training_losses = Float32[]
parameter_means = Float32[];
epochs = Int64[]

# To update the traces:
update_loss(loss) = push!(losses, loss)
update_training_loss(losses) = push!(training_losses, losses[end])
update_epochs(epoch) = push!(epochs, epoch)

controls = [
Step(1),
NumberLimit(n = 50),
WithLossDo(),
WithLossDo(update_loss),
WithTrainingLossesDo(update_training_loss),
WithIterationsDo(update_epochs),
];

# The "self-iterating" classifier:
iterated_clf = IteratedModel(
model = clf,
controls = controls,
resampling = Holdout(fraction_train = 0.75),
measure = log_loss,
);

# Binding the wrapped model to data:
mach = machine(iterated_clf, images[train], labels[train]);

# Training
fit!(mach);
``````

This gives the following training loss curve:

I do not understand why the loss decreases, then increases around epoch 25-30 and then suddenly drops again. If anyone has any suggestions, please let me know! In comparison, the training loss in the Python example does not exhibit similar behaviour.

The accuracy on the test set is 96.8% using Julia, while it is 97.4% using Python so thatβs also an advantage of the Python version.

The confusion matrix given Julia is:

``````              βββββββββββββββββββββββββββββ
β       Ground Truth        β
βββββββββββββββΌββββββββββββββ¬ββββββββββββββ€
β  Predicted  β      0      β      1      β
βββββββββββββββΌββββββββββββββΌββββββββββββββ€
β      0      β     241     β      3      β
βββββββββββββββΌββββββββββββββΌββββββββββββββ€
β      1      β     13      β     243     β
βββββββββββββββ΄ββββββββββββββ΄ββββββββββββββ
``````

and in Python is:

``````[[249 10]
[ 3 238]]
``````

Just a quick side-note:
For one-hot encoding you can simply use `Flux.onehotbatch()`:

``````julia> Flux.onehotbatch(collect(sequence), alphabet)
4Γ46 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
β  β  β  1  β  β  β  β  β  1  β  β  β  β  β  β  β  β  1  β¦  β  β  β  β  β  β  β  β  β  β  β  β  β  β  β  β  β  β  1
1  1  β  β  β  β  β  1  β  β  β  β  β  β  β  β  β  β  β     1  1  1  β  β  β  β  β  1  β  β  1  β  1  β  1  β  β  β
β  β  1  β  1  1  1  β  β  β  β  1  1  β  β  β  1  1  β     β  β  β  β  1  1  1  1  β  β  β  β  β  β  1  β  1  1  β
β  β  β  β  β  β  β  β  1  β  1  β  β  1  1  1  β  β  β     β  β  β  1  β  β  β  β  β  1  1  β  1  β  β  β  β  β  β
``````
1 Like

No clue, I assume thereβs some historical baggage involved since e.g. Flax uses a similar unified API as Flux.

1 Like

The models are not identical. One thing that immediately jumps out is that PyTorch uses kaiming inits for weights and scaled uniform init for biases, whereas here youβre using glorot init. Flux exposes a similar set of init utils, so you should be able to replicate the PyTorch behaviour there.

Woops, thatβs what I get for replying on mobile and not reading the colab notebook. The initializations are indeed the same. Have you tried running each model a few times with different seeds to see if the discrepancy persists?

Thanks for that suggestion! I chose this other method due to speed and the number of allocations. With the following definition, `sequence = sequences_raw[1]`, I find the following:

``````@btime Flux.onehotbatch(collect(sequence), bases_dna);
# output: 64.762 ΞΌs (411 allocations: 19.00 KiB)
``````
``````@btime ohe_row(sequence);
# output: 662.865 ns (5 allocations: 496 bytes)
``````

so a factor of 100 in speed, most importantly, and also almost a factor of 100 in the number of allocations.

1 Like

I have tried to be as precise in the Keras => Flux translation as possible, including using the same initialisations, as you also mention. I should probably try rerunning the models as you say to see the size of the stochastic variability, good idea!

Could you open an issue in Flux about this? We should improve `onehotbatch` performance, especially if thatβs so easy

1 Like

Thereβs another factor of 50 in speed in the onehot encoding if you do:

``````const bases_dna = ['A', 'C', 'G', 'T'];

function ohe_row!(out, sequence)
for (i, c) in enumerate(sequence)
out[i, :] .= bases_dna .== c
end
nothing
end

function onehot2(sequences)
L = 50
N_bases = 4
out = BitArray(undef, L, N_bases, size(sequences, 1))
for i in eachindex(sequences)
ohe_row!(view(out, :, :, i), view(sequences, i))
end
out
end
``````
``````julia> @btime onehot2(sequences_raw);
37.463 ΞΌs (3 allocations: 48.97 KiB)
``````
``````julia> @btime onehot(sequences_raw);
1.594 ms (14003 allocations: 1.21 MiB)
``````
1 Like

Haha, no, thatβs not really my main concern, but itβs always good with some extra performance, so thanks!

Ok, so on your original question:

wouldnβt this here correspond to the first convolution in the python version?
I.e. treat the genome dimension (i.e. the dimension of length 4) as a channel dimension rather than a convolution dimension:

``````function build_first_conv(params::MyDNABuilder)
init = Flux.glorot_uniform()
Conv(
(params.kernel_size,),
4 => params.filters,
bias = true,
init = init
)
``````

That way I get the same number of parameters (1_568), and same output shape (modulo bach-first vs batch-last, (39, 32, 1500)).

An additional difference to the python version seems to be that youβre missing a `relu` activation after the first `Dense` layer.

For the full model I would then have:

``````function build_model(params::MyDNABuilder)
init = Flux.glorot_uniform()
front = Chain(
Conv(
(params.kernel_size,),
4 => params.filters,
bias = true,
init = init
),
MaxPool((4,)),
flatten,
)
d = Flux.outputsize(front, (50, 4, 1))[1]
Chain(
front,
Dense(
d,
params.dense,
relu,
init = init
),
Dense(
params.dense,
2,
init = init
)
)
end
``````

also note that you donβt need to convert to a gray image. Just plug in the date as it comes out of the `onehot` function (convert to `Float32` though, for fast gradients)

``````julia> model = build_model(MyDNABuilder(12, 32, 16))
Chain(
Chain(
Conv((12,), 4 => 32),               # 1_568 parameters
MaxPool((4,)),
Flux.flatten,
),
Dense(288, 16, relu),                 # 4_624 parameters
Dense(16, 2),                         # 34 parameters
)                   # Total: 6 arrays, 6_226 parameters, 25.055 KiB.
``````
``````julia> size(x)
(50, 4, 1500)
``````
``````julia> typeof(x)
BitArray{3}
``````
``````julia> size(model(x))
(2, 1500)
``````
2 Likes

I have taken a look at your solution and there are a couple of things that I am in doubt about.
First of all, the your model signature is:

``````Chain(
Chain(
Conv((12,), 4 => 32),               # 1_568 parameters
MaxPool((4,)),
Flux.flatten,
),
Dense(288, 16, relu),                 # 4_624 parameters
Dense(16, 2),                         # 34 parameters
)                   # Total: 6 arrays, 6_226 parameters, 25.055 KiB.
``````

compared to mine:

``````Chain(
Chain(
Conv((12, 4), 1 => 32),             # 1_568 parameters
MaxPool((4, 1)),
Flux.flatten,
),
Dense(288, 16, relu),                 # 4_624 parameters
Dense(16, 2),                         # 34 parameters
)                   # Total: 6 arrays, 6_226 parameters, 25.266 KiB.
``````

They both seem similar when comparing the number of parameters so arenβt these two models actually the same? Or maybe I am missing something.

Btw, you were completely right about me missing a `relu` in the semi last `Dense` layer, so thanks a lot for spotting that!

I couldnβt really get your function to play together with MLJFlux.
In particular, MLJFlux requires defining a new `MLJFlux.Builder` method with one of these signatures:

``````MLJFlux.build(builder::MyBuilder, rng, n_in, n_out)
MLJFlux.build(builder::MyBuilder, rng, n_in, n_out, n_channels) # for use with `ImageClassifier`
``````

so the use of channels seems to imply the use of images?

Regarding channels, it also says that

`x <: Array{<:Float32, 4}` of size `(W, H, n_channels, batch_size)`, where `(W, H) = n_in`, `n_channels` is 1 or 3, and `batch_size` is any integer (for use with `ImageClassifier`)

so it sounds like you cannot even use 4 channels.

Youβre right, since you set padding to zero, so except for the extra dimension, both ways should be the same. I just quickly tried though, and the two different convolutions give different results (aside from the extra dimension in your version, and with the same weights of course). Not sure why.

I donβt have much experience with MLJFlux. But from what you say, `ImageClassifier` seems to tie you to actual grayscale or RGB images (where the color dimension = channels).
Wouldnβt `NeuralNetworkClassifier` be a bit more flexible in that regard?

Anyway I think a non-padded Kx4 convolution over a Lx4 image with a single channel should be equivalent to a K convolution over a length L sequence with four channels (even though it seems the output of the two operations are not equal). So not sure whether its worth the effort to figure out how to use the latter with MLJFlux. Maybe someone else can say more on why the operations are different.

How look things with that additional Relu?

1 Like

As I read the documentation, `NeuralNetworkClassifier` deals with

`<:Table(Continuous)` with `n_in` columns

which I understand as just a 2D table?

To keep using MLJFlux as the framework, I am thinking that, since MLJFlux uses the following dimensions: W x H x C x N and treats my sequences / βimagesβ as 50 x 4 x 1 x N, I should be able to use `permutedims(image, (1, 3, 2, 4))` to permute them to dimension 50 x 1 x 4 x N and then use your suggestion about the convolution with:

`Conv((12, 1), 4 => 32, pad = 0, bias = true, )`

This would treat the 4 bases as channels instead of height, as far as I see.

When I added the relu to my previous model, I get the following loss curve:

Still with the same weird increase in loss. Iβll give my permutation approach a go now and see how that goes.

1 Like

Things are starting to look better!

I just tried the new version using your suggestion regarding the convolutions:

``````function permute_height_channels(A)
return permutedims(A, (1, 3, 2, 4))
end

@kwdef struct MyDNABuilder2
kernel_size::Int
filters::Int
dense::Int
end

function MLJFlux.build(b::MyDNABuilder2, rng, n_in, n_out, n_channels)
kernel_size, filters, dense = b.kernel_size, b.filters, b.dense
init = Flux.glorot_uniform(rng)
front = Chain(
permute_height_channels,
Conv((kernel_size, 1), 4 => filters, pad = 0, bias = true, init = init),
MaxPool((4, 1)),
flatten,
)
d = Flux.outputsize(front, (n_in..., n_channels, 1)) |> first
return Chain(
front,
Dense(d, dense, relu, init = init),
Dense(dense, n_out, init = init),
)
end
``````

The loss curve for this method is:

which already looks a lot better!

Also the confusion matrix is better than before:

``````              βββββββββββββββββββββββββββββ
β       Ground Truth        β
βββββββββββββββΌββββββββββββββ¬ββββββββββββββ€
β  Predicted  β      0      β      1      β
βββββββββββββββΌββββββββββββββΌββββββββββββββ€
β      0      β     245     β      2      β
βββββββββββββββΌββββββββββββββΌββββββββββββββ€
β      1      β      9      β     244     β
βββββββββββββββ΄ββββββββββββββ΄ββββββββββββββ
``````

and the accuracy is 97.8% which I would conclude on par with Python.

2 Likes

I tried implementing everything in pure Flux, no MLJFlux, however, without any luck.

The code be as similar to the previous code as possible.

``````using Flux
using Random
using Flux.Losses: logitcrossentropy
using Flux: onehotbatch
using Flux: onecold

#%%

const bases_dna = ['A', 'C', 'G', 'T']

function ohe_row(sequence)
return collect(sequence) .== permutedims(bases_dna)
end

function ohe_row!(out, sequence)
for (i, c) in enumerate(sequence)
out[i, :] .= bases_dna .== c
end
end

function onehot(sequences)
L = 50
N_bases = 4
out = BitArray(undef, L, N_bases, size(sequences, 1))
for i in eachindex(sequences)
ohe_row!(view(out, :, :, i), view(sequences, i))
end
return out
end

#%%

function get_data()

url_seqs = "https://raw.githubusercontent.com/abidlabs/deep-learning-genomics-primer/master/sequences.txt"

url_labels = "https://raw.githubusercontent.com/abidlabs/deep-learning-genomics-primer/master/labels.txt"

# One-hot-encode the sequences
sequences = Float32.(onehot(sequences_raw));

# One-hot-encode the labels
labels = onehotbatch(labels_raw, 0:1)

N_sequences = last(size(sequences))
idxs = shuffle(1:N_sequences);
idx_split = Int(N_sequences*0.75)
train = idxs[1:idx_split];
test = idxs[idx_split+1:end];

end

#%%

function build_model()
init = Flux.glorot_uniform()
front = Chain(
Conv(
(12, ),
4 => 32,
bias = true,
init = init
),
MaxPool((4, )),
Flux.flatten,
)
d = Flux.outputsize(front, (50, 4, 1))[1] # 288
Chain(
front,
Dense(
d,
16,
relu,
init = init
),
Dense(
16,
2,
init = init
),
)
end

acc = 0
ls = 0.0f0
num = 0
yΜ = model(x)
ls += logitcrossentropy(yΜ, y, agg=sum)
acc += sum(onecold(yΜ) .== onecold(y))
num +=  size(x)[end]
end
return ls / num, acc / num
end

loss(yΜ, y) = logitcrossentropy(yΜ, y)

function train()

# Construct model
model = build_model()
ps = Flux.params(model) # model's trainable parameters;

## Optimizer

train_losses = Float64[]
train_accuracies = Float64[]
test_losses = Float64[]
test_accuracies = Float64[]

## Training
epochs = 50
for epoch in 1:epochs

yΜ = model(x)
loss(yΜ, y)
end

Flux.Optimise.update!(opt, ps, gs) # update parameters
end

# Report on train and test
println("Epoch=\$epoch")
println("  train_loss = \$train_loss, train_accuracy = \$train_acc")
println("  test_loss = \$test_loss, test_accuracy = \$test_acc")

push!(train_losses, train_loss)
push!(train_accuracies, train_acc)

push!(test_losses, test_loss)
push!(test_accuracies, test_acc)

end

return train_losses, train_accuracies, test_losses, test_accuracies

end

train_losses, train_accuracies, test_losses, test_accuracies = train();
``````

The loss curve looks like:

And the accuracy curve looks like:

Weird, I independently just implemented it in Flux and saw the same behaviour.
I donβt have time to debug it right now, but if you figure it out I would be very curious to hear whatβs causing it