Flux unable to differentiate an embedding layer

I’ve made an embedding layer that should be able to embed a batched input. The forward pass seems to work: I get back an array with the dimensions I expect.

The backward pass, however, doesn’t work. I get this error: DimensionMismatch("tried to assign 10×32×16 array to 320×16 destination")

using Flux

struct Embedding
    table
end

Embedding(voc_size, feature_size) = Embedding(param(Flux.glorot_normal(voc_size, feature_size)))

(e::Embedding)(x) = e.table[x, :]

function (e::Embedding)(x::AbstractArray{T, 2}) where {T}
    out = e.table[x, :]
    return permutedims(out, (1, 3, 2))
end

@Flux.treelike Embedding

model = Embedding(100, 16) # feature-size=16
input = rand(1:100, (10,32)) # input-size=10, batch-size=32

model(input) # input-size x feature-size x batch-size

loss(x) = sum(model(x)) # not actually meaningful, just as a test
loss(input)

Tracker.gradient(params(model)) do
    loss(input)
end  # DimensionMismatch("tried to assign 10×32×16 array to 320×16 destination")

Am I using unsupported functionality? I suspect the problem lies in the use of permutedims, but how can I solve this?
I’ve also tried this with Zygote (swapping Tracker.gradient for Zygote.gradient), but then I get the same error.

Thanks for any help!
Jules

Could you provide a self-contained example? I’m not able to run it, since glorot_normal, @treelike and test are not defined.

Whoops, sorry. It should run now, I believe.

@mcabbott created a PR that should fix this: https://github.com/FluxML/Zygote.jl/pull/256
Thanks!
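
For anyone landing here later: the same DimensionMismatch can be reproduced in plain Julia without Flux, which suggests permutedims is not the culprit. The sketch below assumes (I have not checked this against every Tracker/Zygote version) that the getindex gradient writes the incoming gradient back with an indexed assignment along the lines of `dtable[x, :] = Δ`. The names `dtable`, `Δ` and `embed_flat` are made up for illustration and are not part of Flux or Zygote.

```julia
# Plain-Julia sketch, no Flux needed. All names here are illustrative only.
table = randn(100, 16)          # voc-size x feature-size
x = rand(1:100, 10, 32)         # input-size x batch-size

out = table[x, :]               # 10 x 32 x 16, as in the forward pass
Δ = ones(size(out))             # stand-in for the incoming gradient

# Assigning through a matrix of indices checks Δ against (length(x), 16),
# i.e. 320 x 16, so a 10 x 32 x 16 array is rejected.
dtable = zeros(size(table))
err = try
    dtable[x, :] = Δ
    nothing
catch e
    e
end
@show err isa DimensionMismatch

# Possible workaround (an assumption, not the fix from the PR):
# index with a flat vector and restore the shape afterwards.
function embed_flat(table, x::AbstractMatrix{<:Integer})
    rows = table[vec(x), :]                           # (input*batch) x feature
    out = reshape(rows, size(x)..., size(table, 2))   # input x batch x feature
    return permutedims(out, (1, 3, 2))                # input x feature x batch
end

# Same forward result as the original two-dimensional indexing.
@show embed_flat(table, x) == permutedims(table[x, :], (1, 3, 2))
```

With flat indices both sides of such an assignment would be 320×16, so the shape check passes. Note that a plain indexed assignment overwrites rather than accumulates when the same index occurs more than once, so it is worth comparing the gradient against a finite-difference estimate on whichever version you are running.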