Hipshot as I’m on the phone: Try removing that transpose of attn.v and initialize it as rand(1, attn_dim).
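To make the shapes concrete, here is a rough sketch of what I mean. I can't see your full module from here, so this uses NumPy rather than your framework, and the sizes, the `energy` tensor, and its `(batch, attn_dim, seq_len)` layout are all assumptions on my part. Only `v` and `attn_dim` come from your code.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch, seq_len, attn_dim = 2, 5, 8

# Create v directly as a row vector, so no transpose is needed later.
v = np.random.rand(1, attn_dim)

# Stand-in for the tanh-activated energies; assumed layout (batch, attn_dim, seq_len).
energy = np.tanh(np.random.rand(batch, attn_dim, seq_len))

# (1, attn_dim) @ (batch, attn_dim, seq_len) broadcasts to (batch, 1, seq_len).
scores = (v @ energy).squeeze(1)  # shape: (batch, seq_len)

# Softmax over the sequence axis to get attention weights.
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
```

If your energies are laid out differently, the matmul will need adjusting, but the point stands: with `v` already shaped `(1, attn_dim)` the product lines up without transposing it.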