(Flux) certain arbitrary model sizes and random seeds make gradients exactly zero

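It looks like that last relu is the cause. Whenever the last Dense layer produces a negative value, relu clamps it to zero and its derivative there is zero too, so every gradient flowing back through it is exactly zero. With a random initialization that should happen about 50% of the time, which would explain why only some seeds and model sizes are affected. Try dropping the final relu, or set the activation to identity.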
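To see the effect in isolation, here is a minimal sketch in plain Julia that does not use Flux. It assumes a single output neuron and a squared-error loss, and writes the chain rule out by hand. The helper `last_layer_grad`, the derivative `drelu`, and the weights are all made up for illustration. The weights are picked so that the pre-activation comes out negative.

```julia
# Hand-written stand-in for a final Dense(2 => 1, act) layer; this is not Flux code.
relu(z) = max(z, zero(z))
drelu(z) = z > 0 ? one(z) : zero(z)   # derivative of relu (taken as 0 at z == 0)

# Gradient of the loss (act(w'x + b) - y)^2 with respect to w and b.
function last_layer_grad(w, b, x, y, act, dact)
    z = sum(w .* x) + b               # pre-activation of the last layer
    dz = 2 * (act(z) - y) * dact(z)   # chain rule through the activation
    return (w = dz .* x, b = dz)
end

x, y = [1.0, 2.0], 1.0
w, b = [-0.5, -0.25], 0.0             # hypothetical init that gives z = -1.0

g_relu = last_layer_grad(w, b, x, y, relu, drelu)
g_id   = last_layer_grad(w, b, x, y, identity, z -> one(z))

println(g_relu)   # all entries compare equal to zero
println(g_id)     # nonzero entries, so an update step would change w and b
```

The only difference between the two calls is the activation on the last layer. With relu the factor `dact(z)` is zero, and that zero multiplies everything further back in the network as well.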
