Hi,
I’m working on implementing a seq2seq model in Flux. Is there a standard method, or failing that a suitable approach, for ignoring masked tokens when computing the gradient? For example, with sentences of differing lengths, the padding positions that follow the EOS token should not contribute.
Cheers
I’m not aware of any existing functionality. If you have some insight into how other frameworks handle this, we could workshop something.
It appears that PyTorch has an ignore_index argument on some of its loss functions (e.g. CrossEntropyLoss), which lets you specify a target value to skip when computing the loss. Positions with that target then contribute nothing to the gradient, which is the behaviour I’m after.
I guess it would be relatively straightforward to write my own loss function in Julia/Flux that does the same.
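For what it’s worth, a minimal sketch of that idea in plain Julia might look like the following. It assumes the logits are a (vocab × positions) matrix, the targets are 1-based integer class indices, and padded positions are marked with a sentinel value. The names masked_crossentropy, logsoftmax_cols and the ignore_index keyword are hypothetical (the last one borrowed from PyTorch), not part of Flux, and I haven’t checked how efficiently this differentiates under Flux’s AD.

```julia
# Numerically stable log-softmax over each column (one column per position).
function logsoftmax_cols(logits::AbstractMatrix)
    shifted = logits .- maximum(logits; dims=1)
    return shifted .- log.(sum(exp.(shifted); dims=1))
end

# Cross-entropy averaged over non-ignored positions only.
# `ignore_index` is a hypothetical keyword: the sentinel target marking padding.
function masked_crossentropy(logits::AbstractMatrix,
                             targets::AbstractVector{<:Integer};
                             ignore_index::Integer=0)
    logp = logsoftmax_cols(logits)
    mask = targets .!= ignore_index            # true where the position counts
    safe = ifelse.(mask, targets, 1)           # any valid row for masked slots
    picked = [logp[safe[j], j] for j in eachindex(safe)]
    # Masked positions are multiplied by zero, so they add nothing to the
    # loss and therefore nothing to the gradient.
    return -sum(picked .* mask) / max(sum(mask), 1)
end

# Usage: three positions, the last one is padding (target 0).
logits  = [2.0 0.0 5.0;
           0.0 2.0 -3.0]
targets = [1, 2, 0]
loss = masked_crossentropy(logits, targets)
```

Dividing by the number of unmasked positions rather than the full sequence length keeps the loss scale independent of how much padding a batch happens to contain; summing instead of averaging would be an equally reasonable choice.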
Yeah, one of the “problems” of Julia is that it’s often too straightforward to write your own little something once you’re a semi domain expert.
This means absolute beginners find the Julia ecosystem harder to use, because it has fewer canned (although trivial) functionalities.
Honestly, this is so true it’s painful. TF/PyTorch are functionally languages unto themselves, and I often feel frozen out of writing custom algos because I simply don’t know TF/PyTorch well enough. I’m always left asking: am I writing this correctly?
With Julia/Flux, the code is the math, and everything just works. Love it!
If you do come up with something, feel free to open a PR and we’ll try to find a good home for it.