Hi,
I’m working on implementing a seq2seq model in Flux. Is there a standard method (or, failing that, a suitable approach) for ignoring masked tokens when computing the gradient? E.g. for sentences of differing lengths, ignoring everything after the EOS token.
Cheers
I’m not aware of any existing functionality. If you have some insight into how other frameworks handle this, we could workshop something.
It appears that PyTorch’s loss functions have an ignore_index argument, which lets you specify a target index to exclude from the loss, and therefore from the gradient computation, which is exactly what I want.
I guess it would be relatively straightforward to write my own loss function in Julia/Flux that does the same.
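A minimal sketch of what that could look like, assuming targets are integer class labels in 1:K and a reserved index (here 0, mirroring PyTorch’s ignore_index) marks padded positions; `masked_logitcrossentropy` and its keyword are my own names, not Flux API:

```julia
using Flux
using Flux: onehotbatch, logsoftmax

# Hypothetical helper, not part of Flux: logit cross-entropy that skips
# target positions equal to `ignore_index`, mirroring PyTorch's argument.
# ŷ is a K×N matrix of logits; targets is a length-N integer vector.
function masked_logitcrossentropy(ŷ, targets; ignore_index=0)
    mask = targets .!= ignore_index
    # onehotbatch errors on labels outside 1:K, so substitute a dummy class
    # at ignored positions; the mask zeroes them out afterwards anyway.
    safe = ifelse.(mask, targets, 1)
    y = onehotbatch(safe, 1:size(ŷ, 1))
    per_pos = -sum(y .* logsoftmax(ŷ); dims=1)   # 1×N per-position losses
    sum(per_pos .* mask') / max(sum(mask), 1)    # mean over kept positions
end
```

Because the ignored columns are multiplied by zero before the reduction, they contribute nothing to the loss, so the gradient at those positions is exactly zero.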
Yeah, one of the “problems” of Julia is that it’s often too straightforward to write your own little something once you’re a semi domain expert.
This means absolute beginners find the Julia ecosystem harder to use, because there are fewer canned (although trivial) functionalities.
Honestly this is so true it’s painful. TF/PyTorch are functionally languages unto themselves, and I often feel frozen out of writing custom algorithms because I simply don’t know TF/PyTorch well enough. I’m always left asking: am I writing this correctly?
With Julia/Flux, the code is the math, and everything just works. Love it!
If you do come up with something, feel free to open a PR and we’ll try to find a good home for it.
Cheers,
I am stuck with a somewhat similar problem; I hope someone can advise.
I have tried to mimic Torchvision’s cross-entropy, which accepts a mask of integers as input. The function also has an ignore_class argument, which is useful for excluding classes with little representation that just add noise to the process (typically class 255). Long story short, I came up with an algorithm that masks such pixels out of the loss calculation.
The problem occurs when the training loop is compiled: the loss function uses Flux.onehotbatch(), which breaks during compilation of the gradient. Inference, however, compiles and executes flawlessly.
My good friend Grok AI tells me that this is a known issue, and suggests switching the AD engine to Enzyme. With the new engine, compilation during training breaks for a different reason: an excess of allocations.
Can anyone please confirm the issue with onehotbatch() and Zygote, and perhaps suggest a bypass?
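I can’t speak to your exact setup, but a common workaround when an array construction like onehotbatch trips up the AD engine is to build the one-hot targets outside the differentiated path: the targets carry no gradient, so wrapping that step in ChainRulesCore.ignore_derivatives (or simply precomputing the one-hot matrix before calling Flux.gradient) should be safe. The function and keyword names below are a sketch under those assumptions, not your code:

```julia
using Flux
using ChainRulesCore: ignore_derivatives

# Sketch: build the mask and one-hot targets inside ignore_derivatives,
# so the AD engine never tries to differentiate through onehotbatch.
function masked_loss(model, x, targets; ignore_class=255, classes=0:20)
    ŷ = model(x)                                    # K×N logits
    y, mask = ignore_derivatives() do
        m = targets .!= ignore_class
        safe = ifelse.(m, targets, first(classes))  # dummy class for ignored pixels
        Flux.onehotbatch(safe, classes), m
    end
    per_pos = -sum(y .* Flux.logsoftmax(ŷ); dims=1)
    sum(per_pos .* mask') / max(sum(mask), 1)
end
```

Since the masked positions are multiplied by zero, their gradient comes out exactly zero, which you can check by differentiating with respect to the input.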
Thanks in advance.