Flux - Has anyone ever gotten Custom Loss Functions to work with Zygote?

I am trying to train an MLP network with 1 output where the outputs are in various groups (as indicated by a stratification variable). I wish to normalise outputs within groups (as in softmax) before applying the loss function.

I had written a custom loss function to do this before Flux switched from Tracker to Zygote. My network ran just fine previously, but now fails with a ‘Need an adjoint for constructor’ error.

Can anyone help me figure out how to specify the loss function?

In my code, I pass the stratification variable along with the target output, so the composite target matrix y has 2 rows and n columns:

Data matrix x has m rows and n columns
Target matrix y has 2 rows and n columns

m(x) is my model (built with ‘Chain’, with a single exponential output activation)

Here is my loss function

```julia
function loss(x, y)
    p = renstrat(y[2,:], m(x))  # renstrat() is a fast external normalisation routine
    crossent = sum(y[1,:] .* log.(y[1,:] ./ p))
end
```
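For context, `renstrat` is my own external routine. A mutation-free sketch of the kind of within-group (softmax-style) normalisation it performs might look like the following; the names and implementation here are purely illustrative, not the actual routine:

```julia
# Hypothetical sketch of a within-group normalisation that avoids
# in-place array writes, so Zygote can differentiate through it.
# `strata` holds the group label of each observation.
function groupnorm(strata, scores)
    # indicator[i, j] == 1 when observations i and j share a group
    indicator = [strata[i] == strata[j] ? 1.0 : 0.0
                 for i in eachindex(strata), j in eachindex(strata)]
    # divide each score by the sum of scores in its own group,
    # built with matrix multiplication rather than mutation
    totals = indicator * vec(scores)
    vec(scores) ./ totals
end

strata = [1, 1, 2, 2]
scores = [1.0, 3.0, 2.0, 2.0]
p = groupnorm(strata, scores)   # each group now sums to 1
```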

This is a very basic example, which worked fine with the Tracker version of Flux but not with the Zygote one.

I even tried to manually replace the loss function with its 2-term Taylor series expansion (i.e., a simple quadratic function!), but got the same error.

Or I could supply the derivatives manually, if I knew how to specify them?

Does anyone have any ideas?

Thanks for any help.

On another point - it seems that many people have had issues writing custom loss functions for ML with the ‘Zygote’ version of Flux [has anyone ever heard of someone getting it to work?].
Custom loss functions are essential for research, and even just for the advancement of basic applications.
Is there any chance of bringing back Tracker, at least as an option (until Zygote becomes fully operational in this regard)?

It’s not easy to help you without knowing the error message…
That said, I just saw that you used a semicolon ; instead of a colon : in your example, on this line:

crossent=sum(y[1,;].*log.(y[1,:]./p)

Has anyone? Yes - DiffEqFlux.jl (High Level Scientific Machine Learning (SciML) Pre-Built Architectures · DiffEqFlux.jl) is a whole set of tutorials with differential equations in it, with some fairly complex examples, so not just simple things work but a ton of very non-obvious, complex things do too.

So share some code: your problem is very specific to something you’re doing.


Thanks. The semicolon was not in my code (a typo crept in as I was copying by hand). Initially I was getting error messages about attempting to mutate arrays, but I rewrote my ‘renstrat’ function to address this, and the error became
“ERROR: Need an adjoint for constructor …”
followed by many lines of references to lines in Zygote, etc.
This error seems to come up a lot in my searches as I look into this.

Perhaps I could supply some gradient information manually, in a form it likes, somehow?
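Zygote does let you supply a gradient by hand with `Zygote.@adjoint`. A minimal sketch of the mechanism, using a toy function (not renstrat) with a hand-written derivative:

```julia
using Zygote

# A toy scalar function whose derivative we supply by hand.
mysquare(x) = x^2

# Tell Zygote the forward value and a pullback mapping the
# incoming cotangent Δ to the gradient with respect to x.
Zygote.@adjoint mysquare(x) = mysquare(x), Δ -> (2 * Δ * x,)

g = gradient(mysquare, 3.0)[1]   # 6.0, computed via the custom rule
```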

Thanks for any thoughts.

Thanks, Chris, I meant in the Machine Learning (like MLP) context, but I will see if there is something in the tutorials which might help (thanks again).

In terms of sharing code, my code was the 2-line loss function given above, where ‘m(x)’ is just the output of an MLP with a single output, y[1,:] is just binary, and y[2,:] holds the grouping variables [1;1;1;1;1;2;2;2;3;3;3;3,…]

Thanks for any thoughts.

Share a code that someone can copy paste to run and get your error. Without that, helping is hard.


Hi - OK, sorry. I haven’t done that because the files are huge, but your request gives me an idea: create a very small version of my problem and include that. I’ll do that ASAP.

Thanks!

PS I tried a small version of my problem and it worked!
[Now I have to figure out why the big version doesn’t]

I will show the code below:

```julia
using Flux, SparseArrays

x = rand(12, 2); s = [1;1;1;2;2;2;3;3;3;4;4;4]; y = [0;1;0;1;0;0;1;0;0;0;0;1];
ssp = sparse(1:12, s, ones(12)); ssp = ssp[:, s];
x1 = transpose(x); y1 = transpose([y ssp]);

m = Chain(Dense(2, 1, identity))

function myloss(x0, y0)
    p = exp.(m(x0)) ./ (exp.(m(x0)) * y0[2:end, :])
    out = -sum(y0[1, :] .* log.(p))
end

for i = 1:5
    Flux.train!(myloss, Flux.params(m), [(x1, y1), (x1, y1)], ADAM())
    println(myloss(x1, y1))
end
```

The output was

60.07034136485465
60.04674504232545
60.02318068523912
59.99965043923457
59.97615087633916

So I have answered my own question for a toy version of my problem.
The answer to the question in the title of this post is ‘yes’.

I’m not quite done yet, though, as I don’t understand why it doesn’t scale.

Thanks, though


To answer the title question: of course people have :). That’s one of the reasons why Flux and similar APIs are awesome.

Some common issues are things like:

  • Type stability - make sure you’re using Float32s on all inputs (Xs and Ys).
  • Make sure you mark your custom structs with the @functor macro, as in the docs.
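For the second point, a minimal sketch with a hypothetical custom layer (the layer itself is made up for illustration; `Flux.@functor` is the real macro):

```julia
using Flux

# Hypothetical custom layer holding a trainable weight vector.
struct Scale
    w
end
(s::Scale)(x) = s.w .* x

# Register the struct so Flux/Zygote can find its parameters.
Flux.@functor Scale

layer = Scale([2.0, 3.0])
ps = Flux.params(layer)   # now contains layer.w
```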

Chances are it’s a small hiccup.