Need some help with FluxOptTools

I am trying to use FluxOptTools (GitHub - baggepinnen/FluxOptTools.jl: Use Optim to train Flux models and visualize loss landscapes) to train a Flux model with Optim. I followed the example in the README and it works as intended, but when I adapt it slightly to my own case it stops working: the code runs without errors, yet the model parameters are never updated. Can anyone help me figure out what I am doing wrong?
Here is my code:

using Pkg
Pkg.add(["DataFrames", "RDatasets","Flux", "FluxOptTools", "Zygote", "Optim", "LossFunctions"])
using DataFrames
using RDatasets
using Flux, Zygote, Optim, FluxOptTools, Statistics
using LossFunctions
diabetes = dataset("MASS", "Pima.te")
y_df = diabetes[!,:Type] .== "Yes"
X_df = diabetes[!, Not(:Type)]
# Converting X and y into matrices and vectors 
y = vec(y_df)'             # 1×n row vector of Bool labels
X = Matrix(Matrix(X_df)')  # 7×n matrix: one observation per column

m      = Chain(Dense(7, 20, σ), Dense(20, 1, σ))  # NB: my snippet was cut off here; second layer is a guess so the code runs, with σ on the output so m(X) .> 0
loss() = mean(value(PerceptronLoss(), y, m(X)))   # value(loss, target, output)

pars   = Flux.params(m)        # collect trainable parameters
initial_par = deepcopy(pars)   # snapshot of the initial values (plain assignment would only alias)
lossfun, gradfun, fg!, p0 = optfuns(loss, pars)
res = Optim.optimize(Optim.only_fg!(fg!), p0, LBFGS(), Optim.Options(show_trace=true))

Since y .>= 0 and m(X) .> 0, their product (the agreement) is always non-negative, so PerceptronLoss is identically zero and its gradient vanishes; the optimizer has nothing to update. Maybe L1HingeLoss would be more appropriate for this problem? I tried this:
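A related point, sketched in plain Julia with made-up labels: the margin-based losses in LossFunctions.jl interpret the product `y * ŷ` as an agreement, which only makes sense for targets in {-1, +1}, not the Bool {0, 1} labels produced by `.== "Yes"`. A minimal remap:

```julia
y_bool = [true, false, true]            # stand-in for diabetes.Type .== "Yes"
y_pm   = ifelse.(y_bool, 1.0, -1.0)'    # 1×3 row vector of ±1 targets
# y_pm == [1.0 -1.0 1.0]
```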

loss() = value(L1HingeLoss(), y, m(X), AggMode.Mean())

But that results in NaN in the convergence measures, and I couldn’t figure out how to fix it.
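For what it’s worth, here is a small plain-Julia sketch (a hand-written hinge, not the LossFunctions.jl implementation) of why {0, 1} targets misbehave with the hinge loss: every sample labelled 0 contributes a constant 1 regardless of the model output, so it carries no gradient, whereas ±1 targets make every sample depend on the output.

```julia
using Statistics

# L1 hinge on the agreement y*ŷ, mirroring L1HingeLoss
hinge(y, ŷ) = max(0, 1 - y * ŷ)

ŷ   = [0.3, 0.7, 0.2]     # positive model outputs, e.g. from a σ layer
y01 = [0.0, 1.0, 0.0]     # Bool-style {0, 1} targets
ypm = [-1.0, 1.0, -1.0]   # the same labels remapped to ±1

# With y == 0 the per-sample loss is max(0, 1 - 0) == 1 whatever ŷ is,
# so those samples are flat in ŷ and contribute nothing to the gradient.
mean(hinge.(y01, ŷ))   # ≈ 0.767, constant in ŷ for the y == 0 samples
mean(hinge.(ypm, ŷ))   # ≈ 0.933, and every sample depends on ŷ
```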