I’m trying to run a straightforward regression problem through Flux. I’ve modified the loss function from the example: instead of a simple MSE, it’s something like a chi-square. The problem concerns the calibration of a calorimeter, where the error term of the chi-square is proportional to the square root of the energy.
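Written out, the loss I’m aiming for is roughly (with a per-sample error \(\sigma_i = \sqrt{y_i}\), motivated by counting statistics):

```latex
L = \sqrt{\sum_i \frac{\left(f(x_i) - y_i\right)^2}{\sigma_i}},
\qquad \sigma_i = \sqrt{y_i}
```

so it's chi-square-like, with the error term appearing in the denominator.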

The following code exemplifies the problem:

```
using Flux

model = Chain(
    Dense(250, 64),
    Dense(64, 1)
)

x = rand(250, 1000)   # 250 features × 1000 samples
y = rand(1000)        # targets

function loss(x, y)
    error = sqrt.(y)  # per-sample error term
    sqrt(sum(((model(x) .- y).^2) ./ error))
end

@time Flux.Tracker.gradient(() -> loss(x, y), params(model))
```

This code results in the following output on my machine.

```
15.787468 seconds (211.02 M allocations: 5.623 GiB, 4.70% gc time)
```

Removing the error term from the loss function gives me:

```
0.117279 seconds (15.27 k allocations: 59.252 MiB, 14.24% gc time)
```

In my actual problem (where my input has around 200k samples), I can’t train with the error term in the loss at all; without it, training runs but the results are suboptimal, and I believe I need this term.
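One variant I’ve considered (just a sketch; I haven’t verified that it changes the timings) is hoisting the `sqrt.(y)` out of the loss closure, so the error term is precomputed plain data rather than being rebuilt inside every gradient call:

```
# Variant: precompute the (constant) error term once, outside the loss,
# instead of broadcasting sqrt.(y) on every call.
err = sqrt.(y)

function loss2(x, y, err)
    sqrt(sum(((model(x) .- y).^2) ./ err))
end

@time Flux.Tracker.gradient(() -> loss2(x, y, err), params(model))
```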

Does anybody have a suggestion for how to get this loss function to work?