Hi everyone,

I am using Flux to solve a nonstandard optimisation problem with a custom loss function.

The solution is part of a larger simulation which I run many times (100,000+ runs), so speed is key. When I run the script on our high-performance cluster, it slows down by more than 10x.

I am using Julia v1.10.4 and Flux v0.14.19.

I have included a minimal working example with some magic numbers, since loading the actual parameters would bloat the script.

On my local machine (macOS 14.5) I get:

```
8.243264 seconds (31.65 M allocations: 1.984 GiB, 7.36% gc time, 99.85% compilation time)
2.075706 seconds (1.22 M allocations: 6.835 GiB, 6.76% gc time)
```

On the HPC cluster (Linux 5.14.0-427.31.1.el9_4.x86_64) I get:

```
23.756881 seconds (30.78 M allocations: 1.924 GiB, 5.56% gc time, 99.32% compilation time)
20.209329 seconds (1.22 M allocations: 6.810 GiB, 0.66% gc time)
```
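In case the environment matters, this is how I am collecting the setup details on both machines (just `versioninfo()` and the BLAS thread count; I understand the cluster scheduler may also pin threads via environment variables such as `JULIA_NUM_THREADS` or `OPENBLAS_NUM_THREADS`):

```julia
using InteractiveUtils  # for versioninfo()
using LinearAlgebra     # for BLAS

versioninfo()  # Julia version, OS, CPU, and Julia thread count
println("Julia threads: ", Threads.nthreads())
println("BLAS threads:  ", BLAS.get_num_threads())
# BLAS.set_num_threads(1)  # e.g. to force single-threaded BLAS for comparison
```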

I wanted to upload the file as an attachment, but as a new user I have to paste the code here instead.

I would appreciate any help or suggestions!

```julia
using Flux

network_width = 32

perceptron = Chain(
    Dense(6, network_width, leakyrelu),
    Dense(network_width, network_width, leakyrelu),
    Dense(network_width, network_width, leakyrelu),
    Dense(network_width, network_width, leakyrelu),
    Dense(network_width, 3, relu),
)

# Adam with gradient-value clipping
opt_state = Flux.Optimiser(Flux.Adam(1e-6), ClipValue(1e-5))
state = Flux.setup(opt_state, perceptron)

# State grids (1000 points each) and constant parameters
k   = Float32.(range(1, 100, 1000))
b   = Float32.(range(1, 100, 1000))
w   = Float32.(range(1, 100, 1000))
r_k = fill(0.1f0,  length(k))
r_b = fill(0.01f0, length(k))
p   = fill(1f0,    length(k))
pi_ = fill(0.01f0, length(k))

# Smooth approximation of abs, differentiable at zero
abs_appr(x) = sqrt.(x .^ 2 .+ 1f-6)

function Residuals(perceptron, r_k, r_b, k, b, w, p, pi_, weights)
    n = size(w, 1)
    s = hcat(r_k, r_b, k, b, w, p)'
    x = perceptron(s)
    c  = x[1, :]
    k1 = x[2, :]
    b1 = x[3, :]
    d = k1 .- (1 .+ r_k) .* k

    # Draw next-period shocks (AR(1) in logs)
    rknext = Float32.(max.(exp.(log.(1 .+ r_k) .* 0.9 .+ 0.1 .+ 0.1 .* randn(Float32, n)) .- 1, 0))
    rbnext = Float32.(exp.(log.(1 .+ r_b) .* 0.9 .+ 0.1 .+ 0.1 .* randn(Float32, n)) .- 1)
    pinext = Float32.(exp.(log.(1 .+ pi_) .* 0.9 .+ 0.1 .+ 0.1 .* randn(Float32, n)) .- 1)
    wnext  = Float32.(w .* 0.9 .+ 0.1 .+ 0.1 .* randn(Float32, n))
    p1     = Float32.(p .* (1 .+ pinext))

    s = hcat(rknext, rbnext, k1, b1, wnext, p1)'
    x = perceptron(s)
    c1 = x[1, :]
    k2 = x[2, :]
    d1 = k2 .- (1 .+ rknext) .* k1

    # Euler-equation and budget-constraint residuals
    R1 = Float32.(1 .- 0.95 .* (1 .+ rbnext) .* (c1 ./ c) .^ (-1.5) .* (p ./ p1))
    R2 = Float32.(w .+ (1 .+ r_b) .* b .+ (1 .+ r_k) .* k .- c .* p .- b1 .-
                  0.01 .* abs_appr(d) .^ 1.5 .- k1)
    R3 = Float32.(1 .+ d .* 0.01 .* 1.5 .* abs_appr(d) .^ (1.5 - 2) .-
                  0.95 .* (1 .+ rknext) .* (c1 ./ c) .^ (-1.5) .* (p ./ p1) .*
                  (1 .+ d1 .* 0.01 .* 1.5 .* abs_appr(d1) .^ (1.5 - 2)))

    return sum(weights[1] .* R1 .^ 2 .+ weights[2] .* R2 .^ 2 .+ weights[3] .* R3 .^ 2) / n
end

function train_me!(epochs, perceptron, w, k, b, r_k, r_b, p, pi_, state; weights = [1, 1, 1])
    for epoch in 1:epochs
        # Compute the value and gradients of the loss function
        val, grads = Flux.withgradient(perceptron) do m
            Residuals(m, r_k, r_b, k, b, w, p, pi_, weights)
        end
        Flux.update!(state, perceptron, grads[1])
    end
end

# First call includes compilation; second call times the compiled training loop
@time train_me!(2,    perceptron, w, k, b, r_k, r_b, p, pi_, state; weights = [1, 0.1, 1]);
@time train_me!(1000, perceptron, w, k, b, r_k, r_b, p, pi_, state; weights = [1, 0.1, 1]);
```
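If it helps, I can also report timings from BenchmarkTools instead of plain `@time` (assuming the package is installed on both machines); `@btime` runs several samples after compilation and reports the minimum, so the numbers should be more stable. This snippet would run at the end of the script above, after everything is defined:

```julia
using BenchmarkTools

# $-interpolation avoids the usual global-variable overhead in benchmarks;
# a short 10-epoch run keeps each sample quick.
@btime train_me!(10, $perceptron, $w, $k, $b, $r_k, $r_b, $p, $pi_, $state;
                 weights = [1, 0.1, 1])
```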