For the past few weeks, I’ve been working on a set of benchmark test cases for evaluating and training LLMs to improve their performance on Julia code. I’m particularly interested in test cases where people have tried to use AI agents and the agents barely failed, i.e. tasks that appear to sit just beyond the capability frontier of current leading-edge agents. This will all be open source in the medium term, but for now I’m keeping the group small so I can help people get the test cases right, provide API credits for measuring pass rates, etc. If you’re interested in participating, please ping me on Slack.
LLMs tend to write slightly outdated code when it comes to Flux.jl, as the package’s API has changed a bit over the last few years. GPT-5 gives me code that declares custom layers with @functor rather than @layer, even though the latter is recommended in recent versions of Flux.jl. Generally speaking, fine-tuning LLMs to write “modern” Julia may be useful. Here is the prompt I used:
Write a minimal example for defining a custom layer in Flux.jl. Only give code not explanations.
The response was (WARNING: invalid code below)
using Flux
struct MyLayer
    W::Matrix{Float32}
    b::Vector{Float32}
end
Flux.@functor MyLayer
MyLayer(in, out) = MyLayer(Flux.glorot_uniform(out, in), zeros(Float32, out))
(m::MyLayer)(x) = m.W * x .+ m.b
m = MyLayer(3, 2)
x = rand(Float32, 3, 5)
y = m(x)
gs = gradient(params(m)) do
    sum(abs2, m(x))
end
The code has a minor error (params should be Flux.params), and even the corrected code triggers three deprecation warnings from Flux. My attempt at a modernized version is:
using Flux
struct MyLayer
    W::Matrix{Float32}
    b::Vector{Float32}
end
Flux.@layer MyLayer
MyLayer(in, out) = MyLayer(Flux.glorot_uniform(out, in), zeros(Float32, out))
(m::MyLayer)(x) = m.W * x .+ m.b
m = MyLayer(3, 2)
x = rand(Float32, 3, 5)
y = m(x)
gs = Flux.withgradient(m) do model
    sum(abs2, model(x))
end
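For completeness, here is a minimal sketch of how the result could feed into Flux’s explicit-style optimiser API (Flux.setup and Flux.update!, available in Flux 0.14 and later); the Adam learning rate below is just a placeholder. withgradient returns both the loss value and the gradients, with the gradient with respect to the model as the first element of grad.
# withgradient returns a NamedTuple: gs.val is the loss,
# gs.grad[1] is the gradient with respect to the model `m`.
opt_state = Flux.setup(Adam(0.01), m)    # optimiser state for the explicit-style API
Flux.update!(opt_state, m, gs.grad[1])   # one in-place gradient step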