Building a custom Layer in Lux.jl

I am constructing a custom layer and am getting an error:

ERROR: MethodError: no method matching iterate(::Type{Vector{Float32}})
Closest candidates are:
  iterate(::Union{LinRange, StepRangeLen}) at range.jl:872
  iterate(::Union{LinRange, StepRangeLen}, ::Integer) at range.jl:872
  iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} at dict.jl:712
  ...
Stacktrace:
 [1] Polylayer(coefs::Vector{Float32})
   @ Main ~/src/2022/basic_UODE/custom_lux_layer/poly_layer.jl:13
 [2] top-level scope
   @ ~/src/2022/basic_UODE/custom_lux_layer/poly_layer.jl:17

which should be easy to decipher, but I can’t figure it out. I created a MWE, which is:

using Lux

struct Polylayer{C} <: Lux.AbstractExplicitLayer
    coefs::C
end

function Polylayer(coefs)
    dtype = (typeof(coefs))
    return Polylayer{dtype...}(coefs)
end

a = Vector{Float32}(undef, 4)
pol = Polylayer(a)

Any insight is appreciated.

For reference, the docs are:

http://lux.csail.mit.edu/stable/manual/interface/#singular-layer

and you’re missing a few functions from that.

But your immediate issue is that the constructor function doesn’t quite make sense: `(typeof(coefs))` is not a tuple (the extra parentheses don’t make one), so `dtype...` tries to iterate the `Type` itself, which is exactly what the MethodError is complaining about. What are you trying to do? You can just delete

function Polylayer(coefs)
    dtype = (typeof(coefs))
    return Polylayer{dtype...}(coefs)
end

and it’ll be fine.
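
In other words, a minimal version of your MWE without that constructor already works, because the default parametric constructor infers the type parameter for you (rough sketch):

using Lux

struct Polylayer{C} <: Lux.AbstractExplicitLayer
    coefs::C
end

a = Vector{Float32}(undef, 4)
pol = Polylayer(a)   # the default constructor infers C = Vector{Float32}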

Hi,

Yes, I know there are functions missing. But since I am only testing the layer by calling the functions manually, I did not think the added functions were necessary. I also created a show function, thinking that would do the trick. Finally, I removed other functionality to create a MWE.

I want to construct a custom layer that implements a polynomial of specified degree for a pair of variables. So I created the structure:

struct Polylayer{C} <: Lux.AbstractExplicitLayer
    in_dims::Int
    out_dims::Int
    degree::Int
    coefs::C
end

and the following constructors:

function Polylayer(in_dims::Int, out_dims::Int, coefs)
    degree = length(coefs)
    dtype = (typeof(coefs))
    return Polylayer{dtype...}(in_dims, out_dims, degree, coefs)
end

function Polylayer(in_dims::Int, out_dims::Int, degree::Int)
    coefs = zeros(degree) 
    dtype = (typeof(coefs))
    #println(coefs, typeof(coefs))
    return Polylayer{dtype...}(in_dims, out_dims, degree, coefs)
end

I am modeling this approach on the Dense layer found in the Lux.jl source code. I looked for examples of custom layers for Lux.jl online and there are very few of them at this time.

I know I must implement initialparameters, parameterlength and statelength.

The layer will receive one scalar variable and output the evaluation of the polynomial via the call Polylayer(x, p, st). My objective is to have Lux train the polynomial coefficients. The larger experiment is a run of the Lotka-Volterra (LV) equations in which I express the nonlinear term directly as a polynomial and use a UODE to find the coefficients. This is a test example for a future larger problem.
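
As a rough sketch of the intent (not code I am running yet, and assuming the trainable coefficients end up in a `coefs` field of the parameter NamedTuple), the forward pass would be something like:

function (l::Polylayer)(x::Number, ps, st::NamedTuple)
    # Horner evaluation of the polynomial with coefficients ps.coefs at the scalar x
    y = evalpoly(x, ps.coefs)
    return y, st
end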

I implemented a more complete layer, and it does generate errors.

struct Polylayer{C,F} <: Lux.AbstractExplicitLayer
    in_dims::Int
    out_dims::Int
    degree::Int
    coefs::C
    init_weight::F
end

function Base.show(io::IO, d::Polylayer)
    println(io, "Polylayer($(d.in_dims) => $(d.out_dims), degree: $(d.degree))")
    println("inside show")
end

function Polylayer(in_dims::Int, out_dims::Int, coefs, init_weight=Lux.rand32)
    degree = length(coefs)
    dtype = (typeof(coefs), typeof(init_weight))
    return Polylayer{dtype...}(in_dims, out_dims, degree, coefs, init_weight)
end

function initialparameters(rng::AbstractRNG, d::Polylayer)
    return (d.init_weight(rng, d.degree+1))
end

statelength(d::Polylayer) = 0
function parameterlength(d::Polylayer)
    return d.degree + 1
end

I just did not think that the two additional functions were necessary, even for minimal testing; rather, I thought they would only become necessary when actually using the layer within a chain.

Thanks!

OK, I now understand. There is no need to include the weight arrays in the struct; rather, the parameters are initialized when I call Lux.setup.

I must say that I still have difficulty wrapping my head around the Julia way of coding. Yes, Julia is extremely efficient, but the approach to coding takes a lot of getting used to. I am not convinced that development in it is faster than in conventional languages; so far, the development process (for me) is slower. But of course, that is probably due to my many years of coding in Fortran, C++, and Python.

Cheers, and happy holidays!

@ChrisRackauckas,

Here is my next-to-final custom layer. All seems to work except the very last Lux.setup(...) call. Note that if I call initialparameters() directly, the function returns the initialized parameter array as expected. Strange. Here is the code:

using Lux
using Random

"""
Apply a polynomial to an input Float
The coefficients are trainable. 
First attempt: polynomial of a specified `degree`
"""

struct Polylayer{F} <: Lux.AbstractExplicitLayer
    in_dims::Int
    out_dims::Int
    degree::Int
    init_weight::F
end

function Base.show(io::IO, d::Polylayer)
    println(io, "Polylayer($(d.in_dims) => $(d.out_dims), degree: $(d.degree))")
    println("inside show")
end

function Polylayer(in_dims::Int, out_dims::Int, degree::Int, init_weight=Lux.rand32)
    dtypes = (typeof(init_weight))
    return Polylayer{dtypes}(in_dims, out_dims, degree, init_weight)
end

function initialparameters(rng::AbstractRNG, d::Polylayer)
    return (d.init_weight(rng, d.degree+1))
end

statelength(d::Polylayer) = 0

function parameterlength(d::Polylayer)
    return d.degree + 1
end
# ====================================

rng = Random.default_rng()
model = Polylayer(5, 3, 3)
ps, st = Lux.setup(rng, model)

ps   # I get an empty parameter list

I think here the disconnect may be less the language and more the specific pattern the library uses. It helps if you’ve seen Flax (https://github.com/google/flax) or Haiku (https://github.com/google-deepmind/dm-haiku) before, both of which separate layers from their actual parameters as Lux does. Ordinarily in Julia, one would include such arrays in the layer structs themselves (this is what Flux, Knet and co. do, for example), as you were expecting to be the case.
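
As a rough illustration of the difference (simplified sketches, not the actual library definitions):

# Flux-style: the parameter array lives inside the layer struct itself
struct FluxStylePoly
    coefs::Vector{Float32}
end

# Lux-style: the struct only describes the layer; the coefficients are created
# by Lux.setup / initialparameters and passed in explicitly on every call
struct LuxStylePoly <: Lux.AbstractExplicitLayer
    degree::Int
end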

Very true, @ToucheSir! Intellectually I know how it works, but out of habit I return to the standard way of doing things. I tried learning JAX recently but have not put in the time. I am a PyTorch person myself.

function initialparameters(rng::AbstractRNG, d::Polylayer)
    return (coeffs = d.init_weight(rng, d.degree+1),)
end

Lux.initialstates(::AbstractRNG, ::Polylayer) = NamedTuple()

function (l::Polylayer)(x::AbstractMatrix, ps, st::NamedTuple)
    y = ps.coeffs .* x
    # ... do whatever your layer is supposed to do, then
    return y, st
end

Thanks, but I still get empty arrays from Lux.setup.
I noticed you added a comma in initialparameters, but it had no effect. I assume there is a tiny mistake somewhere, but I have no idea where. Julia is rather unforgiving, as is any compiled language.

To make sure we are looking at the same code, I am reattaching it:

using Lux
using Random

"""
Apply a polynomial to an input Float
The coefficients are trainable. 
First attempt: polynomial of a specified `degree`
"""

struct Polylayer{F} <: Lux.AbstractExplicitLayer
    out_dims::Int
    degree::Int
    init_weight::F
end

function Base.show(io::IO, d::Polylayer)
    println(io, "Polylayer(out_dims: $(d.out_dims), degree: $(d.degree))")
end

function Polylayer(out_dims::Int, degree::Int, init_weight=Lux.rand32)
    dtypes = (typeof(init_weight))
    return Polylayer{dtypes}(out_dims, degree, init_weight)
end

function (l::Polylayer)(x::AbstractMatrix, ps, st::NamedTuple)
    y = x
    return y, st
end

function initialparameters(rng::AbstractRNG, d::Polylayer)
     return (coeffs=d.init_weight(rng, d.degree+1),)
end

Lux.initialstates(::AbstractRNG, ::Polylayer) = NamedTuple()

statelength(d::Polylayer) = 0

function parameterlength(d::Polylayer)
    return d.degree + 1
end

# ====================================

rng = Random.default_rng()
Random.seed!(rng, 0)

model = Polylayer(5, 3, Lux.rand32)
ps, st = Lux.setup(rng, model)     # <<<<<< produces two empty NamedTuples.
initialparameters(rng, model)      # Initializes correctly. 

That’s on purpose: with the trailing comma the function returns a one-element NamedTuple instead of just the array, in the same way that (1) is just the number 1 while (1,) is a tuple.
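
Concretely, as a small illustration:

x = (coeffs = rand(3))    # parenthesized assignment: x is just the Vector
y = (coeffs = rand(3),)   # trailing comma: y is a one-element NamedTuple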

And your `initialparameters` is shadowing `Lux.initialparameters` instead of extending it, so `Lux.setup` never calls your methods. You want, in that script:

function Lux.initialparameters(rng::AbstractRNG, d::Polylayer)
     return (coeffs=d.init_weight(rng, d.degree+1),)
end

Lux.initialstates(::AbstractRNG, ::Polylayer) = NamedTuple()

Lux.statelength(d::Polylayer) = 0

function Lux.parameterlength(d::Polylayer)
    return d.degree + 1
end
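
With those methods defined as extensions of the Lux functions, rerunning the test block from your script (sketch, same names as above) should now give a non-empty parameter NamedTuple:

rng = Random.default_rng()
Random.seed!(rng, 0)

model = Polylayer(5, 3, Lux.rand32)
ps, st = Lux.setup(rng, model)
# ps now holds a `coeffs` vector of length degree + 1 = 4; st is an empty NamedTuple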