I have some specific bounds on my neural network weights and biases that cannot be expressed as simple parameter inequality constraints. That’s why I would like to try something like IPNewton for the optimization, even though it is slower and may not converge.

The problem now is that IPNewton accepts plain arrays for the optimization, whereas Lux.jl expects ComponentVectors of a specific form. In particular, converting an array of weights and biases inside the optimization routine into a ComponentVector is difficult. I wrote my own function for this, but unfortunately I am running into problems with AutoDiff and mutation.

Is there an easy, feasible way to convert arrays or Vectors into a structure that a Lux NN can accept as parameters? This is my current function:

```julia
using ComponentArrays

function convert_params_to_tuple_no_ode(p::Vector{Float64}, n_in::Int64, n_out::Int64,
                                        hidden_layers::Tuple{Int64, Int64})
    # Layer widths: input layer, hidden layers, output layer
    layers = (n_in, fill(hidden_layers[2], hidden_layers[1])..., n_out)
    final_idx = 1
    nn_subtuple = NamedTuple()
    for layer_nr in 1:(length(layers) - 1)
        weight_size = layers[layer_nr] * layers[layer_nr + 1]
        bias_size = layers[layer_nr + 1]
        # Extract weights and biases from the flat parameter vector
        weights = p[final_idx:(final_idx + weight_size - 1)]
        final_idx += weight_size
        weights = reshape(weights, layers[layer_nr + 1], layers[layer_nr])
        biases = p[final_idx:(final_idx + bias_size - 1)]
        final_idx += bias_size
        # NamedTuple for the current layer; the bias stays a vector, not a matrix
        subtuple_layer = (weight = weights, bias = biases)
        layer_symbol = Symbol("layer_", layer_nr)
        # Merge the current layer into the accumulated NamedTuple
        nn_subtuple = merge(nn_subtuple, NamedTuple{(layer_symbol,)}((subtuple_layer,)))
    end
    return ComponentArrays.ComponentVector((ps_lux = nn_subtuple,))
end
```
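For comparison, a common pattern with ComponentArrays avoids rebuilding the NamedTuple by hand at all: capture the axes of a template ComponentVector once (e.g. from the parameters returned by `Lux.setup`) and re-wrap any flat vector with `ComponentArray(p, ax)`. A minimal sketch, where the `layer_1`/`layer_2` names and shapes are illustrative assumptions rather than taken from a specific model:

```julia
# Sketch: re-wrap a flat vector using the axes of a template ComponentVector.
using ComponentArrays

# Template mirroring the structure Lux.setup would produce
# (layer names and weight/bias shapes are illustrative).
template = ComponentVector(ps_lux = (
    layer_1 = (weight = zeros(3, 2), bias = zeros(3)),
    layer_2 = (weight = zeros(1, 3), bias = zeros(1)),
))
ax = getaxes(template)             # capture the structure once, outside the loss

p = collect(1.0:length(template))  # a flat parameter vector, e.g. from IPNewton
cp = ComponentArray(p, ax)         # re-wrap without mutation

@assert size(cp.ps_lux.layer_1.weight) == (3, 2)
@assert cp.ps_lux.layer_2.bias[1] == p[end]
```

Because `ComponentArray(p, ax)` wraps the existing data instead of mutating or copying element by element, this construction tends to be friendlier to AutoDiff than an index-by-index rebuild.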