I have some specific bounds on my neural network weights and biases that cannot be expressed as simple parameter inequality constraints. That’s why I would like to try something like IPNewton for the optimization, even though it is slower and may not converge.

The problem now is that IPNewton accepts plain arrays for the optimization, whereas Lux.jl expects ComponentVectors of a specific form. In particular, converting an array of weights and biases inside the optimization routine into a ComponentVector is difficult. I wrote my own function for this, but unfortunately I am running into problems with AutoDiff and mutation.

Is there an easy, feasible way to convert arrays or Vectors into a structure that a Lux NN can accept as parameters? This is my current function:

```julia
using ComponentArrays

function convert_params_to_tuple_no_ode(p::Vector{Float64}, n_in::Int64, n_out::Int64,
                                        hidden_layers::Tuple{Int64, Int64})
    # Layer widths: input layer, hidden layers, output layer
    layers = (n_in, fill(hidden_layers[2], hidden_layers[1])..., n_out)
    final_idx = 1
    nn_subtuple = NamedTuple()
    for layer_nr in 1:(length(layers) - 1)
        weight_size = layers[layer_nr] * layers[layer_nr + 1]
        bias_size = layers[layer_nr + 1]
        # Extract weights and biases from the flat parameter vector
        weights = p[final_idx:(final_idx + weight_size - 1)]
        final_idx += weight_size
        weights = reshape(weights, layers[layer_nr + 1], layers[layer_nr])
        biases = p[final_idx:(final_idx + bias_size - 1)]
        final_idx += bias_size
        # NamedTuple for the current layer; the bias stays a vector, not a matrix
        subtuple_layer = (weight = weights, bias = biases)
        layer_symbol = Symbol("layer_", layer_nr)
        # Merge the current layer into the accumulated NamedTuple
        nn_subtuple = merge(nn_subtuple, NamedTuple{(layer_symbol,)}((subtuple_layer,)))
    end
    return ComponentArrays.ComponentVector((ps_lux = nn_subtuple,))
end
```
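For comparison, a common pattern with ComponentArrays avoids rebuilding the NamedTuple by hand at all: capture the axes of a template ComponentVector once (e.g. from the parameters returned by `Lux.setup`) and re-wrap any flat vector with `ComponentArray(p, ax)`. A minimal sketch, where the `layer_1`/`layer_2` names and shapes are illustrative assumptions rather than taken from a specific model:

```julia
# Sketch: re-wrap a flat vector using the axes of a template ComponentVector.
using ComponentArrays

# Template mirroring the structure Lux.setup would produce
# (layer names and weight/bias shapes are illustrative).
template = ComponentVector(ps_lux = (
    layer_1 = (weight = zeros(3, 2), bias = zeros(3)),
    layer_2 = (weight = zeros(1, 3), bias = zeros(1)),
))
ax = getaxes(template)             # capture the structure once, outside the loss

p = collect(1.0:length(template))  # a flat parameter vector, e.g. from IPNewton
cp = ComponentArray(p, ax)         # re-wrap without mutation

@assert size(cp.ps_lux.layer_1.weight) == (3, 2)
@assert cp.ps_lux.layer_2.bias[1] == p[end]
```

Because `ComponentArray(p, ax)` wraps the existing data instead of mutating or copying element by element, this construction tends to be friendlier to AutoDiff than an index-by-index rebuild.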