Recover weights of logistic regression using MLJ.jl

Hi all,

I’m trying to check whether I can recover the weights of a logistic regression from synthetically generated data, but for some reason the optimized weights come out with a large error when I fit the model with MLJ.jl.

To create the data, I use a sigmoid with no intercept, a decision boundary of 0.5, and 3 features:

using LinearAlgebra  # for dot

sigmoid(x) = 1.0 / (1.0 + exp(-x))

# Label each time point by thresholding the sigmoid of the linear predictor at 0.5
function generate_y(true_weights::Vector{Float64}, U::Matrix{Float64})
    n_time_points = size(U, 2)  # length of the time series
    y = [sigmoid(dot(true_weights, U[:, t])) >= 0.5 for t in 1:n_time_points]
    return y
end

using MLJ
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels

function recover_weights(y::Vector{Bool}, U::Matrix{Float64})
    # Convert U to an MLJ table (observations as rows)
    U_table = MLJ.table(permutedims(U))

    # Convert the binary target y into a categorical variable
    y_categorical = coerce(y, Multiclass)

    # Define the logistic regression model (L2 penalty, no intercept)
    logistic_model = LogisticClassifier(lambda=0.02, fit_intercept=false)

    # Create a machine to fit the model
    mach = machine(logistic_model, U_table, y_categorical)

    # Fit the model to recover the weights
    fit!(mach)

    # Get the recovered weights as feature => coefficient pairs
    recovered_weights = fitted_params(mach).coefs
    return recovered_weights
end

When I run it with:

true_weights = [0.2, -0.07, 0.005]

# Generate random control vector U with 3 features for each time point
n_time_points = 1000
U = rand(3, n_time_points)  # 3 × n_time_points matrix, one column per time point

y = generate_y(true_weights, U)

recovered_weights = recover_weights(y, U)

I’m getting weights that are far from the true ones (at least the second one comes out negative, which is good).
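
For reference, this is roughly how I compare them; I’m assuming fitted_params(mach).coefs comes back as feature => coefficient pairs, so I strip the feature names with last.(...):

# Pull the numeric coefficients out of the feature => value pairs and print both
recovered = last.(recovered_weights)
println("true weights:      ", true_weights)
println("recovered weights: ", recovered)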

Is there a way to improve this? Can I specify the 0.5 decision boundary in MLJ? Should I use another optimizer?
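
For the optimizer question, I was thinking of something like the snippet below (dropping the regularization and passing an explicit solver); the solver keyword and MLJLinearModels.LBFGS() are my reading of the MLJLinearModels docs, so please correct me if that’s not the right way:

import MLJLinearModels

# Plain (unpenalized) logistic regression with an explicit LBFGS solver
logistic_model = LogisticClassifier(lambda=0.0, fit_intercept=false,
                                    solver=MLJLinearModels.LBFGS())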

Any ideas?

Thank you !!