Hi all,
I’m trying to see whether I can recover the weights of a logistic regression from synthetically generated data, but for some reason the weights optimized with MLJ.jl come out with a large error.
To create the data, I’m using a sigmoid with no intercept, a decision threshold of 0.5, and 3 features:
using MLJ, MLJLinearModels, LinearAlgebra

sigmoid(x) = 1.0 / (1.0 + exp(-x))

function generate_y(true_weights::Vector{Float64}, U::Matrix{Float64})
    n_time_points = size(U, 2)  # Length of the time series
    y = [sigmoid(dot(true_weights, U[:, t])) >= 0.5 for t in 1:n_time_points]
    return y
end
function recover_weights(y::Vector{Bool}, U::Matrix{Float64})
    # Convert U to an MLJ table (rows = time points, columns = features)
    U_table = MLJ.table(permutedims(U))
    # Convert the binary target y into a categorical variable
    y_categorical = coerce(y, Multiclass)
    # Define the logistic regression model
    logistic_model = LogisticClassifier(lambda=0.02, fit_intercept=false)
    # Create a machine to fit the model
    mach = machine(logistic_model, U_table, y_categorical)
    # Fit the model to recover the weights
    fit!(mach)
    # Get the recovered weights
    recovered_weights = fitted_params(mach).coefs
    return recovered_weights
end
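One note in case I’m misreading the output: fitted_params(mach).coefs seems to come back as a vector of feature => value pairs rather than a plain vector, so I strip out the numeric values with a small helper (the name is mine):

# Helper (my own naming): turn the feature => value pairs returned by
# fitted_params(mach).coefs into a plain Float64 vector
coef_values(coef_pairs) = Float64[last(p) for p in coef_pairs]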
When running it with:
true_weights = [0.2, -0.07, 0.005]
# Generate a random control matrix U with 3 features at each time point
n_time_points = 1000
U = rand(3, n_time_points)  # 3 × n_time_points, entries uniform in [0, 1)
y = generate_y(true_weights, U)
recovered_weights = recover_weights(y, U)
I’m getting weights that are far from the true ones (I do get a negative value for the second one, which is encouraging).
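For concreteness, this is how I’m comparing them, just printing the two vectors side by side (no rescaling or normalization):

# Side-by-side comparison of true vs. recovered weights,
# using the coef_values helper defined above
display(hcat(true_weights, coef_values(recovered_weights)))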
Is there a way to improve this efficiently? Can I specify the 0.5 decision threshold in MLJ, or should I use another optimizer?
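For example, would explicitly picking a solver like this be the right direction? (I’m guessing at the solver keyword from the MLJLinearModels docstring, so please correct me if that isn’t the right knob.)

# Sketch only: same model, but with an explicitly chosen solver.
# Assumes LogisticClassifier exposes a `solver` keyword and that
# MLJLinearModels.LBFGS() is a valid choice for it.
logistic_model = LogisticClassifier(lambda=0.02, fit_intercept=false,
                                    solver=MLJLinearModels.LBFGS())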
Any ideas?
Thank you!!