I am experiencing some odd behavior from the LogisticClassifier in MLJLinearModels.jl that I hope somebody can help me understand. This MWE is not the research problem I am working on (because that is hard to boil down), but it nevertheless shows behavior related to what I am seeing in my problem, and it confuses me.
using DataFrames, MLJ, Distributions, Plots

# Generate 1000 noisy points centered near +1 (class 1) and 1000 near -1 (class 0)
pos_samples = [[1 + rand(Normal(0, 0.1)), 1] for _ in 1:1000]
neg_samples = [[-1 + rand(Normal(0, 0.1)), 0] for _ in 1:1000]
samples = vcat(pos_samples, neg_samples)
data = DataFrame(mapreduce(permutedims, vcat, samples), [:sample, :class])
display(scatter(data.sample, data.class, legend = :none))

# Train a logistic classifier on the single feature
X_train = data[!, [:sample]]
y_train = coerce(data.class, Multiclass)
LC = @load LogisticClassifier pkg = MLJLinearModels
model = machine(LC(), X_train, y_train)
fit!(model)
println("Fitted Model.")

# Predict P(class 1) on a uniform grid of test points in [-2, 2]
test_vals = LinRange(-2, 2, 101)
positive_prob = [p.prob_given_ref[2] for p in MLJ.predict(model, test_vals[:, :])]
plot(test_vals, positive_prob)
In this code, I generate noisy data points centered near +1 and near -1. Those centered near +1 are labeled as the “positive” class (i.e., with label 1). The points centered near -1 are the “negative” class (i.e., with label 0).
I use these points to train a logistic classifier, and then generate a completely synthetic test set (uniformly spaced points between -2 and +2), expecting the probability of the positive class to be very close to zero for points to the left of zero and very close to one for points to the right of zero. However, this is not what I find. I find that the classifier is at best about “70% certain” of the class one way or the other.
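In case it helps with diagnosing this, here is how I have been inspecting what the classifier actually learned (assuming fitted_params is the right accessor for a fitted machine; I just print the whole result rather than relying on specific field names):

# Print the learned coefficient(s) and intercept of the fitted machine
fp = fitted_params(model)
println(fp)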
Below are a visualization of my training samples and a plot of P(class 1) as a function of the test sample.
I’d expect the probability curve from a trained classifier to look much more like a sharp sigmoid.
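For reference, the shape I had in mind is something like a logistic function with a large weight, which is essentially a step over [-2, 2]. This is purely illustrative; the weight w below is a made-up value, not something taken from the fitted model, and it reuses test_vals and Plots from the MWE above:

# Illustrative only: the near-step sigmoid I expected the fit to produce
sigmoid(z) = 1 / (1 + exp(-z))
w = 10.0   # hypothetical weight, chosen only to show the expected shape
plot(test_vals, sigmoid.(w .* test_vals), label = "expected shape")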
Can somebody tell me if I’m just being silly and missing something, or if this kind of behavior is actually unexpected?
EDIT: I forgot to mention that in my original problem, simply switching the model from the LogisticClassifier to the DecisionTreeClassifier available through the DecisionTree package makes everything work just fine. The same holds for this simplified MWE: making that change (sketched below) produces an indicator-like probability curve that jumps from 0 to 1 near zero, which is exactly the behavior I’m trying to produce with logistic regression.
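For completeness, this is roughly what the swap looks like in the MWE (a sketch from memory, reusing X_train, y_train, and test_vals from above; the load string is the one MLJ's model registry uses for DecisionTree.jl):

# Same pipeline, but with a decision tree instead of logistic regression
DTC = @load DecisionTreeClassifier pkg = DecisionTree
tree = machine(DTC(), X_train, y_train)
fit!(tree)
tree_prob = [p.prob_given_ref[2] for p in MLJ.predict(tree, test_vals[:, :])]
plot(test_vals, tree_prob)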