OnlineStats and regularization hyperparameters

I’m working with OnlineStats to learn a classification model. I have a sparse array of tf-idf features and binary targets, and I have figured out how to train my model. However, the API documentation doesn’t cover how to use the regularization functions.

Specifically, I need to know how to adjust the parameter lambda.

I also need to know if a bias term is added automatically or if I need to do this. So far I have the following:

using OnlineStats

o = fit!(StatLearn(length(feature_array), SGD(), L1Penalty()), (train_tfidf, train_y))

StatLearn: SGD | mean(λ)=0.0 | 0.5 * (L2DistLoss) | L1Penalty | nobs=7782 | nvars=2446

I can gather predictions using:
test_y_pred = predict(o, test_tfidf)

1945-element Array{Float64,1}:
0.12856579087723008
-0.013671349299302107
0.13942378280298387

I can classify them using:
test_y_pred = classify(o, test_tfidf)

1945-element Array{Float64,1}:
1.0
-1.0
1.0


Also, am I approaching this the right way? I am trying to learn the available interfaces for machine learning in Julia and am looking for a bit of a starting point. The sklearn port looks interesting, but I’d like to keep my work distributable with JuliaDB tables.

Specifically, I need to know how to adjust the parameter lambda.

You can provide a vector of lambdas (parameter-wise penalties), something like

StatLearn(p, .1 * ones(p))

or a single lambda to apply to each parameter:

StatLearn(p, .1)
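
Putting that together with the call from your first post, it might look something like this (just a sketch; I’m assuming the algorithm, penalty, and λ can all be passed to the constructor together, as in the other calls in this thread):

using OnlineStats

p = 2446                                    # number of tf-idf features
# per-parameter λ: one penalty weight for each coefficient
o = StatLearn(p, SGD(), L1Penalty(), .1 * ones(p))
# or a single λ shared by every coefficient
o = StatLearn(p, SGD(), L1Penalty(), .1)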

I also need to know if a bias term is added automatically or if I need to do this

There is no bias term added automatically. You can take a look at BiasVec, which adds it lazily:

julia> BiasVec(rand(5))
6-element BiasVec{Float64,Array{Float64,1}}:
 0.47513715528898093
 0.45808733943617064
 0.5337189993055129
 0.6613951516035794
 0.636024656190582
 1.0

Bringing it all together, if you have p predictors, you’d want something like

n, p = 10^6, 10
x = randn(n, p)
y = randn(n)

o = StatLearn(p + 1, L2Penalty(), vcat(.1 * ones(p), 0))  # avoid penalizing the bias/intercept

fit!(o, zip((BiasVec(xi) for xi in eachrow(x)), y))
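
To get predictions for new data, you’d wrap each row the same way, e.g. (assuming predict accepts a single BiasVec-wrapped row, as used further down in this thread):

xnew = randn(p)
predict(o, BiasVec(xnew))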

Oh great! This is awesome and exactly the kind of explanation I was looking for. Any ideas on how I might test on held-out data? I have tried the following:

fit!(o, zip((BiasVec(xi) for xi in OnlineStats.eachrow(train_tfidf)), train_y));

StatLearn: SGD | mean(λ)=0.09995913363302009 | 0.5 * (L2DistLoss) | L1Penalty | nobs=7782 | nvars=2447

EDIT:
I managed to figure out how to predict. Not sure why, but all observations have the same predicted value:

[predict(o, r) for r in BiasVec.(OnlineStats.eachrow(test_tfidf))]

1945-element Array{Float64,1}:
0.034086370434273434
0.034086370434273434
0.034086370434273434

After fitting, all coefficients are zero except the intercept:

coef(o)

2447-element Array{Float64,1}:
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0

0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.11715854311395045

Hmm, my first guess is that your lambda is set high enough that the lasso penalty is setting everything to zero.
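
If you want to sanity-check that, one option is to refit with a much smaller λ and see whether any coefficients move off zero (just a sketch reusing your variable names; the 1e-4 value is only illustrative):

using OnlineStats

p = 2446                                        # number of tf-idf features
λs = vcat(1e-4 * ones(p), 0)                    # keep the intercept unpenalized
o2 = StatLearn(p + 1, SGD(), L1Penalty(), λs)
fit!(o2, zip((BiasVec(xi) for xi in OnlineStats.eachrow(train_tfidf)), train_y))
count(!iszero, coef(o2))                        # how many coefficients are nonzero now?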


Unfortunately, I think I was using the wrong syntax for predicting value outputs… I am still seeing only zeros for the coefficients, but at least I’m getting different predictions for each tfidf row:

[predict(o, BiasVec(r)) for r in OnlineStats.eachrow(test_tfidf)]
1945-element Array{Float64,1}:
0.5037992943250542
0.3412787518465366
0.479159463786335
0.3412787518465366
0.4102191078164358
0.3872389891598027

Also, it could be that the model is largely uninformative. Closing this, as you have answered my question.