Anyone developing multinomial logistic regression?

Hi all,

I’m looking to write a multinomial logistic regression function (including conditional logit). From what I can see, there is no extant package that can do this. Given the popularity of such a model, I’m surprised that it doesn’t exist yet in Julia, but would also be interested in collaborating. A good model for such a package would be the R mlogit package.

@Nosferican @mcreel

Not sure what mlogit is but I have one in my package, Haven’t fully tested it yet I’d love feedback

I also have classification measures for multiclass classifications, etc.

If there’s anything missing you’d like to see let me know and I probably have the wherewithall to add it.

Edit - Ah sorry I have a multinomial softmax regression. Sorry thats a different bird.

I will be registering my package in the next few days which has ologit and mlogit. The implementation for mlogit is similar to Stata’s mlogit rather than the R package. The R package allows for outcome specific features (e.g., price of y0, price of y1, etc) while Stata’s and mine only handle features for the unit of observation (e.g., age, education, etc.).

4 Likes

Awesome! That’s definitely a great help. I am basically looking for the equivalent of Stata’s asclogit command but I suspect that I can implement that using your package as a starting point.

I have implemented this in Matlab but will be curious to see how you handle the optimization.

I think softmax regression is the same as multinomial logistic regression. This package is really cool and I will take a look! Thanks for responding.

For nominal response models, it would be nice to add the generalization for conditional multinomial logistic (PR welcome). For ordinal logistic regression, I implemented the proportional odds logistic regression (POLR), but the generalization would also be welcomed as a PR. I am waiting for StatsModels to tag a new release for the Terms 2.0 era and for me to finish up the last final touches.

In terms of the optimization approach, (1) the random effects (Swamy-Arora) is fitted through a partial demeaning framework, (2) absorption of categorical fixed effects is done trough the method of alternating projections, (3) multinomial logistic regression is optimized through Fisher Scoring in a Vector Generalized Linear Model (VGLM) framework, and (4) ordinal logistic regression uses Optim.jl with a Newtonian with AD for the Hessian because I lost so much time trying to write up the analytical Hessian and failed.

2 Likes

Awesome, thanks! My Matlab implementation of Stata’s asclogit command is basically the equivalent of calling Optim.jl. I suspect Fisher scoring is faster, but I’ve never programmed that up before.

I had a peek at your source code this morning and will be in touch with a possible PR in the coming weeks.

1 Like

It might still need a tweak or two (e.g., StatsModels changes in master)… I want to get the beta test out in the coming weeks and then start working on optimizing the code and more robustness checks. That was a chapter in my defense (two weeks ago) so probably would be good to release it soon and have it stable by JuliaCon.

2 Likes

Can you provide the link to your package?

2 Likes

For those interested in regularised regression models including multinomial regressions, I’ve just released MLJLinearModels.jl which will soon be integrated with the MLJ environment; it’s competitive with Sklearn and similar R packages though rough around the edge and without docs (which is why it hasn’t been officially announced yet)

3 Likes

@tlienart - When this matures I’ll likely interface my package with it. Looks like a great start.

Hi,
I am trying to use your package, but I really have no clue how!
Is there any documentation or example I can follow?

Best,
Alessio

Yeah this is my fault really… there’s a docs branch but since starting a new job I’m struggling a bit to find the time to push it through

The tests provide quite good examples including comparisons with sklearn: https://github.com/alan-turing-institute/MLJLinearModels.jl/blob/master/test/fit/logistic-multinomial.jl

Long story short if you have a n x p matrix X and targets y encoded as 1,2,3,… (indices of the class) then you can just do the following:

mnr = MultinomialRegression(0.5) # 0.5 is the strength of the L2 regularisation
theta = fit(mnr, X, y)

not very dissimilar to how sklearn is called (and actually you can see the comparison code https://github.com/alan-turing-institute/MLJLinearModels.jl/blob/b5d91ac9d13bb98f78eda9b9f347e8a6ddb39aaa/test/fit/logistic-multinomial.jl#L90-L101)

To do the predictions you can apply X to theta then use a softmax:

preds   = apply_X(Xmatrix, θ, c) # c is the number of classes
preds .= softmax(preds)

for binary, same process but with ±1 labels and using sigmoid instead of softmax.

If you go the MLJ route, you can use the fit/predict machinery, see this example

3 Likes

Ok, Cool.
Indeed I looked at the tests, but still I was quite lost!

And how the matrix theta is structured?

X = rand(100,3)
y = rand([1,2,3],100)
λ = 0.5
mnr = MultinomialRegression(λ; fit_intercept=true)
θ  = fit(mnr, X, y)
X = rand(100,3)
y = rand([1,2,3],100)

parameters = reshape(θ,(3,4))

where parameters[1, 1:3] are the coefficient of the first class and parameters[1, 4] is the intercept?
Maybe I am dumb and not understanding how Logit works, but I am referring to this:

So I think the way the prediction work should help you there, basically (assuming no intercept first) what it does is:

  1. apply_X, this basically means (X * theta) = z
  2. apply a softmax on z

If you check the dimensions you will see that X is (n x p ) where n is the number of records, p is the dimensionality (number of features), and theta effectively corresponds to a (p x c) matrix where c is the number of classes (it’s stored as a single vector though, which is why you see this reshape; each column is one after the other)

Now if s = softmax z, it’s n x c (like z) with the rows summing to 1, basically each row contains the score for each class. For instance for 3 classes you would get predictions with rows like

[0.2, 0.5, 0.3]

meaning that the second class is the “most likely” for that point (most highly scored according to the model)

I hope this clarifies it a bit!

Edit: more specifically, in the no intercept case, each class gets p parameters, these are stacked one after the other in theta so 1:p then p+1:2p etc and theta has total length p*c (change p for p+1 for the case with intercept) if you want to compare it with what sklearn gives, you indeed have to do reshape(theta, (p, c)) or reshape(theta, (p+1, c)) (with intercept)

Edit2: code for apply might be useful too: https://github.com/alan-turing-institute/MLJLinearModels.jl/blob/b5d91ac9d13bb98f78eda9b9f347e8a6ddb39aaa/src/utils.jl#L35-L46

1 Like

Thank you!
The Mnist database got classified pretty well :slight_smile:

There was just something ‘odd’

I downloaded MNIST from

using Flux, Flux.Data.MNIST, Statistics
imgs = MNIST.images()
X = Array(transpose(hcat(float.(reshape.(imgs, :))...) ))
labels = MNIST.labels() .+1

The labels go from 0 to 9, but I had to translate them to [1,10] otherwise I got this error.
Looking at the number I had the intuition it was scoring over 1:9, like the class with 0 was ‘not welcome’

For the rest it worked smoothly,

Thanks a lot

DimensionMismatch("new dimensions (785, 10) must be consistent with array size 7056")
(::Base.var"#throw_dmrsa#197")(::Tuple{Int64,Int64}, ::Int64) at reshapedarray.jl:41
reshape at reshapedarray.jl:45 [inlined]
reshape at reshapedarray.jl:116 [inlined]
apply_X!(::Array{Float64,2}, ::Array{Float64,2}, ::Array{Float64,1}, ::Int64) at utils.jl:66
(::MLJLinearModels.var"#102#103"{GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}},Array{Float64,2},Array{Int64,1},Int64,Int64,Int64,Float64})(::Float64, ::Array{Float64,1}, ::Array{Float64,1}) at d_logistic.jl:149
(::NLSolversBase.var"#61#62"{NLSolversBase.InplaceObjective{Nothing,MLJLinearModels.var"#102#103"{GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}},Array{Float64,2},Array{Int64,1},Int64,Int64,Int64,Float64},Nothing,Nothing,Nothing},Float64})(::Array{Float64,1}, ::Array{Float64,1}) at incomplete.jl:45
value_gradient!!(::NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}) at interface.jl:82
initial_state(::Optim.LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}, ::Optim.Options{Float64,Nothing}, ::NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}) at l_bfgs.jl:158
optimize(::NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Optim.LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}, ::Optim.Options{Float64,Nothing}) at optimize.jl:33
#optimize#93 at interface.jl:116 [inlined]
optimize(::NLSolversBase.InplaceObjective{Nothing,MLJLinearModels.var"#102#103"{GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}},Array{Float64,2},Array{Int64,1},Int64,Int64,Int64,Float64},Nothing,Nothing,Nothing}, ::Array{Float64,1}, ::Optim.LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}, ::Optim.Options{Float64,Nothing}) at interface.jl:115
optimize(::NLSolversBase.InplaceObjective{Nothing,MLJLinearModels.var"#102#103"{GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}},Array{Float64,2},Array{Int64,1},Int64,Int64,Int64,Float64},Nothing,Nothing,Nothing}, ::Array{Float64,1}, ::Optim.LBFGS{Nothing,LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Optim.var"#19#21"}) at interface.jl:115
_fit(::GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}}, ::LBFGS, ::Array{Float64,2}, ::Array{Int64,1}) at newton.jl:114
#fit#144(::LBFGS, ::typeof(fit), ::GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}}, ::Array{Float64,2}, ::Array{Int64,1}) at default.jl:48
fit(::GeneralizedLinearRegression{MultinomialLoss,ScaledPenalty{LPPenalty{2}}}, ::Array{Float64,2}, ::Array{Int64,1}) at default.jl:38
top-level scope at test_LR.jl:159

1 Like

Yes that’s actually a requirements to have the encoding to be 1…c (partly related to if you have a categorical vector and want to encode it to integer)

Cool thanks for reporting this! Would you mind opening an issue with your reproducing code? I might add it as an example in these much needed docs :sweat_smile:

1 Like

Here you go.

I would say that this topic got solved :slight_smile:
Thanks a lot for the help
Alessio

The example code:

# Classify MNIST digits with a simple multi-layer-perceptron
using Flux.Data.MNIST
using MLJLinearModels
# Get MNIST dataset and transpose for (records, features)
imgs = MNIST.images()
X = Array(transpose(hcat(float.(reshape.(imgs, :))...) )
# MNIST labels: Categorical labels must be 1...c, hence add .+1 to each label
labels = MNIST.labels() .+1
# and the number of classes
n_classes = length(Set(labels))
n_features = size(X,2)
# The MNIST database does not need the intercept
intercept = false
# deploy MultinomialRegression from MLJLinearModels, λ being the strenght of the reguliser
mnr = MultinomialRegression(λ; fit_intercept=intercept)
# Fit the model
θ  = fit(mnr, X, labels)
# The model parameters are organized such we can apply X⋅θ, the following is only to clarify
params = reshape(θ, n_features +Int(intercept), n_classes)
# Get the predictions X⋅θ 
preds = MLJLinearModels.softmax(MLJLinearModels.apply_X(X,θ,n_classes))
# map each vector to its maximal element 
targets = map(x->argmax(x),eachrow(preds))
#and evaluate the model over the labels
scores = 1- sum(targets-labels)/length(preds)
2 Likes