# How to sample from a logistic regression model

I want to draw N independent (but not identically distributed) samples from a conditional Bernoulli distribution: y[i] ~ Ber(p[i]), where p[i] = logistic(sum(w .* x[i])). I am currently using the code below, but it seems overly complex. Is there a better way?

```julia
using Random, StatsFuns

Random.seed!(1)
D = 5
N = 10
X = randn(N, D)
w = randn(D)
logits = X * w
probs = [StatsFuns.logistic(l) for l in logits]
y = zeros(N)
ndx = findall(rand(N) .< probs)  # y[i] = 1 with probability probs[i]
for i in ndx; y[i] = 1; end
```

Also, what’s the best way to generalize this to more than two classes, i.e., how best to sample from a conditional categorical distribution where p[i,:] = softmax(W * x[i])?

E.g. (the last line does the sampling):

```julia
using Random, StatsFuns

Random.seed!(1)
D, N = 5, 10
X = randn(N, D)
w = randn(D)
y = rand(N) .< logistic.(X * w)  # y[i] = 1 with probability logistic(x[i]'w)
```

For the softmax case, you are essentially sampling from a categorical distribution; look at `Distributions.Categorical`.
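
For example, here is a minimal sketch of that idea (the sizes and names are illustrative, not part of the original question):

```julia
using Distributions, Random, StatsFuns

Random.seed!(1)
C, D, N = 3, 5, 10
X = randn(N, D)
W = randn(C, D)
# Row i of X yields a length-C probability vector via softmax;
# Categorical then draws a class label in 1:C with those probabilities.
y = [rand(Categorical(softmax(W * X[i, :]))) for i in 1:N]
```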


Thanks a lot. For future reference, this is what I came up with.

``````"""
logreg_sample(W, X)

Sample from a logistic regression model with C classes and D features.

Inputs:
`W`: C*D matrix of weights.
`X`: N*D matrix of feature vectors.

Outputs:
`y_cat`: N-vector of integer labels from {1,...,C}.
`y_onehot`: N-vector of Flux:OneHotVector.
"""
function logreg_sample(W, X)
C = size(W, 1)
N = size(X, 1)
@assert size(W,2) == size(X,2)
y_cat = Array{Int}(undef, N)
y_onehot = Array{Flux.OneHotVector}(undef, N)
# Binary case
#y = Flux.logistic.(X * w) .< rand(N)
for i=1:N
probs = Flux.softmax(W * X[i,:])
dist = Distributions.Categorical(probs)
y_cat[i] = rand(dist)
y_onehot[i] = Flux.onehot(y_cat[i], 1:C)
end
return y_cat, y_onehot
end
``````
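
As a quick usage sketch (the sizes here are arbitrary):

```julia
using Random

Random.seed!(1)
C, D, N = 3, 5, 10
W = randn(C, D)
X = randn(N, D)
y_cat, y_onehot = logreg_sample(W, X)
# y_cat[i] is an integer in 1:C; y_onehot[i] is the matching length-C one-hot vector.
```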

One annoying detail I came across: in Flux.jl, it seems that design matrices always store the examples along the columns, so they have size D x N instead of N x D. This required the following change to Tamas’s code:

```julia
function sample1(w, X)
    D, N = size(X)
    W = reshape(w, (1, D))
    return rand(N) .< vec(Flux.sigmoid.(W * X))
end
```

Personally I find it easier to use an array comprehension, as shown below. Is there any disadvantage to this?

```julia
function sample2(w, X)
    D, N = size(X)
    p = [Flux.sigmoid(sum(w .* X[:, i])) for i in 1:N]
    return rand(N) .< p
end
```

If N is the number of observations, I think this is quite natural in Julia when you might only need a sub-sample of observations to evaluate an objective (say, in stochastic gradient descent). Julia arrays are column-major, so with a D x N layout, grabbing the i’th observation as `Data[:, i]` reads contiguous memory. If the layout is Nobs x Nfeatures instead, you jump around in memory when you grab the i’th observation as `Data[i, :]`, as the snippet below illustrates.
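
A small illustration of the difference (sizes are arbitrary; `@btime` comes from the BenchmarkTools package):

```julia
using BenchmarkTools

D, N = 100, 10_000
Xdn = randn(D, N)  # observations along columns
Xnd = randn(N, D)  # observations along rows

@btime sum($Xdn[:, 500]);  # contiguous slice
@btime sum($Xnd[500, :]);  # strided slice: elements are spaced N apart in memory
```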


Isn’t it just two different paradigms? Classic data analysis (e.g. with DataFrames) usually has observations as rows; machine learning has them as columns. (MultivariateStats uses a bit of both, I think determined by the wind direction at the time of coding.) There are good reasons for both, but it’s also something that can really trip you up.


Yes, I guessed that memory layout is why. It’s interesting that nearly all the ML textbooks stick to the convention that the design matrix X is N x D. The only exception I know of is the Gaussian Process book by Rasmussen and Williams (IIRC). It actually simplifies some of the equations (as well as the code) if X is D x N, but it will look unfamiliar (to some) to write the OLS estimator as (X X')^{-1} X y instead of the more familiar (X'X)^{-1} X' y. I am tempted to switch to the D x N convention for v2 of my own book (“Machine learning: a probabilistic perspective”), as I make the switch from Matlab to Julia.
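
To make the two conventions concrete, here is a sketch (both lines compute the same OLS estimate, just under transposed layouts):

```julia
using Random

Random.seed!(1)
D, N = 5, 100
Xnd = randn(N, D)        # N x D convention
Xdn = permutedims(Xnd)   # D x N convention
y = randn(N)

b1 = (Xnd' * Xnd) \ (Xnd' * y)  # (X'X)^{-1} X'y for the N x D layout
b2 = (Xdn * Xdn') \ (Xdn * y)   # (XX')^{-1} Xy  for the D x N layout
# b1 ≈ b2 up to floating-point error
```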

Maybe this helps explain it:

https://docs.julialang.org/en/v1/manual/performance-tips/#Access-arrays-in-memory-order,-along-columns-1


I also found this related thread:


I’m not so sure it’s just a field-difference thing (and, as @murphyk pointed out, most textbooks in ML use `nxp`). There’s also something to be said for the fact that it’s usually desirable to have columns be of the same type for data storage (which is a reason why most tabular-data packages in Julia use `nxp`, afaiu). If you just have a big array of floating-point numbers this doesn’t matter, but in general that wouldn’t be the case.

PS: looking forward to seeing MLPP using Julia!
