I’m having trouble fitting a really basic neural network in Flux. I wanted to get a handle on the syntax by simulating simple data from a logistic regression, but the model isn’t converging to anything reasonable. I think this is probably due to a misunderstanding of the syntax on my part. Any thoughts on what I’m doing wrong? Code below:
# generate logistic regression data
using Distributions
## size of data
n = 1000
d = 10 # covariates including bias
## seed
srand(1)
## coefficients
b = rand(Normal(), d)
## covariates
X = rand(Normal(), n, d)
X[:, 1] = ones(n)
## output
θ = sigmoid.(X * b) # true probabilities
y = rand.(Bernoulli.(θ))
mean((θ - y) .^ 2) # MSE with truth
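As a quick sanity check on the simulated data (assuming the GLM package is installed; this sketch is not part of the fitting code), a plain logistic regression should roughly recover b:
## optional: verify the simulated data with an ordinary logistic regression
using GLM
sanity = glm(X, y, Binomial(), LogitLink()) # X still includes the bias column here
coef(sanity) # should land near b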
## now drop bias from data
X = X[:, 2:d]
# set up neural network
using Flux
using Flux.Tracker
using Flux: @epochs
model = Chain(Dense(d - 1, 5, sigmoid),
Dense(5, 1, sigmoid))
# train model
## set up loss to minimize, optimizer, and data
loss(x, y) = Flux.mse(model(x), y)
opt = ADAM(params(model))
data = [(X', y)] # note the transpose of covariates!
loss(X', y) # 267.83
# now fit model
@epochs 100 Flux.train!(loss, data, opt)
# check model performance
loss(X', y) # 252.88
I fixed the issue by minimizing crossentropy and using onehotbatch instead of minimizing MSE.
# load packages
using Distributions
using Flux
using Flux.Tracker
using Flux: onehotbatch, argmax, crossentropy, throttle, @epochs
# generate logistic regression data
## size of data
n = 1000
d = 10 # covariates including bias
## seed
srand(1)
## coefficients
b = rand(Normal(), d)
## covariates
X = rand(Normal(), n, d)
X[:, 1] = ones(n)
## output
θ = sigmoid.(X * b) # true probabilities
Y = onehotbatch(rand.(Bernoulli.(θ)), 0:1)
## now drop bias from data and transpose matrix
X = X[:, 2:d]
X = X'
# set up neural network
model = Chain(Dense(d - 1, 5, sigmoid),
Dense(5, 2),
softmax)
# train model
## set up loss to minimize, optimizer, and data
loss(x, y) = crossentropy(model(x), y)
opt = ADAM(params(model))
data = [(X, Y)] # X was already transposed above
## accuracy
accuracy(x, y) = mean(argmax(model(x)) .== argmax(y))
accuracy(X, Y) # 32.5%
# now fit model
@epochs 1000 Flux.train!(loss, data, opt)
# check model performance
accuracy(X, Y) # 86.7%
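As a further check (assuming I have the onehotbatch row ordering right, i.e. row 2 of the softmax output corresponds to label 1), the fitted probabilities can be compared against the true θ:
## compare fitted probabilities with the true ones
p̂ = Flux.Tracker.data(model(X))[2, :] # strip tracking; row 2 ↔ label 1
mean(abs.(p̂ .- θ)) # mean absolute error against the true probabilities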
Circling back on this, I found that the issue was the dimension of the output. In my original post, X was of dimension d × n and y was of dimension n × 1, so model(X) returned a 1 × n row while y was a length-n column, and Flux.mse silently broadcast the two into an n × n matrix instead of comparing them elementwise. As a side effect of changing y to a one-hot encoding, I corrected the dimension to 2 × n.
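A minimal sketch of that broadcast, with small stand-in arrays rather than the actual model output:
## a 1 × n row broadcast against a length-n column yields an n × n matrix
ŷ = rand(1, 5) # stands in for model(X'), which was 1 × n
y = rand(5)    # stands in for the original label vector
size(ŷ .- y)   # (5, 5): every prediction gets compared with every label
So the loss was summing over n² differences rather than n. I believe reshaping y to a 1 × n row (reshape(y, 1, n)) would also have fixed the original MSE version.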