Problems with Flux


#1

I’m having trouble fitting a really basic neural network in Flux. I wanted to get a handle on the syntax by simulating simple data from a logistic regression, but the model isn’t converging to anything reasonable. I think I’m probably misunderstanding the syntax somewhere. Any thoughts on what I’m doing wrong? Code below.

# generate logistic regression data
using Distributions
using Flux # needed here for sigmoid

## size of data
n = 1000
d = 10 # covariates including bias

## seed
srand(1)

## coefficients
b = rand(Normal(), d)

## covariates
X = rand(Normal(), n, d)
X[:, 1] = ones(n)

## output
θ = sigmoid.(X * b) # true probabilities
y = rand.(Bernoulli.(θ))
mean((θ - y) .^ 2) # MSE with truth

## now drop bias from data
X = X[:, 2:d]

# set up neural network
using Flux 
using Flux.Tracker
using Flux: @epochs

model = Chain(Dense(d - 1, 5, sigmoid),
              Dense(5, 1, sigmoid))

# train model

## set up loss to minimize, optimizer, and data
loss(x, y) = Flux.mse(model(x), y)
opt = ADAM(params(model))
data = [(X', y)] # note the transpose of covariates!

loss(X', y) # 267.83

# now fit model
@epochs 100 Flux.train!(loss, data, opt)

# check model performance
loss(X', y) # 252.88
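
A quick sanity check on shapes (assuming the model and data defined above) shows what `Flux.mse` is actually comparing here: the model output is a 1 × n row, while `y` is an n-vector, so the broadcast inside the loss produces an n × n matrix rather than elementwise errors.

```julia
size(model(X'))       # (1, 1000): one row of predictions
size(y)               # (1000,): a plain vector of labels
size(model(X') .- y)  # (1000, 1000): broadcasting blows this up
```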

#2

I fixed the issue by minimizing cross-entropy on one-hot encoded labels (via onehotbatch) instead of minimizing MSE.

# load packages
using Distributions
using Flux 
using Flux.Tracker
using Flux: onehotbatch, argmax, crossentropy, throttle, @epochs

# generate logistic regression data

## size of data
n = 1000
d = 10 # covariates including bias

## seed
srand(1)

## coefficients
b = rand(Normal(), d)

## covariates
X = rand(Normal(), n, d)
X[:, 1] = ones(n)

## output
θ = sigmoid.(X * b) # true probabilities
Y = onehotbatch(rand.(Bernoulli.(θ)), 0:1)

## now drop bias from data and transpose matrix
X = X[:, 2:d]
X = X'

# set up neural network

model = Chain(Dense(d - 1, 5, sigmoid),
              Dense(5, 2),
              softmax)

# train model

## set up loss to minimize, optimizer, and data
loss(x, y) = crossentropy(model(x), y)
opt = ADAM(params(model))
data = [(X, Y)] # note the transpose of covariates!

## accuracy
accuracy(x, y) = mean(argmax(model(x)) .== argmax(y))
accuracy(X, Y) # 32.5%
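
As an aside, later Flux releases deprecated `Flux.argmax` in favor of `onecold`. An equivalent accuracy definition under the newer API (a sketch, assuming a Flux version that exports `onecold`) would be:

```julia
using Flux: onecold

# onecold maps each column of a probability/one-hot matrix back to a label
accuracy(x, y) = mean(onecold(model(x)) .== onecold(y))
```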

# now fit model
@epochs 1000 Flux.train!(loss, data, opt)

# check model performance
accuracy(X, Y) # 86.7%

#3

Circling back on this, I found that the issue was the dimension of the output. In my original post, X (after dropping the bias and transposing) was (d − 1) × n, so model(X') was 1 × n, while y was an n-vector; the broadcast inside Flux.mse therefore produced an n × n matrix instead of elementwise errors, which is why the loss values were so large. As a side effect of switching y to a one-hot encoding, I corrected the output dimension to 2 × n.
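
Given that diagnosis, a minimal fix to the original MSE version (a sketch, keeping everything else from post #1 as-is) is to make the target a 1 × n row so it matches the model output:

```julia
# reshape y so it matches the 1×n model output; the broadcast in
# Flux.mse is then elementwise instead of producing an n×n matrix
Y = reshape(y, 1, n)   # equivalently: y'
loss(x, y) = Flux.mse(model(x), y)
data = [(X', Y)]
@epochs 100 Flux.train!(loss, data, opt)
```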