I have a big dataset (250k rows × 30 columns) on which I would like to train a neural network for a binary classification task (it should classify each row into one of two possible classes). I built this model in Python with scikit-learn's MLPClassifier and got an accuracy of around 83%: not amazing, but it shows that it somewhat works. I then tried to replicate this in Julia using Flux.jl. Here is my code:
```julia
using Flux, DataFrames, DataFramesMeta, CSV
using Chain: @chain
using StatsBase: standardize, ZScoreTransform
using MLDataUtils: splitobs, shuffleobs
using IterTools: ncycle

function build_model(input, layers, output; activation = relu)
    f = []
    in_layer = input
    for out_layer in layers
        append!(f, [Dense(in_layer, out_layer, activation)])
        in_layer = out_layer
    end
    append!(f, [Dense(in_layer, output)])
    append!(f, [softmax])
    Chain(f...)
end

filename = raw"E:\Università\2020-2021\Applicazioni di Machine Learning\atlas_data.csv"

df, labels = @chain begin
    CSV.read(filename, DataFrame)
    @where(_, :KaggleSet .== "t")  # this is just to select a subset of the dataset
    select(_, Not([:Weight, :EventId, :KaggleSet, :KaggleWeight]))  # these are columns to ignore
    select(_, Not(:Label)), @chain _ begin
        select(_, :Label)
        Flux.onehotbatch(_.Label, ["s", "b"])  # "s" and "b" are the labels for the classes
    end
end

N_input = length(names(df))
N_output = size(labels, 1)

X = transpose(standardize(ZScoreTransform, Matrix(df)))
X_train, X_test = splitobs(shuffleobs(X), at = 0.7)
y_train, y_test = splitobs(shuffleobs(labels), at = 0.7)

model = build_model(N_input, [20, 10, 2], N_output)

loss(a, b) = Flux.Losses.mse(model(a), b)
ps = Flux.params(model)
opt = ADAM(1e-3, (0.9, 0.999))

batchsize = 200
n_epochs = 200

loader = Flux.Data.DataLoader(
    (X_train, y_train),
    batchsize = batchsize,
    shuffle = true
)

Flux.@epochs n_epochs begin
    Flux.train!(loss, ps, loader, opt)
    println(loss(X_train, y_train))
end
```
I have used basically the same parameters; the only differences are the cost function (though I have also tried Flux's crossentropy, which should be the one sklearn uses in MLPClassifier) and the output layer (in Python I used a single output neuron, while in Julia I use two so that I can use onehotbatch, which should also be more correct). The rest is pretty much identical, and I have already tinkered with various models and parameters.
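For completeness, this is roughly what the cross-entropy variant I tried looks like (a sketch, assuming the two-row one-hot layout from the code above; `model_no_softmax` is a hypothetical copy of the model built without the trailing softmax, since logitcrossentropy applies it internally and is the numerically stabler option):

```julia
# Cross-entropy on probabilities (model ends in softmax, as in the code above)
loss(a, b) = Flux.Losses.crossentropy(model(a), b)

# Numerically stabler alternative: work on raw logits instead.
# model_no_softmax is hypothetical: the same architecture but without
# the final softmax layer, which logitcrossentropy applies internally.
loss_logits(a, b) = Flux.Losses.logitcrossentropy(model_no_softmax(a), b)
```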
Here is the problem: the loss (which I print every epoch) settles to a stable value almost immediately (within two or three epochs) and stays there. If I stop the program and call model(X_train), I notice that every datapoint is mapped to essentially the same pair of output values: sometimes ~0.3 for one class and ~0.6 for the other, while other times (I believe changing the loss function causes this) one class has a value of 1.0 and the other is basically 0.0 (again, for every datapoint, as if every single entry of my dataset belonged to a single class).
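One way to quantify this collapse (a sketch, assuming the ["s", "b"] one-hot order used in the code above) is to decode the network outputs with onecold and look at the accuracy and the predicted-class frequencies:

```julia
using Statistics: mean

# Decode each output column back to its label
preds = Flux.onecold(model(X_train), ["s", "b"])
truth = Flux.onecold(y_train, ["s", "b"])

println("accuracy = ", mean(preds .== truth))
# If this is ~1.0 or ~0.0, the model is predicting a single class for everything
println("fraction predicted \"s\" = ", mean(preds .== "s"))
```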
I know this may not be strictly a Julia-related question, but since the exact same dataset reaches an accuracy of ~83% in Python, I guess the problem is not the (theoretical) model but the way I have implemented it in Flux.
Note that the dataset manipulation is not the problem: I ignored the exact same columns in Python, and the selected subset is the same. In fact, after the manipulations the dataframe in Julia has 250k rows, just like the one in Python. The problem lies in what I did afterwards, in the model implementation.
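One sanity check worth running on the split itself (a sketch using the same MLDataUtils functions as the code above): shuffleobs and splitobs accept a tuple, in which case they apply the same permutation to both arrays, so each column of X is guaranteed to stay paired with its label:

```julia
using MLDataUtils: shuffleobs, splitobs

# Passing a tuple shuffles features and labels with the same indices,
# so the (column, label) pairing survives the shuffle and the split
Xs, ys = shuffleobs((X, labels))
(X_train, y_train), (X_test, y_test) = splitobs((Xs, ys), at = 0.7)
```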
Can you please help me? Thank you