# Knet.jl: Simple MLP for Iris Dataset

Hi everyone, I’m currently studying (I’m not done reading the docs yet) how Knet works. I’m trying to create a classification model for the iris dataset. That is, using the four features of the iris dataset as the predictors (`xtrn1`) and the species of the iris dataset as the label (`ytrn1`).

Here’s my code based from the Knet’s LeNet example on Github.

``````using Knet
using RDatasets

iris = dataset("datasets", "iris");
xtrn1 = Matrix(iris[:, 1:4]);
ytrn1 = iris[:, 5];
ytrn1 = map(x -> x == "setosa" ? 1 : x == "versicolor" ? 2 : 3, ytrn1);
dtrn1 = minibatch(Float32.(xtrn1'), ytrn1, 10);

# Define the Dense layer
struct Dense; w; b; f; end
Dense(i::Int, o::Int, f = relu) = Dense(param(o, i), param0(o), f) # constructor
(d::Dense)(x) = d.f.(d.w * mat(x) .+ d.b) # define method for dense layer

# Define Chain layer
struct Chain; layers; end
(c::Chain)(x) = (for l in c.layers; x = l(x); end; x) # define method for feed-forward
(c::Chain)(x, y) = nll(c(x), y, dims = 1) # define method for negative-log likelihood loss

# Define the Model
model = Chain((Dense(4, 10), Dense(10, 3), x -> softmax(x, dims = 1)))
adam!(model, repeat(dtrn1, 10)) # train the model
accuracy(model, dtrn1)
``````

So the `ytrn1` is an array of 1’s, 2’s, and 3’s corresponding to the species. I did not transform it into a one-hot-vector, since I notice the Knet’s LeNet example on Github uses labels as is (without translating to one-hot-vector). My questions is, is this how Knet works by design? Contrary to how Flux works, where we translate the response variable to a one-hot-vector for multiclass.

Further, I used `Float32.(xtrn1')` conversion because if I use `Float64.(xtrn1')` I get the following error:
`ERROR: Gradient type mismatch: w::Array{Float32,1} g::Array{Float64,1}`
I also want to understand why?

Lastly, the accuracy I got from this is, most of the time `0.3333333333333333`, sometimes `0.49333333333333335` or `0.66`. So I’m not sure if I specified the data, model and the training correctly.

Thanks you very much for any help.

3 Likes

6 Likes

@alasaadstat I am following your very informative blog post. One very small thing - perhaps can you change the font? I find the font very ‘thin’.
Someone here will point out that I can change this in my browser (Chrome).

2 Likes

There is not a right way for this – I have used both styles in the past. I currently prefer the integer labels rather than one-hot-vectors as they are faster and take up less space.

Julia is very picky about types, you do not want to mix Float32 and Float64 in your parameters/data. Pick one and consistently use it.

I haven’t played with the code but one problem seems to be using softmax as the last layer. You do not need this as the loss function for the chain uses `nll` (negative log likelihood) which performs the softmax function. Try removing the last layer of your chain.

3 Likes

Hi @johnh, thank you for pointing out. I have updated the font already.

Hi Sir,

Thank you very much for the explanations, noted on the above points.

Agree on this, while the docs clearly mentioned the normalization of the `nll`, I only realized it after summing up the results of the `nll` (which is 1). So I removed the `softmax` already in my blogpost.