Type Stability Help

I am following along with Make Your Own Neural Network by Tariq Rashid, which walks through implementing a small neural network by hand. This is for my own benefit; I know that for real work I should just use Flux.jl or similar. I was able to replicate makeyourownneuralnetwork/part3_neural_network_mnist_backquery.ipynb at master · makeyourownneuralnetwork/makeyourownneuralnetwork · GitHub in Julia. However, I get only a modest performance improvement (2x) over Python, and I think it has something to do with the type stability of my code. Can someone take a look at the train!() function and help me understand how to make it type stable?

Here is the MWE with dummy data.

# %%

using Random
using LogExpFunctions

# %%

rng = MersenneTwister(1234)

struct my_nn
    in_nodes::Int
    h_nodes::Int
    out_nodes::Int
    lr::Float32
    wih::Array{Float32}
    who::Array{Float32}
end

@views function train!(NN::my_nn, training_data, epochs)
    label = zeros(Float32, NN.out_nodes)
    num_data = size(training_data, 1)
    hidden_inputs = zeros(Float32, NN.h_nodes)
    hidden_outputs = zeros(Float32, NN.h_nodes)
    final_inputs = zeros(Float32, NN.out_nodes)
    final_outputs = zeros(Float32, NN.out_nodes)
    output_errors = zeros(Float32, NN.out_nodes)
    hidden_errors = zeros(Float32, NN.h_nodes)
    for e in 1:epochs
        for nd in 1:num_data
            # setup data
            inputs = training_data[nd, 2:end]
            label .= 0.01f0
            label[Int(training_data[nd, 1] + 1)] = 0.99f0

            # train

            # calculate signals into hidden layer
            hidden_inputs .= NN.wih * inputs
            # calculate the signals emerging from hidden layer
            hidden_outputs .= logistic.(hidden_inputs)

            # calculate signals into final output layer
            final_inputs .= NN.who * hidden_outputs
            # calculate the signals emerging from final output layer
            final_outputs .= logistic.(final_inputs)

            # output layer error is the (target - actual)
            output_errors .= label .- final_outputs
            # hidden layer error is the output_errors, split by weights, recombined at hidden nodes
            hidden_errors .= NN.who' * output_errors

            # update the weights for the links between the hidden and output layers
            NN.who .+= NN.lr .* (output_errors .* final_outputs .* (1.0f0 .- final_outputs)) * hidden_outputs'

            # update the weights for the links between the input and hidden layers
            NN.wih .+= NN.lr .* (hidden_errors .* hidden_outputs .* (1.0f0 .- hidden_outputs)) * inputs'
        end
    end
end


# %%

# number of input, hidden and output nodes
input_nodes = 784
hidden_nodes = 200
output_nodes = 10

# learning rate
learning_rate = 0.1f0

wih = randn(rng, Float32, (hidden_nodes, input_nodes)) .* input_nodes^-0.5f0
who = randn(rng, Float32, (output_nodes, hidden_nodes)) .* hidden_nodes^-0.5f0

nn = my_nn(input_nodes, hidden_nodes, output_nodes, learning_rate, wih, who)

# %% dummy data

training_data = zeros(Float32, (60000, 785))

# %%
epochs = 5
@code_warntype train!(nn, training_data, epochs)

Thank you so much for your help!

Array{Float32} isn’t a concrete type. Looking at your code, you probably meant Matrix{Float32}. Array{T,N} is an N-dimensional array, so leaving off the N makes it an abstract type, and struct fields with abstract types make code that uses them type-unstable.
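
For example, a minimal sketch of the same struct with concrete field types (Matrix{Float32} is just an alias for Array{Float32,2}):

struct my_nn
    in_nodes::Int
    h_nodes::Int
    out_nodes::Int
    lr::Float32
    wih::Matrix{Float32}   # concrete: 2-dimensional Float32 array
    who::Matrix{Float32}   # concrete: 2-dimensional Float32 array
end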

Thanks so much. That was indeed the issue with the type stability.

Now on to the next stage of optimization. With @time I get the following:

(2.40 M allocations: 178.327 GiB, 2.90% gc time)

Any ideas on how to reduce allocations? That seems much higher than it should be.

thank you!

Getting rid of the type instability actually made the code about 25% slower…

The next step is replacing C .= A*B with mul!(C, A, B), which will remove those allocations. Also, if you have an Intel or AMD CPU, you should probably add the MKL package, which speeds up matrix multiplication a decent amount.
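
As a rough sketch of that change on one of the lines from your function (mul! lives in LinearAlgebra; the buffer names are the ones you already have):

using LinearAlgebra

# allocating version: NN.wih * inputs creates a temporary vector, then copies it into hidden_inputs
hidden_inputs .= NN.wih * inputs

# in-place version: writes the product directly into the preallocated hidden_inputs
mul!(hidden_inputs, NN.wih, inputs)

# the 5-argument form mul!(C, A, B, alpha, beta) computes C .= alpha*A*B .+ beta*C,
# which can also replace the broadcasted .+= weight updates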

There seems to be a lot going on here that is hard to navigate without a lot of experience.

After removing the type instability and drastically decreasing allocations (2.40 M allocations: 178.327 GiB, 2.90% gc time) -> (1.50 M allocations: 9.128 GiB, 0.28% gc time), the performance is now 50% of what it was originally, about the same speed as the Python implementation. Not sure what I’m missing here.

Again thank you so much for your help.

Here is my function after the changes for reference.

using LinearAlgebra  # for mul!

@views function train!(NN::my_nn, training_data, epochs)
    label = zeros(Float32, NN.out_nodes)
    num_data = size(training_data, 1)
    hidden_inputs = zeros(Float32, NN.h_nodes)
    hidden_outputs = zeros(Float32, NN.h_nodes)
    final_inputs = zeros(Float32, NN.out_nodes)
    final_outputs = zeros(Float32, NN.out_nodes)
    output_errors = zeros(Float32, NN.out_nodes)
    hidden_errors = zeros(Float32, NN.h_nodes)
    for e in 1:epochs
        for nd in 1:num_data
            # setup data
            inputs = training_data[nd, 2:end]
            label .= 0.01f0
            label[Int(training_data[nd, 1] + 1)] = 0.99f0

            # train

            # calculate signals into hidden layer
            # hidden_inputs .= NN.wih * inputs
            mul!(hidden_inputs, NN.wih, inputs)
            # calculate the signals emerging from hidden layer
            hidden_outputs .= logistic.(hidden_inputs)

            # calculate signals into final output layer
            # final_inputs .= NN.who * hidden_outputs
            mul!(final_inputs, NN.who, hidden_outputs)
            # calculate the signals emerging from final output layer
            final_outputs .= logistic.(final_inputs)

            # output layer error is the (target - actual)
            output_errors .= label .- final_outputs
            # hidden layer error is the output_errors, split by weights, recombined at hidden nodes
            # hidden_errors .= NN.who' * output_errors
            mul!(hidden_errors, NN.who', output_errors)

            # update the weights for the links between the hidden and output layers
            # NN.who .+= NN.lr .* (output_errors .* final_outputs .* (1.0f0 .- final_outputs)) * hidden_outputs'
            mul!(NN.who, (output_errors .* final_outputs .* (1.0f0 .- final_outputs)), hidden_outputs', NN.lr, 1.0f0)

            # update the weights for the links between the input and hidden layers
            # NN.wih .+= NN.lr .* (hidden_errors .* hidden_outputs .* (1.0f0 .- hidden_outputs)) * inputs'
            mul!(NN.wih, (hidden_errors .* hidden_outputs .* (1.0f0 .- hidden_outputs)), inputs', NN.lr, 1.0f0)
        end
    end
end