I am writing a KNN prediction algorithm from scratch and wanted to see if I could squeeze even more performance out of it with LoopVectorization’s @turbo macro. However, when I try to compile the method, it seems that setting the number of rows to iterate over with size(::Matrix, 2) does not mesh with the macro. Below is the code:

# Simple struct
struct KNN
    k::Int
    M::Matrix
    Labels
end
# Prealloc
const prediction = Array{Float16}(undef, size(test_set, 1))
function predicts(model::KNN, test::Matrix)
    #println(size(model.M, 2), size(test, 2))
    size(model.M, 2) == size(test, 2) || throw("Dimensions do not match, test set must contain the same variables.")
    # For each row in test Matrix
    # compute distance with every member of the model's Matrix
    # sort based on distance
    # Look at classes and use mode to vote on majority
    # Assign prediction to Vector
    # Repeat
    # Rotating for column-wise operations
    r1 = rotl90( model.M ) # Training data
    r2 = rotl90( test ) # Test data
    nrows = size( r2, 2) # Number of test rows <------ Incompatible?
    orows = size( r1, 2) # Number of train rows <------ Incompatible?
    K = model.k
    # IDs
    id = collect(1:orows)
    #Prealloc distance array
    dist = Array{Float32}(undef, orows)
    @turbo for i in 1:nrows
        curr_row = r2[:, i]
        # Compute distance between the current row (internally a column) and the training set
        for j in 1:orows
            dist[j]= 1/distance(curr_row, r1[:, j]; p = 2)^2
        end
        # Store and sort
        df = DataFrame( distances= dist, id = id)
        sort!(df, :distances, rev=true)[1:K, :]
        # Vote
        prediction[i] = mode( model.Labels[ df.id ] )
    end
    prediction
end

Here is the error message:

Closes function definition for predicts(model::KNN, test::Matrix)

ERROR: LoadError: Expression not recognized. (Expr(:parameters, :((Expr(:kw, :p, 2)))))
in expression starting at c:\Users\aledo\Desktop\Julia_Files\dataf.jl:92

Where line 92 is the outer loop going from i = 1:nrows .

It’s not 100% critical that it works with @turbo, as I am already getting good performance using Threads.@threads or @fastmath.

Float16 is also probably going to be generally suboptimal for performance at the moment.

There are a lot of things you can already do to improve performance, e.g.

function predicts(model::KNN, test::Matrix)
    #println(size(model.M, 2), size(test, 2))
    size(model.M, 2) == size(test, 2) || throw("Dimensions do not match, test set must contain the same variables.")
    # For each row in test Matrix
    # compute distance with every member of the model's Matrix
    # sort based on distance
    # Look at classes and use mode to vote on majority
    # Assign prediction to Vector
    # Repeat
    # Rotating for column-wise operations
    r1 = rotl90( model.M ) # Training data
    r2 = rotl90( test ) # Test data
    nrows = size( r2, 2) # Number of test rows
    orows = size( r1, 2) # Number of train rows
    K = model.k
    # IDs
    id = collect(1:orows)
    prediction = Array{Float32}(undef, size(test, 1))
    Threads.@threads for i in 1:nrows
        # One distance buffer per iteration: a single shared buffer would be
        # overwritten concurrently by the other threads
        dist = Array{Float32}(undef, orows)
        curr_row = @view r2[:, i]
        # Compute distance between the current row (internally a column) and the training set
        for j in 1:orows
            dist[j]= 1/distance(curr_row, @view(r1[:, j]); p = 2)^2
        end
        # Store and sort; dist holds inverse distances, so the largest values are the nearest neighbours
        sp = @view(sortperm(dist, rev=true)[1:K])
        # Vote
        prediction[i] = mode( model.Labels[ @view(id[sp]) ] )
    end
    prediction
end

should already help a lot.

If you want @turbo to work, you’ll need lower level code. E.g., replace the call to distance with the actual loops calculating distance, and probably move the @turbo to the for j in 1:orows loop.
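As a rough illustration of what "lower level" could look like here, the sketch below writes the inverse squared Euclidean distance as explicit loops over plain array indices, with no keyword arguments, slices, or broadcasts in the loop body. The helper name inv_sq_dists! and the toy data are made up for this example. It uses Base's @inbounds and @simd so that it runs without extra packages; the assumption is that a loop nest of this shape is the kind @turbo can handle.

```julia
# Hypothetical helper (not from the thread): inverse squared Euclidean
# distance from one query column `q` to every column of `X`.
function inv_sq_dists!(dist::AbstractVector, X::AbstractMatrix, q::AbstractVector)
    size(X, 1) == length(q) || throw(DimensionMismatch("feature counts differ"))
    length(dist) == size(X, 2) || throw(DimensionMismatch("dist has the wrong length"))
    @inbounds for j in axes(X, 2)
        s = zero(eltype(dist))
        # Plain scalar arithmetic on indexed elements only
        @simd for d in axes(X, 1)
            delta = X[d, j] - q[d]
            s += delta * delta
        end
        dist[j] = inv(s)   # 1 / distance^2; Inf when the two points coincide
    end
    return dist
end

X = Float32[0 3 1; 0 4 1]   # three training points stored as columns
q = Float32[0, 0]           # one test point
dist = inv_sq_dists!(zeros(Float32, 3), X, q)
```

Since 1/sqrt(s)^2 equals 1/s, the square root can be dropped entirely, which also saves work in the inner loop.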

This halved the time, now down to about 36 seconds, thank you!

As for @turbo, placing it in front of the for j in 1:orows loop now throws this error: nested task error: UndefVarError: j not defined

I assume it’s got something to do with the @threads.

Updated code:

Threads.@threads for i in 1:nrows
    curr_row = @view r2[:, i]
    # Compute distance between the current row (internally a column) and the training set
    #j=0
    @turbo for j = 1:orows
        dist[j] = 1/sum(sqrt.( (curr_row .- @view(r1[:, j]) ).^2))^2
    end
    # Store and sort
    sp = @view(sortperm(dist, rev= model.weighted )[1:K])
    # Vote
    prediction[i] = mode( model.Labels[ @view(id[sp]) ] )
end

Incredible! The speed is about 20-24 seconds now. So as I understand it, when we use @turbo we have to write granular, lower-level code? Also, by transposing r1 and r2, do you mean keeping them in their original form?

Is this the wrong approach? I thought that Julia performs better when we iterate through columns rather than rows. Thus by rotating the rows we can access them faster now that they are columns.
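For what it's worth, Julia arrays are stored column-major, so reading down a column does touch contiguous memory, and the reasoning above holds. A small sketch with a made-up 2x3 matrix shows how rotl90 and permutedims differ: both turn observations into columns, but rotl90 also reverses the order of the variables within each column. That reversal is harmless for distances as long as the training and test sets are rotated the same way.

```julia
A = [1 2 3; 4 5 6]      # 2 observations (rows) x 3 variables (columns)

At = permutedims(A)     # 3 x 2: observation i becomes column i, variables in order
R  = rotl90(A)          # 3 x 2: observation i becomes column i, variables reversed

first_obs_t = At[:, 1]  # same values as A[1, :]
first_obs_r = R[:, 1]   # A[1, :] in reverse order

# Squared distance between the two observations is the same either way
d_t = sum(abs2, At[:, 1] .- At[:, 2])
d_r = sum(abs2, R[:, 1] .- R[:, 2])
```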