I’m trying to figure out how to load a vector of strings into a BallTree from NearestNeighbors.jl, and I could really use some guidance. From what I understand, I can define my own Metric and use it with a BallTree, so here’s what I have so far:
```julia
using Distances
using NearestNeighbors

# Create a new metric...and then what?
struct Levenshtein <: Metric end

# Function that computes the edit distance between two strings
function levenshtein(s::AbstractString, t::AbstractString)
    # Work on Vector{Char} so non-ASCII strings index correctly
    a, b = collect(s), collect(t)
    n, m = length(a), length(b)
    # Make `a` the shorter sequence
    if n > m
        a, b = b, a
        n, m = m, n
    end
    n == 0 && return m
    # d[j+1, i+1] holds the distance between b[1:j] and a[1:i];
    # Int instead of Int8 so long strings cannot overflow
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:n, j in 1:m
        cost = a[i] == b[j] ? 0 : 1
        d[j+1, i+1] = min(d[j, i+1] + 1, d[j+1, i] + 1, d[j, i] + cost)
    end
    return d[m+1, n+1]
end
```
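For reference, assuming Distances.jl 0.10 or newer (where a metric is a callable struct and `evaluate(d, a, b)` forwards to `d(a, b)`), one likely missing piece is a call method that attaches the edit-distance function to the struct. This is a minimal sketch under that assumption; it uses a two-row variant of the same dynamic programme and collects the strings into characters so accented input works:

```julia
using Distances

struct Levenshtein <: Metric end

# Edit distance with a two-row dynamic programme; operates on Chars,
# so multi-byte (non-ASCII) characters are compared correctly.
function levenshtein(s::AbstractString, t::AbstractString)
    a, b = collect(s), collect(t)
    prev = collect(0:length(b))        # distances from "" to each prefix of b
    for i in 1:length(a)
        cur = similar(prev)
        cur[1] = i                      # distance from a[1:i] to ""
        for j in 1:length(b)
            cost = a[i] == b[j] ? 0 : 1
            cur[j+1] = min(prev[j+1] + 1, cur[j] + 1, prev[j] + cost)
        end
        prev = cur
    end
    return prev[end]
end

# Make the metric callable; Distances.evaluate forwards to this method.
(::Levenshtein)(s::AbstractString, t::AbstractString) = levenshtein(s, t)

evaluate(Levenshtein(), "kitten", "sitting")               # 3
[Levenshtein()("kittens", w) for w in ["kitten", "sitting", "mitten"]]  # [1, 3, 2]
```

This only makes the metric usable on pairs of strings; whether a particular tree type accepts strings as its stored points is a separate question.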
Unfortunately, I’m already stumped. The function works just fine, but I don’t know the next step for wiring this all together: loading a vector of strings into the tree, structuring the tree according to the Levenshtein metric, and then querying the tree afterwards with new strings that aren’t in it.
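As far as I can tell, NearestNeighbors.jl’s BallTree expects its points to be numeric vectors (it builds its balls from centres computed over the data), so a custom metric alone probably won’t let it hold strings. One structure that needs nothing but the distance function is a BK-tree (Burkhard-Keller tree), which is designed for discrete metrics such as edit distance. The sketch below swaps in that technique; it is not part of NearestNeighbors.jl, and `BKNode`, `BKTree`, `build_bktree` and `within` are hypothetical names chosen for illustration:

```julia
# Hypothetical BK-tree keyed on edit distance; names are illustrative,
# not a NearestNeighbors.jl API.

function levenshtein(s::AbstractString, t::AbstractString)
    a, b = collect(s), collect(t)
    prev = collect(0:length(b))
    for i in 1:length(a)
        cur = similar(prev)
        cur[1] = i
        for j in 1:length(b)
            cost = a[i] == b[j] ? 0 : 1
            cur[j+1] = min(prev[j+1] + 1, cur[j] + 1, prev[j] + cost)
        end
        prev = cur
    end
    return prev[end]
end

struct BKNode
    word::String
    children::Dict{Int,BKNode}   # key = distance from this node's word
end
BKNode(word::AbstractString) = BKNode(String(word), Dict{Int,BKNode}())

mutable struct BKTree
    root::Union{BKNode,Nothing}
end

# Insert a string by walking down the edges labelled with its distance.
function Base.push!(tree::BKTree, word::AbstractString)
    if tree.root === nothing
        tree.root = BKNode(word)
        return tree
    end
    node = tree.root
    while true
        d = levenshtein(word, node.word)
        d == 0 && return tree                    # already stored
        child = get(node.children, d, nothing)
        if child === nothing
            node.children[d] = BKNode(word)
            return tree
        end
        node = child
    end
end

# Load a vector of strings into a new tree.
function build_bktree(words::AbstractVector{<:AbstractString})
    tree = BKTree(nothing)
    for w in words
        push!(tree, w)
    end
    return tree
end

# All stored strings within `radius` edits of `query`, as sorted
# (distance, word) pairs. Children whose edge label k satisfies
# |k - d| > radius cannot contain matches (triangle inequality).
function within(tree::BKTree, query::AbstractString, radius::Integer)
    results = Tuple{Int,String}[]
    tree.root === nothing && return results
    stack = [tree.root]
    while !isempty(stack)
        node = pop!(stack)
        d = levenshtein(query, node.word)
        d <= radius && push!(results, (d, node.word))
        for (k, child) in node.children
            abs(k - d) <= radius && push!(stack, child)
        end
    end
    return sort!(results)
end

tree = build_bktree(["kitten", "sitting", "mitten", "bitten", "fitting"])
within(tree, "kittens", 1)   # [(1, "kitten")]
within(tree, "kittens", 2)   # [(1, "kitten"), (2, "bitten"), (2, "mitten")]
```

The pruning step is what makes it a metric tree: if the query is at distance d from a node, every string under the child labelled k is at least |k - d| edits from the query, so whole subtrees can be skipped. A k-nearest query could be built on top by growing the radius until enough results come back.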