Is the package supposed to work on Julia v1? I see a commit about support for v0.7 but I get an error:
julia> movies[1,:]
9-element Array{Any,1}:
"Moonlight (2016)"
0
0
0
1
0
0
0
0
julia> nemo
9-element Array{Any,1}:
"Finding Nemo (2003)"
0
1
1
0
1
0
0
0
julia> hamming(movies[1,:], nemo)
ERROR: MethodError: no method matching one(::Type{Any})
Closest candidates are:
one(::Type{Union{Missing, T}}) where T at missing.jl:83
one(::Missing) at missing.jl:79
one(::BitArray{2}) at bitarray.jl:392
...
Stacktrace:
[1] one(::Type{Any}) at ./missing.jl:83
[2] result_type(::Hamming, ::Array{Any,1}, ::Array{Any,1}) at /Users/adrian/.julia/packages/Distances/nLAdT/src/metrics.jl:194
[3] eval_start(::Hamming, ::Array{Any,1}, ::Array{Any,1}) at /Users/adrian/.julia/packages/Distances/nLAdT/src/metrics.jl:196
[4] evaluate at /Users/adrian/.julia/packages/Distances/nLAdT/src/metrics.jl:159 [inlined]
[5] hamming(::Array{Any,1}, ::Array{Any,1}) at /Users/adrian/.julia/packages/Distances/nLAdT/src/metrics.jl:240
[6] top-level scope at none:0
Distances.jl is working fine on 1.0. I think this is simply a bug; you can work around it by defining:
julia> Distances.result_type(::Hamming, ::AbstractArray{T1,N} where {T1,N}, ::AbstractArray{T2,N} where {T2,N})=Int
In reality, Hamming probably wants a separate implementation anyway. Desired handling of NaN and missing is not entirely obvious though.
julia> using Random, BenchmarkTools, Distances
julia> a=bitrand(10^5); b = bitrand(10^5);
julia> @btime evaluate($Hamming(), $a, $b);
154.316 μs (1 allocation: 16 bytes)
julia> _fhamming(a,b) = count(a.==b);
julia> @btime _fhamming($a, $b);
2.351 μs (2 allocations: 12.41 KiB)
julia> _fhamming(a,b) = count(isequal.(a,b));
julia> @btime _fhamming($a, $b);
328.005 μs (3 allocations: 16.59 KiB)
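To make the NaN/missing question concrete, here is a minimal sketch of a standalone Hamming count that works on Array{Any,1} inputs because it never calls one(T). The name hamming_isequal is hypothetical and not part of Distances.jl, and comparing with isequal is only one possible choice of semantics: under isequal, NaN matches NaN and missing matches missing, whereas NaN == NaN is false and missing == missing is missing.

```julia
# Hypothetical helper, not part of Distances.jl.
# Counts the positions where the two inputs differ, comparing with `isequal`.
# Assumed semantics: NaN is equal to NaN, and missing is equal to missing.
function hamming_isequal(a, b)
    length(a) == length(b) || throw(DimensionMismatch("inputs must have equal length"))
    return count(((x, y),) -> !isequal(x, y), zip(a, b))
end

x = Any[0, 1, NaN, missing]
y = Any[0, 0, NaN, missing]
hamming_isequal(x, y)  # 1: only the second position differs
```

With == instead of isequal, the NaN position would count as a mismatch and the missing position would yield missing rather than a Bool, so the choice between the two is exactly the open semantic question.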
Thank you, this pointed me in the right direction. It doesn’t like the types of the arrays (it used to work fine in 0.6).
Ex:
julia> hamming(x,y)
ERROR: MethodError: no method matching one(::Type{Any})
Closest candidates are:
one(::Type{Union{Missing, T}}) where T at missing.jl:83
one(::Missing) at missing.jl:79
one(::BitArray{2}) at bitarray.jl:392
...
Stacktrace:
[1] one(::Type{Any}) at ./missing.jl:83
[2] result_type(::Hamming, ::Array{Any,1}, ::Array{Any,1}) at /Users/adrian/.julia/dev/Distances/src/metrics.jl:194
[3] eval_start(::Hamming, ::Array{Any,1}, ::Array{Any,1}) at /Users/adrian/.julia/dev/Distances/src/metrics.jl:196
[4] evaluate at /Users/adrian/.julia/dev/Distances/src/metrics.jl:159 [inlined]
[5] hamming(::Array{Any,1}, ::Array{Any,1}) at /Users/adrian/.julia/dev/Distances/src/metrics.jl:240
[6] top-level scope at none:0
julia> hamming(Int[x...],Int[y...])
4
But really, the above is not a reliable fix.
I’m not sure about the intended semantics for missing and NaN, but the above fix will run into trouble with missing values. And the BitArray variant (count of a broadcasted ==) is also bad; the real solution is to implement a specialization for BitArray (because that’s the main use for Hamming distance).
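A rough sketch of what such a specialization might look like, not a tested patch. The name hamming_bits is hypothetical and not part of Distances.jl. It assumes BitArray keeps its bits in the internal chunks field (a Vector{UInt64}) with the unused trailing bits zeroed, which is an implementation detail of Base rather than public API.

```julia
# Hypothetical helper, not part of Distances.jl.
# Assumption: `BitArray` stores its data in the internal `chunks::Vector{UInt64}`
# field and keeps the padding bits of the last chunk at zero.
function hamming_bits(a::BitArray, b::BitArray)
    size(a) == size(b) || throw(DimensionMismatch("inputs must have equal size"))
    ac, bc = a.chunks, b.chunks
    n = 0
    @inbounds for i in eachindex(ac, bc)
        # xor marks the differing bits of one 64-bit chunk; count_ones tallies them
        n += count_ones(ac[i] ⊻ bc[i])
    end
    return n
end

a = BitArray([true, false, true, true])
b = BitArray([true, true, false, true])
hamming_bits(a, b)  # 2: positions 2 and 3 differ
```

If relying on the internal field is not acceptable, count(xor.(a, b)) gives the same result using only public functions, at the cost of allocating a temporary BitArray.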
Can you open an issue for this?
I’m afraid I don’t understand the internals so feel free to expand on this if necessary:
https://github.com/JuliaStats/Distances.jl/issues/114