Hi,
Following up on my questions in the General Slack channel today, I ran an experiment pitting an NTuple-based kmer type against the existing BioSequences Mer types, which are based on primitive types.
Valentin advised me to use ntuple to generate the “tail” in the code, and to use ntuple in a way that lets it specialise aggressively.
Anyway, I did, and the results are here: https://gist.github.com/BenJWard/4e06c5c4f4648c594fdb1a886cf5042d
When I benchmarked it, the performance was very bad:
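For context, this is roughly the kind of type I mean. It is a hypothetical sketch, not the code in the gist: the name `TupleKmer`, the helper `encode2bit` and the A=0/C=1/G=2/T=3 encoding are assumptions made up for illustration. K bases are packed 2 bits each into N `UInt64` words (so K = 63 needs N = 2), and `ntuple` is called with `Val(N)` so the tuple length is part of the type.

```julia
# Hypothetical NTuple-backed k-mer (not the gist's DNAKmer):
# K bases packed 2 bits each into N UInt64 words.
struct TupleKmer{K,N}
    data::NTuple{N,UInt64}
end

# Assumed 2-bit encoding for this sketch only: A=0, C=1, G=2, T=3.
function encode2bit(c::Char)
    c == 'A' && return UInt64(0)
    c == 'C' && return UInt64(1)
    c == 'G' && return UInt64(2)
    c == 'T' && return UInt64(3)
    throw(ArgumentError("not an unambiguous DNA base: $c"))
end

function TupleKmer{K,N}(s::AbstractString) where {K,N}
    2K <= 64N || throw(ArgumentError("K = $K bases do not fit in $N words"))
    bases = collect(s)
    length(bases) >= K || throw(ArgumentError("sequence shorter than K"))
    # Val(N) puts the tuple length in the type, so ntuple can unroll to
    # (f(1), ..., f(N)) and the result is inferred as NTuple{N,UInt64}.
    data = ntuple(Val(N)) do w
        word = UInt64(0)          # local to the closure, so it is never boxed
        for j in 1:32             # 32 bases fit in one 64-bit word
            i = (w - 1) * 32 + j  # index of the base stored in slot j of word w
            i > K && break
            word |= encode2bit(bases[i]) << (2 * (j - 1))
        end
        word
    end
    return TupleKmer{K,N}(data)
end
```

Under these assumptions, `TupleKmer{63,2}(s)` stores the first 63 bases of `s` in two words, the first base in the lowest bits of the first word.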
julia> @benchmark DNAKmer{63,2}($dnaseq)
BenchmarkTools.Trial:
memory estimate: 1.36 KiB
allocs estimate: 87
--------------
minimum time: 24.255 μs (0.00% GC)
median time: 24.460 μs (0.00% GC)
mean time: 25.328 μs (0.00% GC)
maximum time: 307.687 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
Compare that with the primitive type:
julia> @benchmark BigDNAMer{63}($dnaseq)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 126.889 ns (0.00% GC)
median time: 126.991 ns (0.00% GC)
mean time: 130.191 ns (0.00% GC)
maximum time: 303.371 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 883
But looking at the @code_warntype output for the function, I could see that the variable idx was boxed and the tuple “tail” had an element type of Any.
I wasn't sure why the boxing occurred, but changing idx to a Ref fixed the issue, and now it beats the primitive type's performance!
julia> @benchmark DNAKmer{63,2}($dnaseq)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 91.087 ns (0.00% GC)
median time: 91.181 ns (0.00% GC)
mean time: 92.795 ns (0.00% GC)
maximum time: 243.353 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 952
All of this is on Julia 1.5.
Why did the boxing occur, and why did using a Ref fix it? In both cases idx is just an Int.
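For anyone who wants to poke at this without reading the gist, here is a stripped-down, hypothetical reproduction of the same pattern. The function names are made up for illustration and this is not the gist's code. A counter `idx` is advanced from inside the closure passed to `ntuple`, once as a plain `Int` and once wrapped in a `Ref`. The last two lines inspect the captured field type directly, using minimal closures with the same shape.

```julia
# Counter reassigned inside the closure. Relies on ntuple(f, Val(4))
# calling f(1), f(2), f(3), f(4) in order, as the counter pattern does.
function take4_boxed(v::Vector{Int})
    idx = 0
    # `idx` is reassigned inside the closure, so it is captured as a
    # Core.Box, whose contents are untyped (Any) as far as inference goes.
    return ntuple(Val(4)) do i
        idx += 1
        v[idx]
    end
end

function take4_ref(v::Vector{Int})
    idx = Ref(0)
    # The binding `idx` itself is never reassigned; only the Ref's contents
    # change, so the closure captures a concretely typed Base.RefValue{Int}.
    return ntuple(Val(4)) do i
        idx[] += 1
        v[idx[]]
    end
end

# Minimal closures with the same capture pattern, to inspect field types.
function counter_boxed()
    idx = 0
    return () -> (idx += 1)
end

function counter_ref()
    idx = Ref(0)
    return () -> (idx[] += 1)
end

fieldtype(typeof(counter_boxed()), :idx)  # Core.Box
fieldtype(typeof(counter_ref()), :idx)    # Base.RefValue{Int}
```

Running `@code_warntype take4_boxed(collect(1:5))` against `@code_warntype take4_ref(collect(1:5))` should show the same kind of difference in the inferred tuple element types.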