Ntuple aggressive specialisation and boxed values

Ward9250 · May 30, 2020, 1:36am

Hi,

Further to my questions on the General slack today I decided to do an experiment, pitting an NTuple based kmer type, against the existing BioSequences Mer types which are based on primitive types.

Valentin advised me to use ntuple to generate “tail” in the code, and to use ntuple in a way that it aggressively specialised.

Anyway I did and the results are here: https://gist.github.com/BenJWard/4e06c5c4f4648c594fdb1a886cf5042d

When I benchmarked I found v. bad performance:

julia> @benchmark DNAKmer{63,2}($dnaseq)
BenchmarkTools.Trial: 
  memory estimate:  1.36 KiB
  allocs estimate:  87
  --------------
  minimum time:     24.255 μs (0.00% GC)
  median time:      24.460 μs (0.00% GC)
  mean time:        25.328 μs (0.00% GC)
  maximum time:     307.687 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Vs the primitive type:

julia> @benchmark BigDNAMer{63}($dnaseq)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     126.889 ns (0.00% GC)
  median time:      126.991 ns (0.00% GC)
  mean time:        130.191 ns (0.00% GC)
  maximum time:     303.371 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     883

But looking at the code warntype of the function I could see the variable idx was boxed, and the tuple “tail” had an element type of any.

I wasnt sure why the boxin occured, but changing idx to a Ref fixed the issue, and it beats the primitive type performance!

julia> @benchmark DNAKmer{63,2}($dnaseq)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     91.087 ns (0.00% GC)
  median time:      91.181 ns (0.00% GC)
  mean time:        92.795 ns (0.00% GC)
  maximum time:     243.353 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     952

On Julia 1.5

Why did the boxing occur, and why was it Ref fixed it - in both cases the idx is just an int.

Ward9250 · May 30, 2020, 3:29pm

@inline shiftright(x::BigDNAMer{K}) where {K} = BigDNAMer{K}(reinterpret(UInt128, x) >> 2)

@inline function shiftright(x::Kmer{A,K,N}) where {A,K,N}
    head = @inbounds x.data[1] >> 2
    tail = ntuple(Val{N - 1}()) do i
        Base.@_inline_meta
        j = i + 1
        @inbounds begin
            return (x.data[j] >> 2) | ((x.data[i] & UInt64(3)) << 62)
        end
    end
    return Kmer{A,K,N}((head, tail...))
end

@benchmark shiftright($m)
@benchmark shiftright($oldm)

I tried to implement a good rightshift of all the nucleotides stored in a ntuple based Kmer.
Performance is almost as good a the UInt128 based kmer oldm. But not quite. Any way I could get it even quicker?

Ward9250 · May 30, 2020, 3:47pm

Turns out you can get faster:

    return _shiftright(zero(UInt64), x.data...)
end

@inline function _shiftright(carry::UInt64, head::UInt64, tail...)
    Base.@_inline_meta
    return ((head >> 2) | carry, _shiftright((head & UInt64(3)) << 62, tail...)...)
end

@inline _shiftright(carry::UInt64) = ()

simeonschaub · May 30, 2020, 4:00pm

The Base.@_inline_meta should be redundant here, since it is equivalent to the @inline already in front of the function definition.

Ward9250 · May 30, 2020, 5:28pm

So I tried to introduce a similar shiftleft function:

@inline function shiftleft(x::Kmer{A,K,N}) where {A,K,N}
    _, newbits = _shiftleft(x.data...)
    return Kmer{A,K,N}(_cliphead(64N - 2K, newbits...))
end

@inline function _cliphead(by::Integer, head::UInt64, tail...)
    return (head & (typemax(UInt64) >> by), tail...)
end

@inline function _shiftleft(head::UInt64, tail...)
    carry, newtail = _shiftleft(tail...)
    return head >> 62, ((head << 2) | carry, newtail...)
end

@inline _shiftleft(head::UInt64) = (head & 0xC000000000000000) >> 62, head << 2

Which does similar to shiftright, but with clipping of the tuple’s head element at the end.

Benchmarking this I get allocations and slower times:

julia> @benchmark shiftleft($m)
BenchmarkTools.Trial: 
  memory estimate:  112 bytes
  allocs estimate:  6
  --------------
  minimum time:     227.321 ns (0.00% GC)
  median time:      229.827 ns (0.00% GC)
  mean time:        241.572 ns (1.41% GC)
  maximum time:     3.747 μs (93.68% GC)
  --------------
  samples:          10000
  evals/sample:     442

julia> @code_warntype shiftleft(m)
Variables
  #self#::Core.Compiler.Const(shiftleft, false)
  x::Kmer{DNAAlphabet{2},63,2}
  @_3::Int64
  newbits::Tuple{UInt64,UInt64}

Body::Kmer{DNAAlphabet{2},63,2}
1 ─       nothing
│   %2  = Base.getproperty(x, :data)::Tuple{UInt64,UInt64}
│   %3  = Core._apply_iterate(Base.iterate, Main._shiftleft, %2)::Tuple{UInt64,Tuple{UInt64,UInt64}}
│   %4  = Base.indexed_iterate(%3, 1)::Core.Compiler.PartialStruct(Tuple{UInt64,Int64}, Any[UInt64, Core.Compiler.Const(2, false)])
│         Core.getfield(%4, 1)
│         (@_3 = Core.getfield(%4, 2))
│   %7  = Base.indexed_iterate(%3, 2, @_3::Core.Compiler.Const(2, false))::Core.Compiler.PartialStruct(Tuple{Tuple{UInt64,UInt64},Int64}, Any[Tuple{UInt64,UInt64}, Core.Compiler.Const(3, false)])
│         (newbits = Core.getfield(%7, 1))
│   %9  = Core.apply_type(Main.Kmer, $(Expr(:static_parameter, 1)), $(Expr(:static_parameter, 2)), $(Expr(:static_parameter, 3)))::Core.Compiler.Const(Kmer{DNAAlphabet{2},63,2}, false)
│   %10 = (%9)(newbits)::Kmer{DNAAlphabet{2},63,2}
└──       return %10

Topic		Replies	Views
Using large NTuples makes Julia hang General Usage hdf5 , memory-allocation , data_structures , ntuple	23	1598	February 10, 2023
[ANN] NTupleKmers.jl Biology, Health, and Medicine announcement , ntuple	0	691	May 30, 2020
Performant way to create ntuple? General Usage ntuple	2	1869	September 8, 2017
Core.tuple() warntype Performance	22	1467	January 17, 2020
Type unstable function returning a tuple General Usage	34	1334	November 3, 2020

Ntuple aggressive specialisation and boxed values

Related topics