Hi,
I’ve been trying to track down where the memory allocations in my code are coming from. I have this function in a package which I don’t expect to allocate at all, because I avoid arrays and use only immutable types:
const CDN = Union{BioSequences.DNACodon, BioSequences.RNACodon}
const DEFAULT_TRANS = BioSequences.ncbi_trans_table[1]
const CDN_POS_MASKS = (0xFFFFFFFFFFFFFFFC, 0xFFFFFFFFFFFFFFF3, 0xFFFFFFFFFFFFFFCF)
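# Each mask clears the two bits that encode one of the codon's three positions,
# so OR-ing a new two-bit base into that slot yields a single-position neighbor.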
function expected_NG86(codon::C, k::Float64, code::GeneticCode) where {C <: CDN}
    cdn_bits = UInt64(codon)
    aa = code[codon]
    S = N = 0.0
    for (pos, msk) in enumerate(CDN_POS_MASKS)
        @inbounds for base in 0:3
            # Create the neighbor codon.
            ins = base << (2 * (pos - 1))
            neighbor = C((cdn_bits & msk) | ins)
            if codon == neighbor # Codon created is not a neighbor: should happen 3 times.
                continue
            end
            # See if the mutation is transition or transversion.
            cdn_purine = ispurine(codon[pos])
            neighbor_purine = ispurine(neighbor[pos])
            istransition = (cdn_purine && neighbor_purine) || (!cdn_purine && !neighbor_purine)
            # See if the protein changes between codon and neighbor, and update
            # N and S counts accordingly.
            inc = ifelse(istransition, 1.0, k)
            neighbor_aa = code[neighbor]
            if neighbor_aa == AA_Term
                N += inc
            elseif neighbor_aa == aa
                S += inc
            else
                N += inc
            end
        end
    end
    normalization = (N + S) / 3
    return (S / normalization), (N / normalization)
end
cdn = DNACodon(0x0000000000000000) # a test codon to benchmark and look at allocation.
Both DNACodon and RNACodon are 64-bit primitive types. Here are the full type definitions:
primitive type Kmer{T<:NucleicAcid, K} <: Sequence 64 end
const DNAKmer{K} = Kmer{DNA, K}
const RNAKmer{K} = Kmer{RNA, K}
const DNACodon = DNAKmer{3}
const RNACodon = RNAKmer{3}
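So a codon really is just 64 bits, with no references to the heap anywhere. A quick sanity check confirms this (a sketch; on older Julia versions isbitstype is spelled isbits):

julia> isbitstype(DNACodon) # concrete primitive type, so no heap pointers
true

julia> sizeof(DNACodon) # exactly the same 8-byte layout as a UInt64
8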
If I benchmark it with BenchmarkTools I can see it is very fast, but I also see 3 allocations:
julia> @benchmark GeneticVariation.expected_NG86($cdn, $1.0, $GeneticVariation.DEFAULT_TRANS)
BenchmarkTools.Trial:
memory estimate: 64 bytes
allocs estimate: 3
--------------
minimum time: 71.827 ns (0.00% GC)
median time: 74.665 ns (0.00% GC)
mean time: 83.535 ns (4.07% GC)
maximum time: 1.955 μs (92.34% GC)
--------------
samples: 10000
evals/sample: 972
Given this function might be used on 1000 such RNACodons or DNACodons, that’s 3000 allocations when, as far as I can tell, I shouldn’t need to allocate anything in my code - it should all be immutable values on the stack.
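Before going line by line, type instability would be the obvious first suspect, and it can be checked with something like this (a sketch, assuming the package is loaded and cdn is defined as above):

# Any type instability would show up here as variables inferred as Any
# or a Union, highlighted at the REPL.
@code_warntype GeneticVariation.expected_NG86(cdn, 1.0, GeneticVariation.DEFAULT_TRANS)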
So I tracked allocations by running Julia with --track-allocation=user, which produced this .mem file:
       - function expected_NG86(codon::C, k::Float64, code::GeneticCode) where {C <: CDN}
  101471     cdn_bits = UInt64(codon)
       0     aa = code[codon]
       0     S = N = 0.0
       0     for (pos, msk) in enumerate(CDN_POS_MASKS)
       0         @inbounds for base in 0:3
       -             # Create the neighbor codon.
       0             ins = base << (2 * (pos - 1))
       0             neighbor = C((cdn_bits & msk) | ins)
       0             if codon == neighbor # Codon created is not a neighbor: should happen 3 times.
       0                 continue
       -             end
       -             # See if the mutation is transition or transversion.
       0             cdn_purine = ispurine(codon[pos])
       0             neighbor_purine = ispurine(neighbor[pos])
       0             istransition = (cdn_purine && neighbor_purine) || (!cdn_purine && !neighbor_purine)
       -             # See if the protein changes between codon and neighbor, and update
       -             # N and S counts accordingly.
       0             inc = ifelse(istransition, 1.0, k)
       0             neighbor_aa = code[neighbor]
       0             if neighbor_aa == AA_Term
       0                 N += inc
       0             elseif neighbor_aa == aa
       0                 S += inc
       -             else
       0                 N += inc
       -             end
       -         end
       -     end
       0     normalization = (N + S) / 3
       0     return (S / normalization), (N / normalization)
       - end
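(For anyone wanting to reproduce this, the file comes from the standard allocation-tracking workflow: run the function once so compilation happens, clear the counters, run it again, and read the .mem files Julia writes on exit. Roughly:)

# Sketch of the session; Julia was started with: julia --track-allocation=user
using BioSequences, GeneticVariation
cdn = DNACodon(0x0000000000000000)
GeneticVariation.expected_NG86(cdn, 1.0, GeneticVariation.DEFAULT_TRANS) # compile first
Profile.clear_malloc_data() # discard counts from compilation (needs using Profile on newer Julia)
GeneticVariation.expected_NG86(cdn, 1.0, GeneticVariation.DEFAULT_TRANS) # the measured run
# exiting Julia writes the per-line counts to <file>.jl.mem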
So there are no allocations, except for the first line, where I convert the RNACodon or DNACodon to a UInt64. This falls back to the convert methods, which are defined as:
Base.convert(::Type{UInt64}, x::DNAKmer{K}) where {K} = reinterpret(UInt64, x)
Base.convert(::Type{UInt64}, x::RNAKmer{K}) where {K} = reinterpret(UInt64, x)
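Reinterpreting one 64-bit primitive type as another should compile away completely, and the generated code can be inspected to confirm there is no allocation call in it (a sketch):

# For a 64-bit bitstype this should show a trivial function body, with no
# calls into the GC machinery (jl_gc_pool_alloc and friends).
@code_llvm UInt64(cdn)
@code_native UInt64(cdn)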
So all I’m doing is reinterpreting one 64-bit bitstype as a UInt64, and yet that line appears to allocate? I’m not sure why this is, especially since if I benchmark the conversion on its own:
julia> @benchmark UInt64($cdn)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 1.598 ns (0.00% GC)
median time: 1.898 ns (0.00% GC)
mean time: 1.911 ns (0.00% GC)
maximum time: 22.747 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
0 allocations are reported. On the other hand, if I do:
julia> @time UInt64(cdn)
0.000004 seconds (4 allocations: 160 bytes)
0x0000000000000000
It tells me 4 allocations (though I understand that @time at global scope measures some overhead of its own, so I put more weight on the BenchmarkTools numbers).
So I’m baffled as to why this one line in my otherwise non-allocating function, which simply reinterprets a 64-bit value, appears to be allocating according to the .mem file.
This function used to use lots of short-lived small arrays and did an awful amount of allocation that really slowed it down, so I’m pleased I’ve got it this far; I’m just trying to figure out why this one allocation remains!