This question originates from a discussion in the pull request “Use relative transformations” by helgee (JuliaAstro/AstroTime.jl#48 on GitHub).
The AstroTime library provides a way to store and operate on Epochs in different timescales.
Each timescale has its own type. MWE with two timescales, TT and TDB:
```julia
abstract type TimeScale end

struct ConcreteTDBScale <: TimeScale end
const TDB = ConcreteTDBScale()

struct ConcreteTTScale <: TimeScale end
const TT = ConcreteTTScale()

struct CEpoch{S<:TimeScale}
    scale::S
    second::Int64
    fraction::Float64
end

const k  = 1.657e-3
const eb = 1.671e-2
const m₀ = 6.239996
const m₁ = 1.99096871e-7

function getoffset(::ConcreteTTScale, ::ConcreteTDBScale, ep::CEpoch{ConcreteTTScale})
    tt = ep.fraction + ep.second
    g = m₀ + m₁ * tt
    return k * sin(g + eb * sin(g))
end
```
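With this design the scales are passed as instances, so a call looks like this (placeholder values, just to show the call shape):

```julia
ep = CEpoch(TT, 0, 0.0)   # an epoch in the TT scale
getoffset(TT, TDB, ep)    # TT → TDB offset
```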
TT and TDB are the only instances of their respective timescale types (because empty structs are singletons).
Also, CEpoch{ConcreteTTScale} and CEpoch{ConcreteTDBScale} are 128-bit bits types because no space is reserved for the empty struct.
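Both points are easy to confirm in the REPL, e.g.:

```julia
julia> sizeof(ConcreteTTScale)           # the empty singleton struct takes no space
0

julia> isbitstype(CEpoch{ConcreteTTScale})
true

julia> sizeof(CEpoch{ConcreteTTScale})   # 8-byte second + 8-byte fraction = 128 bits
16
```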
Another possibility would be to not instantiate the timescales at all and use only abstract types. MWE:
```julia
abstract type TimeScale end

abstract type TDBScale <: TimeScale end
abstract type TTScale <: TimeScale end

struct AEpoch{S<:TimeScale}
    second::Int64
    fraction::Float64
end

const k  = 1.657e-3
const eb = 1.671e-2
const m₀ = 6.239996
const m₁ = 1.99096871e-7

function getoffset(::Type{TTScale}, ::Type{TDBScale}, ep::AEpoch{TTScale})
    tt = ep.fraction + ep.second
    g = m₀ + m₁ * tt
    return k * sin(g + eb * sin(g))
end
```
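Here the scales are passed as types rather than instances, so the equivalent call (same placeholder values as above) would be:

```julia
ep = AEpoch{TTScale}(0, 0.0)
getoffset(TTScale, TDBScale, ep)
```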
The memory layout of AEpoch{TTScale} is the same as the one of CEpoch{ConcreteTTScale}.
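A quick check that the layouts match:

```julia
julia> sizeof(AEpoch{TTScale}) == sizeof(CEpoch{ConcreteTTScale}) == 16
true
```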
But benchmarking the two getoffset methods:

```julia
using BenchmarkTools

aep = AEpoch{TTScale}(0, 0.0)
@benchmark getoffset($(Ref(TTScale))[], $(Ref(TDBScale))[], $(Ref(aep))[])

cep = CEpoch(ConcreteTTScale(), 0, 0.0)
@benchmark getoffset($(Ref(TT))[], $(Ref(TDB))[], $(Ref(cep))[])
```
```
# AEpoch{TTScale} (abstract type parameters):
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     52.585 ns (0.00% GC)
  median time:      53.596 ns (0.00% GC)
  mean time:        54.543 ns (0.00% GC)
  maximum time:     163.054 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     986

# CEpoch with the singleton TT instance:
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     37.165 ns (0.00% GC)
  median time:      37.188 ns (0.00% GC)
  mean time:        37.741 ns (0.00% GC)
  maximum time:     107.502 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     992
```
It seems that using the unique instance of the concrete type is faster. I assumed this was due to dispatch, so I checked the LLVM IR and the native assembly of the compiled functions to see whether they were the same. The only difference I found is that with the concrete timescale instance the epoch argument is passed as argument 1 (the RDI register in the Linux x86-64 calling convention), whereas with the abstract types it is passed as argument 3 (the RDX register).
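For reference, the generated code for each method can be inspected with the standard reflection macros:

```julia
using InteractiveUtils   # exports @code_llvm and @code_native (loaded by default in the REPL)

@code_llvm   getoffset(TTScale, TDBScale, aep)
@code_native getoffset(TTScale, TDBScale, aep)

@code_llvm   getoffset(TT, TDB, cep)
@code_native getoffset(TT, TDB, cep)
```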
So I have the following questions:
- Is there a reason to prefer one implementation over the other? Setting the benchmark aside, the only advantage I see in storing the timescale instance in the epoch struct is that it could also support a user-defined timescale that needs to carry extra data (see the sketch below).
- Why is there a difference in the LLVM calling sequence? In both cases getoffset does not use arguments 1 and 2 (they are only used for dispatch), yet they are optimized away in only one of the two cases.
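For illustration, here is the kind of user-defined scale I have in mind (hypothetical names and values, only possible with the instance-based design):

```julia
# A hypothetical user-defined scale that carries per-instance data. With the
# CEpoch design the scale instance is stored inside the epoch itself.
struct ShiftedScale <: TimeScale
    offset::Float64   # constant offset from TT, in seconds
end

getoffset(scale::ShiftedScale, ::ConcreteTTScale, ep::CEpoch{ShiftedScale}) = -scale.offset

ep = CEpoch(ShiftedScale(32.184), 0, 0.0)
getoffset(ep.scale, TT, ep)
```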
Thank you