Abstract type versus singleton struct

This question originates from a discussion in https://github.com/JuliaAstro/AstroTime.jl/pull/48

The AstroTime library provides a way to store and operate on Epochs in different timescales.

Each timescale has its own type: MWE with 2 timescales TT and TDB:

abstract type TimeScale end

struct ConcreteTDBScale <: TimeScale end
const TDB = ConcreteTDBScale()
struct ConcreteTTScale <: TimeScale end
const TT = ConcreteTTScale()

struct CEpoch{S<:TimeScale}
    scale::S
    second::Int64
    fraction::Float64
end

const k = 1.657e-3
const eb = 1.671e-2
const m₀ = 6.239996
const m₁ = 1.99096871e-7

function getoffset(::ConcreteTTScale, ::ConcreteTDBScale, ep::CEpoch{ConcreteTTScale})
    tt = ep.fraction + ep.second
    g = m₀ + m₁ * tt
    return k * sin(g + eb * sin(g))
end

TT and TDB are the only instances of their respective timescale types (because empty struct are singletons).

Also CEpoch{ConcreteTTScale} and CEpoch{ConcreteTDBScale}
are 128 bits bitstype because no space is reserved for the empty struct.

Another possibility would be to not instantiate the timescales but use only abstract types: MWE:


abstract type TimeScale end
abstract type TDBScale <: TimeScale end
abstract type TTScale <: TimeScale end

struct AEpoch{S<:TimeScale}
    second::Int64
    fraction::Float64
end

const k = 1.657e-3
const eb = 1.671e-2
const m₀ = 6.239996
const m₁ = 1.99096871e-7

function getoffset(::Type{TTScale}, ::Type{TDBScale}, ep::AEpoch{TTScale})
    tt = ep.fraction + ep.second
    g = m₀ + m₁ * tt
    return k * sin(g + eb * sin(g))
end

The memory layout of AEpoch{TTScale} is the same as as the one of CEpoch{ConcreteTTScale}.

But benchmarking the getoffset functions:

aep = AEpoch{TTScale}(0, 0.0)
@benchmark  getoffset($(Ref(TTScale))[], $(Ref(TDBScale))[], $(Ref(aep))[])

cep = CEpoch(ConcreteTTScale(), 0, 0.0)
@benchmark @benchmark getoffset($(Ref(TT))[], $(Ref(TDB))[], $(Ref(cep))[])
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     52.585 ns (0.00% GC)
  median time:      53.596 ns (0.00% GC)
  mean time:        54.543 ns (0.00% GC)
  maximum time:     163.054 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     986


BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     37.165 ns (0.00% GC)
  median time:      37.188 ns (0.00% GC)
  mean time:        37.741 ns (0.00% GC)
  maximum time:     107.502 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     992

it seems using the concrete type unique instance is faster. This is probably due to dispatch but I wanted to check the llvm/assembly of the compiled function to see if they were the same and I realised the only difference is that in the case of the concrete timescale instance, the epoch argument is passed via argument 1 (RDI register in the native Linux x64 calling convention) and in the other case it is argument 3 (RDX register in Linux x64 calling convention).

So I have the following questions:

  1. Is there a reason to prefer a particular implementation over the other? For example, without considering the benchmark, the only advantage I see of the implementation where the timescale instance is stored in the Epoch struct is that it could also support some user defined timescale which would need to store extra data.

  2. why is there a difference in the LLVM calling sequence? In both cases, getoffset does not need the argument 1 and 2 (they are only used for dispatch) but they are optimized out only in one case.

Thank you

2 Likes

I believe I saw a discussion about this regarding some other package (perhaps Distributions?), and I understood that the possibility of storing extra parameters is a major upside.

1 Like

For the record, the third option that was discussed was values-as-parameters, e.g.

struct Epoch{S}
    second::Int64
    fraction::Float64
    function Epoch{S}(second, fraction)
        new{S::TimeScale}(second, fraction)
    end
end

I’d be interested to better understand the trade-offs between these solutions as well.

1 Like

I would suggest thinking about this from an API design perspective. If it is necessary for the user to know that an epoch uses some kind of timescale, make an accessor function for it (using traits if they play a role in further dispatch). Use type hierarchies as an implementation detail.

Also, decouple this decision from how the compiler treats code. This should be investigated, but it is not an incidental thing that could change in the medium run.

FWIW, I would go with values-as-parameters, as suggested by @helgee; but, again, just as an initial implementation, not something exposed by the API.

Sorry, I was being unclear. I was not actually suggesting it. In the PR linked above we moved away from values-as-parameters and toward the concrete-field approach. After using values-as-parameters for about a year in AstroTime.jl and AstroBase.jl, I had the feeling that I was working “against the grain of the language”. And it seems to me that (pre-)compile times which are becoming a problem for the aforementioned packages have improved but I actually do not have data.

In the end, this is very subjective and @Bernard_GODARD and I were hoping for some objective information about the consequences of each approach.

In terms of API design, it does not matter at all for AstroTime.jl IMHO. An ancient unreleased version used abstract types whereas the previous release (v0.3) used values-as-parameters and the most recent release (v0.4) uses concrete fields but the user-facing API is virtually unchanged.

I’d suggest CEpoch. I discussed this in ConstructionBase.jl documentation that types like this compose well without defining any special methods and ConstructionBase.jl API works even without depending on it. It means that types like this work with packages using ConstructionBase.jl API.

2 Likes

Thank you all for your feedback @DNF, @Tamas_Papp, @tkf. Looking at Distributions.jl as recommended by @DNF I can see that instances of singleton types are being used as in option 2. And according to @tkf documentation in ConstructionBase.jl , it is also recommended that those singleton instances are stored as record in the parameterized struct as in CEpoch. This corresponds to the current AstroTime implementation by @helgee.

2 Likes