Let’s say I plan to do something a billion times, like simulate draws from a probability distribution.
I want to ensure that my code is correct by writing a bunch of checks into the type constructor. E.g.
julia> struct VectorPD
           events::Vector{T} where T
           probabilities::Vector{Float64}
           function VectorPD(es,ps)
               if abs(sum(ps) - 1.0) >= 0.00001
                   error("Probabilities must sum to 1")
               elseif any(ps .< 0.0)
                   error("Probabilities must be nonnegative")
               else
                   new(es,ps)
               end
           end
       end
julia> VectorPD([1,2,3],[0.0,0.5,0.5])
VectorPD([1, 2, 3], [0.0, 0.5, 0.5])
julia> VectorPD([1,2,3],[0.0,0.5,0.6])
ERROR: Probabilities must sum to 1
Stacktrace:
[1] VectorPD(::Array{Int64,1}, ::Array{Float64,1}) at ./none:6
[2] top-level scope at none:0
julia> VectorPD([1,2,3],[-0.1,0.5,0.6])
ERROR: Probabilities must be nonnegative
Stacktrace:
[1] VectorPD(::Array{Int64,1}, ::Array{Float64,1}) at ./none:8
[2] top-level scope at none:0
Great.
Now I want another similar type that doesn’t slow me down by performing the (perhaps costly, and again executed a billion times) safety check.
So I can do
julia> struct UnsafeVectorPD
           events::Vector{T} where T
           probabilities::Vector{Float64}
       end
julia> UnsafeVectorPD([1,2,3],[0.0,0.5,0.5])
UnsafeVectorPD([1, 2, 3], [0.0, 0.5, 0.5])
julia> UnsafeVectorPD([1,2,3],[0.0,0.5,0.6])
UnsafeVectorPD([1, 2, 3], [0.0, 0.5, 0.6])
julia> UnsafeVectorPD([1,2,3],[-0.1,0.5,0.6])
UnsafeVectorPD([1, 2, 3], [-0.1, 0.5, 0.6])
Now say I want to run a billion tests of my code. Say my code refers to the concept of a probability distribution a lot, in all sorts of places.
What I’d like is to instruct the code at a high level to use either the safe or the unsafe type everywhere, in all the different functions that use a probability distribution, depending on what I’m trying to do. That is, either I’m checking on a small sample, with the safe type, that the code probably isn’t completely wrong, or I’m computing the actual results on a large sample with the unsafe type:
run_stuff(1:1_000, VectorPD)
run_stuff(1:1_000_000_000, UnsafeVectorPD)
Of course, since the 1B case is going to be run a lot of times, I want the methods to be fast.
And because these things are used all throughout the code, I don’t want to define two versions of every method that depends on a probability distribution. E.g., I could write
function run_stuff_safe(...)
    ...
    dependency_safe(...)
    ...
end
function dependency_safe(...)
    ...
    dependency_of_dependency_safe(...)
    ...
end
... #etc etc etc
function final_dependency_safe(...)
    return VectorPD( ... )
end
and an analogous chain of unsafe versions. But then I’m maintaining two parallel but essentially identical chunks of code.
I could pass the type as an argument all the way from the top level down to the bottom level, but that seems almost as tedious. None of the intermediate functions needs to know which type to use; only the “bottom” one does.
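For concreteness, a minimal sketch of that pass-through version might look like the following. The two structs are compact copies of the types above; the three-level chain and its `n` argument are hypothetical stand-ins for the real code. Annotating the argument as `::Type{PD}` asks Julia to compile a specialized method per type, so the intermediate functions only forward the argument.

```julia
# Compact copies of the two types from the post.
struct VectorPD
    events::Vector{T} where T
    probabilities::Vector{Float64}
    function VectorPD(es, ps)
        abs(sum(ps) - 1.0) < 0.00001 || error("Probabilities must sum to 1")
        any(p -> p < 0.0, ps) && error("Probabilities must be nonnegative")
        new(es, ps)
    end
end

struct UnsafeVectorPD
    events::Vector{T} where T
    probabilities::Vector{Float64}
end

# Hypothetical call chain: only the bottom function constructs a PD.
# Here it builds a uniform distribution over 1:n, purely for illustration.
final_dependency(::Type{PD}, n) where {PD} = PD(collect(1:n), fill(1 / n, n))

# The intermediate levels just forward the type they were given.
dependency(::Type{PD}, n) where {PD} = final_dependency(PD, n)
run_stuff(range, ::Type{PD}) where {PD} = [dependency(PD, n) for n in range]

small = run_stuff(1:3, VectorPD)        # every construction is validated
large = run_stuff(1:3, UnsafeVectorPD)  # no validation
```

The cost is exactly the tedium described above: every function in the chain gains one extra argument, even though only `final_dependency` uses it.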
And I think (?) I don’t want to have to use a macro throughout all these dependencies. Maybe that is the solution, but I can’t think of how it would work.
The other thing I’ve thought of is defining some high-level global reference to the type, and switching it.
function final_dependency(...)
    global TypeToUse
    return TypeToUse(...)
end
TypeToUse = VectorPD
run_stuff(1:1_000)
TypeToUse = UnsafeVectorPD
run_stuff(1:1_000_000_000)
But that again seems like a poor idea for the obvious reasons: it relies on hidden mutable state, and non-constant globals are slow in Julia.
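To make the concern concrete, here is a runnable sketch of the global-switch version. The structs are compact copies of the types above, and the single-argument `final_dependency(n)` is a hypothetical stand-in for the real bottom-level function. Because `TypeToUse` is a non-constant global, it can be rebound at any time, so the compiler cannot assume which constructor will be called.

```julia
# Compact copies of the two types from the post.
struct VectorPD
    events::Vector{T} where T
    probabilities::Vector{Float64}
    function VectorPD(es, ps)
        abs(sum(ps) - 1.0) < 0.00001 || error("Probabilities must sum to 1")
        any(p -> p < 0.0, ps) && error("Probabilities must be nonnegative")
        new(es, ps)
    end
end

struct UnsafeVectorPD
    events::Vector{T} where T
    probabilities::Vector{Float64}
end

# Non-constant global switch: any code, anywhere, can rebind it.
TypeToUse = VectorPD

# Hypothetical bottom-level function; it looks up the global on every call.
final_dependency(n) = TypeToUse(collect(1:n), fill(1 / n, n))

checked = final_dependency(3)     # built through the validating constructor

TypeToUse = UnsafeVectorPD        # flip the switch
unchecked = final_dependency(3)   # same call, now skips validation
```

Running `@code_warntype final_dependency(3)` in the REPL is one way to see how the compiler treats the global lookup. Declaring the binding `const` would give the compiler a fixed type to work with, but then it could no longer be switched between runs.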
Any suggestions? I think I’m probably missing something obvious. (NB, in the real use case, it would not be one, but a handful of types that would come in “safe” and “unsafe” flavors and need to be seamlessly swapped in where appropriate.)