Overriding Random.rand for vectorization

Hello. I have followed the advice from Random Numbers · The Julia Language to create a custom sampler for my struct. This is a good first step, but I would like to modify the sampler a bit. Currently the sampler takes tree-like objects whose leaves denote random variables and produces dictionaries mapping these random variables to values. For example

rand(t::SumSPE) -> Dict{Symbol, Float64}(:x=>1.0, :y=>2.0)

However, I often use rand to draw n dictionaries. Random.jl then produces a Vector{Dict{Symbol, Float64}} which is expected, but not exactly what I want. Since all the keys are fixed, I would love to have something of the form Dict{Symbol, Vector{Float64}} to create a vectorized version of the dictionary. Is there an idiomatic way of doing this?

Not sure I follow exactly, but rand doesn’t produce objects other than vectors in regular use of the API. Could you share an example of what you’re doing now, so it’s easier to follow?

If I understand your intent correctly, then such an output would not be inter-operable with the use of Base.rand in other contexts, at which point there’s very little reason to use Base.rand for this. You might consider just making a new function with your desired functionality and using that instead.

Not sure I follow exactly, but rand doesn’t produce objects other than vectors in regular use of the API. Could you share an example of what you’re doing now, so it’s easier to follow?

Here is an example. A ContinuousLeaf performs some minor computation, but at the very end calls a distribution from Distributions.jl (e.g. Normal).

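using Random, Distributions

# Element type that rand uses when allocating arrays of samples
Base.eltype(::Type{<:ContinuousLeaf}) = Dict{Symbol,Float64}

# Scalar sampling rule: one draw from the leaf's distribution, keyed by its symbol
function Random.rand(rng::AbstractRNG, d::Random.SamplerTrivial{T}) where {T<:ContinuousLeaf}
    leaf = d[]
    Dict(symbol(leaf) => rand(rng, leaf.dist))
end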

leaf = ContinuousLeaf(:x, Normal(0,1), ...)
rand(leaf)
# Dict{Symbol, Float64}(:x=>0.0)
rand(leaf, 2)
# Vector{Dict{Symbol, Float64}}[Dict{Symbol, Float64}(:x=>0.0), Dict{Symbol, Float64}(:x=>1.0)]

The second call to rand produces a vector of dictionaries. Ideally, the call rand(leaf, n) would instead return a struct-of-arrays representation. For example:

Dict{Symbol, Vector{Float64}}(:x=>[0.0, 1.0])
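
One way to get that shape without changing what rand returns is to transpose the vector of dictionaries after sampling. This is only a minimal sketch, assuming every dictionary has the same keys; `columnize` is a hypothetical helper name, not something Random.jl provides.

```julia
# Hypothetical helper: turn a vector of same-keyed dictionaries into a
# dictionary of vectors (array-of-structs -> struct-of-arrays).
function columnize(samples::AbstractVector{Dict{K,V}}) where {K,V}
    out = Dict{K,Vector{V}}()
    isempty(samples) && return out
    for k in keys(first(samples))
        # Assumes every dictionary contains key k; a missing key throws a KeyError.
        out[k] = [s[k] for s in samples]
    end
    return out
end

# Stand-in for the result of rand(leaf, 2) on a tree with variables :x and :y.
samples = [Dict(:x => 0.0, :y => 2.0), Dict(:x => 1.0, :y => 3.0)]
cols = columnize(samples)
```

This still allocates the intermediate dictionaries, so it is a convenience rather than a performance fix.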

There is a subsection on scalar vs array generation: Random Numbers · The Julia Language. Do you think it could be useful?

If I understand your intent correctly, then such an output would not be inter-operable with the use of Base.rand in other contexts, at which point there’s very little reason to use Base.rand for this. You might consider just making a new function with your desired functionality and using that instead.

Yeah, that might be best. Although how badly would interoperability break if I did override rand for these specific structs and only made such calls inside my module? It doesn’t seem like a huge problem, right?

Breaking expectations of widely used library functions is rarely, if ever, a good idea.

This overload is largely harmless but also largely useless.

You’re defining this for your own sampler type, so this is not an instance of type piracy. I.e., nothing that anybody has written without knowledge or use of your code will break. That’s why it’s harmless.

However, this has “violated” the output interface that every “normal” use of Base.rand has assumed, so you should not expect to be able to pass this sampler to other code that has a “normal” use and have it work properly (if at all). For example, you can’t pass this particular sampler to, e.g., some randomized linear algebra routine that attempts to call Base.rand(rng, sampler, M, N) (forgive me if the syntax there is slightly wrong – I haven’t experimented with custom samplers) to produce a random Matrix that it then uses for some calculation.
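
To make the “normal” contract concrete, here is a small sketch of what generic code can rely on when only the scalar rule and eltype are defined. `DemoLeaf` is a hypothetical stand-in for ContinuousLeaf that uses randn instead of Distributions.jl so the example runs with the standard library alone.

```julia
using Random

# Hypothetical stand-in for ContinuousLeaf: one named Gaussian variable.
struct DemoLeaf
    name::Symbol
    mu::Float64
    sigma::Float64
end

# Element type that rand uses to allocate arrays of samples.
Base.eltype(::Type{DemoLeaf}) = Dict{Symbol,Float64}

# Scalar sampling rule; the array methods of rand are derived from it.
function Random.rand(rng::AbstractRNG, d::Random.SamplerTrivial{DemoLeaf})
    leaf = d[]
    return Dict(leaf.name => leaf.mu + leaf.sigma * randn(rng))
end

rng = MersenneTwister(1)
leaf = DemoLeaf(:x, 0.0, 1.0)
one = rand(rng, leaf)         # a single Dict{Symbol,Float64}
many = rand(rng, leaf, 2, 3)  # a 2x3 Matrix whose elements are Dicts
```

Generic callers expect `rand(rng, leaf, dims...)` to return an Array with element type `eltype(typeof(leaf))`; an override that returns a Dict of Arrays would no longer satisfy that expectation.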

The main reason to add methods to existing functions is to orthogonalize algorithms from data types. I.e., the same calculation can be used to compute the exp of a Float32 or a square Matrix{T} for a wide range of suitable T (although in practice there are different optimizations and tradeoffs made, which is why we define specializations for both): all that is required is that + and * have suitable and “equivalent” definitions for either type.

But since the array output of your proposed sampler is nonstandard, it’s not an “equivalent” use. The assumptions that other code makes about the output of Base.rand (i.e., that it can be used to produce an Array of values but not a Dict of Arrays) will be violated, and that code is unlikely to work.

That is why I say there isn’t a benefit to specializing Base.rand for this, and why I call such an overload largely “useless”: existing uses of Base.rand should not be expected to work with it. I suggest a different function altogether. With a different function, you’ll be less likely to confuse yourself (or other people) into thinking that this would work with standard Base.rand uses.

Since you say you’re only planning to call this within your module, there isn’t an operational reason to use Base.rand over some new function. If it’s still useful to define the sampler this way because the other machinery in the RNG interface saves you from re-writing a bunch of extra boilerplate, go ahead and save yourself the trouble. But if you need to re-implement almost everything anyway, then Base.rand isn’t doing you any favors and risks semantic confusion.
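
For illustration, a dedicated function could look roughly like the sketch below. Everything here is an assumption made for the example: `DemoLeaf` is a hypothetical stand-in for ContinuousLeaf, `sample_columns` is a made-up name, and the tree is simplified to a flat list of independent Gaussian leaves, which is not how a real sum or product node would be sampled.

```julia
using Random

# Hypothetical stand-in for ContinuousLeaf: one named Gaussian variable.
struct DemoLeaf
    name::Symbol
    mu::Float64
    sigma::Float64
end

# Hypothetical function, deliberately NOT a method of Base.rand:
# draws n values per leaf and returns them column-wise.
function sample_columns(rng::AbstractRNG, leaves::AbstractVector{DemoLeaf}, n::Integer)
    out = Dict{Symbol,Vector{Float64}}()
    for leaf in leaves
        # One vectorized draw per variable instead of n small dictionaries.
        out[leaf.name] = leaf.mu .+ leaf.sigma .* randn(rng, n)
    end
    return out
end

# Convenience method that uses the default task-local RNG.
sample_columns(leaves, n) = sample_columns(Random.default_rng(), leaves, n)

leaves = [DemoLeaf(:x, 0.0, 1.0), DemoLeaf(:y, 5.0, 0.1)]
cols = sample_columns(MersenneTwister(1), leaves, 4)
```

Because the name differs from rand, nobody reading the call site will expect an Array of samples, while the function can still accept any AbstractRNG like the standard interface does.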