I am performing some simulations where a statistical model is used first to simulate a new response, then to fit that response, and finally to extract quantities of interest. The current code is
function parametricbootstrap(N::Integer, m::LinearMixedModel{T},
        rng::AbstractRNG=Random.GLOBAL_RNG, props=(:objective, :σ, :β, :θ),
        β::Vector{T} = m.β, σ::T = m.σ, θ::Vector{T} = m.θ) where {T}
    y₀ = copy(response(m))  # to restore original state of m
    n, p, q, nre = size(m)
    length(β) == p && length(θ) == (k = length(getθ(m))) || throw(DimensionMismatch(""))
    baseval = getproperty.(Ref(m), props)
    ptype = typeof(baseval)
    val = [NamedTuple{props, ptype}(
        getproperty.(Ref(refit!(simulate!(rng, m, β=β, σ=σ, θ=θ))), props))
           for _ in Base.OneTo(N)]
    refit!(m, y₀)  # restore original state
    val
end
is for _ in Base.OneTo(N) a good way of phrasing “do this N times” or are there better alternatives?
Note that the ntuple solution will be type-unstable (and therefore potentially slower) if the value of N is not known to the compiler. So if N is a literal or a constant it's probably OK, but if N is a variable then the OneTo(N) approach is better.
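For concreteness, a minimal sketch of that difference (the function names here are just illustrative):

```julia
# With a literal count the tuple length is part of the inferred return
# type; with a runtime count it is not, so unstable(n) is type-unstable.
stable() = ntuple(i -> i^2, 3)     # length 3 known at compile time
unstable(n) = ntuple(i -> i^2, n)  # length known only at run time

stable()     # (1, 4, 9)
unstable(3)  # also (1, 4, 9), but the compiler cannot infer the length
```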
I think it’s fine, but for long expressions like this I find that having 1:N at the very end reduces readability; it’s off the screen in your code above, for example.
There are several other ways of doing this, one idiom that I often use is:
val = 1:N .|> _ -> NamedTuple{props, ptype}(...)
However, for readers not familiar with the broadcasted pipe, this probably looks confusing.
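For readers who haven’t seen it, here is a self-contained toy version of the idiom, with a placeholder body standing in for the NamedTuple construction above:

```julia
# The broadcasted pipe calls the anonymous function once per element of
# 1:N; the underscore discards the index, so the body just runs N times.
N = 4
val = 1:N .|> _ -> (a = 1, b = 2)  # toy body in place of the real one

length(val)  # 4
val[1]       # (a = 1, b = 2)
```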
Don’t bother with OneTo in this case; using 1:N is surely going to give you exactly the same performance.
There is also the foreach function, which you can use if you want to discard the results.
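A small sketch of that pattern (the body here is purely illustrative):

```julia
# foreach runs the function for its side effects and returns nothing.
acc = Int[]
foreach(1:3) do i
    push!(acc, i^2)  # side effect; any return value is discarded
end
acc  # [1, 4, 9]
```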
Since you’re already broadcasting in this case, you may be able to structure your broadcast such that it does the repetition for you — possibly using Iterators.repeated or a FillArray to define the resulting shape. I haven’t totally deciphered your code to see how/if that’s possible in this case, though.
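As a rough sketch of that idea, with a stand-in computation (since the real body refits a model):

```julia
using Base.Iterators: repeated

f(_) = 2 + 2                      # stand-in for one bootstrap replicate
a = f.(1:3)                       # broadcasting over 1:3 supplies the calls
b = map(f, repeated(nothing, 3))  # repeated supplies three dummy arguments
# both give [4, 4, 4]
```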
The usage of the comprehension syntax looks fine to me. But if you want to try something else:
val = map(1:N) do _
NamedTuple{props, ptype}(...)
end
In many cases this could be a great solution, but I don’t think it’s a good fit here because the computation involves side effects (presumably due to rng). I suppose it’s better to use broadcasting only for “pure” functions?
I love writing things like A .= rand.(). While we don’t specify the order of execution in a broadcast, we do say that if the rand.() call is dotted then it appears in the “kernel” and will be executed every loop.
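The distinction can be seen directly in a small sketch:

```julia
A = zeros(4)
A .= rand()    # rand() runs once; the single draw is copied to every slot
B = zeros(4)
B .= rand.()   # rand.() is in the kernel; it runs once per element

all(==(A[1]), A)  # true: all entries of A are identical
```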
Base.OneTo(N) statically encodes the fact that it’s starting at 1 within the type itself. This was needed to eke the very last bit of performance out of array iteration and axes computations, and it now distinguishes one-based arrays from offset arrays.
We now have very powerful constant propagation so unless you’re actually dealing with array axes, you don’t need to use Base.OneTo in general code.
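For illustration:

```julia
# OneTo carries the lower bound 1 in its type; the axes of ordinary
# arrays are OneTo values, which is what marks them as one-based.
r = Base.OneTo(5)
(first(r), last(r))          # (1, 5)
axes(ones(2, 3))             # (Base.OneTo(2), Base.OneTo(3))
collect(r) == collect(1:5)   # true
```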
Thanks to all. I ended up using a thunk as suggested by @tkf and it works fine.
"""
parametricbootstrap(rng::AbstractRNG, nsamp::Integer, m::LinearMixedModel,
properties=(:objective, :σ, :β, :θ); β = m.β, σ = m.σ, θ = m.θ)
Perform `nsamp` parametric bootstrap replication fits of `m`, returning a
`Vector{NamedTuple}` (a.k.a. `Tables.RowTable`) of `properties` of the refit model.
# Named Arguments
`β`, `σ`, and `θ` are the values of the parameters of `m` used to simulate the responses.
"""
function parametricbootstrap(rng::AbstractRNG, nsamp::Integer, m::LinearMixedModel,
properties=(:objective, :σ, :β, :θ); β = m.β, σ = m.σ, θ = m.θ)
y₀ = copy(response(m)) # to restore original state of m
try
map(1:nsamp) do _
refit!(simulate!(rng, m, β=β, σ=σ, θ=θ)) # simulate a response and refit model
NamedTuple{properties, typeof(getproperty.(Ref(m), properties))}(
getproperty.(Ref(m), properties)) # extract properties as a named tuple
end
finally
refit!(m, y₀) # restore original state of m
end
end
There are many ways to write this but I personally use for _ = 1:n; ...; end since it’s considerably easier on the compiler than idioms that use closures.
Okay, it is cleaner to write a plain old for loop.
function pbootstrap(rng::AbstractRNG, nsamp::Integer, m::LinearMixedModel,
        properties=(:objective, :σ, :β, :θ); β = m.β, σ = m.σ, θ = m.θ)
    y₀ = copy(response(m))  # to restore original state of m
    vtype = NamedTuple{properties, typeof(getproperty.(Ref(m), properties))}
    value = Vector{vtype}(undef, nsamp)
    @inbounds for i in 1:nsamp
        refit!(simulate!(rng, m, β=β, σ=σ, θ=θ))
        value[i] = vtype(getproperty.(Ref(m), properties))
    end
    refit!(m, y₀)  # restore original state of m
    value
end
I want you to know that my barista asked me what I was up to today and I proudly told her I would be writing a thunk. Now I have to go back and tell her that I ended up with a sorry old “for loop”. My reputation is going to take a hit.
ERROR: MethodError: Cannot `convert` an object of type Tuple{Float64,Float64,Array{Float64,1},Array{Float64,1}} to an object of type NamedTuple{(:objective, :σ, :β, :θ),Tuple{Float64,Float64,Array{Float64,1},Array{Float64,1}}}
Yeah, adding names to a tuple isn’t a conversion… but you can add names like that with the constructor. There are cases where conversion is the same as construction, but this isn’t one.
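A minimal illustration of that distinction:

```julia
t = (1.0, [2.0, 3.0])

# The constructor attaches names to a plain tuple...
nt = NamedTuple{(:σ, :β)}(t)
nt.σ  # 1.0

# ...but convert does not accept a plain tuple for a NamedTuple type,
# which is what produced the MethodError above.
```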
There is an R function replicate that evaluates an expression n times collecting the results, usually in an array. It is specifically designed for simulations.
That is what I had in mind when I started on this but I think that the looping may be more natural. In R, of course, one tries to avoid explicit loops but in Julia there is no need to “fear the loop”.
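A replicate-like helper is easy to sketch in Julia (this `replicate` is hypothetical, not an existing function in the package):

```julia
# Call a zero-argument thunk n times and collect the results,
# roughly what R's replicate() does with an expression.
replicate(f::Function, n::Integer) = [f() for _ in 1:n]

replicate(() -> 1 + 1, 3)  # [2, 2, 2]
```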
vtype = NamedTuple{properties, typeof(getproperty.(Ref(m), properties))}
value = Vector{vtype}(undef, nsamp)
for i in 1:nsamp
    refit!(simulate!(rng, m, β=β, σ=σ, θ=θ))
    value[i] = vtype(getproperty.(Ref(m), properties))
end
with something like
value = [(refit!(simulate!(rng, m, β=β, σ=σ, θ=θ));
          NamedTuple{properties}(getproperty.(Ref(m), properties)))
         for _ in 1:nsamp]
I would also go with a higher-order function here, and not a for loop. There are a lot of situations where I think a for loop enhances readability over a functional approach, but this is not one of them. To be specific, I don’t like that you have to specify the type and size of the output vector by hand in this code. IMO it leads to a slightly higher maintenance cost and risk of bugs.
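The point about not repeating the element type can be seen in a toy version: with map (or a comprehension) the eltype and length of the result are inferred from the body rather than written out by hand.

```julia
vals = map(1:3) do i
    (idx = i, sq = i^2)   # element type inferred; no vtype or undef Vector
end

length(vals)  # 3
vals[2]       # (idx = 2, sq = 4)
```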