Nope, it returns the minimum time (not for a single call in general, but for a loop involving enough calls to get good timing resolution). Generally, this is the right thing to do in order to exclude errors from system jitter: because timing noise is essentially always positive, the minimum is the best estimate of the true cost. A longer version of this explanation is here: [1608.04295] Robust benchmarking in noisy environments
However, in timing functions that involve pseudorandom number generators, where the function execution itself may take different amounts of time in different calls, measuring the average is probably more appropriate.
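For instance (a minimal sketch; the workload here is just a placeholder), BenchmarkTools keeps every sample it records, so you can look at the mean rather than the minimum when the work itself is random:
using BenchmarkTools, Statistics

# A call whose cost depends on the random draw: the mean of the recorded
# samples is more informative than the minimum here.
t = @benchmark rand(rand(1:1000))
minimum(t.times)   # best-case time in nanoseconds
mean(t.times)      # typical time, including the randomness in the work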
This would be great. I have used Julia on the back end for some R functions, for example high-dimensional covariance matrix estimation. However, those functions did not use the Wishart distribution, and so Julia provided considerably better performance. Improving the speed would be wonderful, in particular because the Wishart distribution plays an important role in Bayesian statistics, where we might need to sample many thousands of times and do additional work on top of that, so the time starts to add up.
The main issue I have with this view is that much of the jitter comes from garbage collection, which is an unavoidable part of code execution and accounts for a considerable amount of the variability. I think a sensible way to model this would be to take the minimum of (computation time minus garbage collection time) as a model of the core compute without garbage collection, and then add the mean garbage collection time to that as a model of "typical" garbage collection overhead.
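A rough sketch of that model, assuming a BenchmarkTools Trial t (its times and gctimes fields hold the per-sample total and GC times in nanoseconds); the workload is only a placeholder:
using BenchmarkTools, Statistics

t = @benchmark sum(rand(1000) for _ in 1:100)   # anything that allocates enough to trigger GC

core_ns = minimum(t.times .- t.gctimes)   # core compute, GC excluded
gc_ns   = mean(t.gctimes)                 # "typical" GC overhead per sample
estimate_ns = core_ns + gc_ns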
The randwishart function accepts multiples of the identity matrix, passing the multiple on to syrk!.
julia> (2.5I).λ
2.5
help?> BLAS.syrk!
syrk!(uplo, trans, alpha, A, beta, C)
Rank-k update of the symmetric matrix C as alpha*A*transpose(A) + beta*C or alpha*transpose(A)*A + beta*C according to trans.
Only the uplo triangle of C is used. Returns C.
My randinvwishart function, however, only accepts UniformScaling{Bool}, meaning it accepts only the identity and zero matrices.
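For reference, I itself is a UniformScaling{Bool} (it wraps true), while a scaled copy is not:
julia> I isa UniformScaling{Bool}
true

julia> 2.5I isa UniformScaling{Bool}
false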
If you'd like to extend these, here is the Wikipedia page on Bartlett's decomposition.
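As a minimal sketch of what a Bartlett-based sampler could look like for a λ*I scale (the function name and layout are mine, not the code discussed here; it needs ν > p - 1):
using Distributions, LinearAlgebra

function randwishart_scaled(ν, p, λ = 1.0)
    # Bartlett factor: χ² draws on the diagonal, standard normals below it
    A = zeros(p, p)
    for i in 1:p
        A[i, i] = sqrt(rand(Chisq(ν - i + 1)))
        for j in 1:i-1
            A[i, j] = randn()
        end
    end
    # W = λ * A * A', computed in one syrk! call (lower triangle only)
    W = zeros(p, p)
    BLAS.syrk!('L', 'N', float(λ), A, 0.0, W)
    return Symmetric(W, :L)
end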
I was simply generating the random matrix A. You'll need L too, and ideally, only calculate L once! If you're writing a Gibbs sampler, and someone passes in L as a prior, calculate it when initializing the sampler, and store it. No need to factor a matrix 10000 times!
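A hedged sketch of that idea (the names are hypothetical): factor the prior scale once when the sampler is set up and carry L around in the sampler state, so every draw reuses it.
using LinearAlgebra

struct WishartPrior
    ν::Float64
    L::LowerTriangular{Float64, Matrix{Float64}}   # Cholesky factor of the prior scale
end

# Factor once at construction; each Gibbs iteration then reuses prior.L.
WishartPrior(ν, Σ::AbstractMatrix) = WishartPrior(float(ν), cholesky(Symmetric(Σ)).L)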
Multiple dispatch can make the generation efficient. I'd keep the special uniform scaling definitions, because I think that'd be a common use case, and I * A still allocates:
julia> A = randn(1000,1000);
julia> @benchmark I * $A
BenchmarkTools.Trial:
memory estimate: 7.63 MiB
allocs estimate: 2
--------------
minimum time: 801.011 μs (0.00% GC)
median time: 2.537 ms (0.00% GC)
mean time: 2.133 ms (21.27% GC)
maximum time: 55.167 ms (95.53% GC)
--------------
samples: 2333
evals/sample: 1
so you would want to write the function to explicitly skip that multiplication, and stick the scale parameter somewhere convenient.
But if you just write one more method using L * A, multiple dispatch will make it efficient when L happens to be diagonal (probably another common use case).
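Roughly, and with a hypothetical helper name, the dispatch could look like this; the UniformScaling method skips the multiplication (returning A itself when λ == 1, so the caller must not mutate it), and a Diagonal L automatically hits a fast L * A method:
using LinearAlgebra

applyscale(L::UniformScaling, A) = L.λ == 1 ? A : L.λ .* A   # no general matmul
applyscale(L::AbstractMatrix, A) = L * A                      # Diagonal dispatches to a fast method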
I updated Julia and implemented it in parallel. It is fast now, and a bit faster than R. I do not think implementing it in parallel in R would gain much, because of the initial overhead.
Julia:
using Distributions, LinearAlgebra

A = zeros(20, 20, 50000)

function parallel_test(nu)
    rand(Wishart(nu, Matrix{Float64}(I, nu, nu)))
end

@time Threads.@threads for i = 1:50000
    A[:, :, i] = parallel_test(20)
end
1.043198 seconds (3.16 M allocations: 968.912 MiB, 8.96% gc time)
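(For the threaded loop above to actually use multiple cores, Julia has to be started with more than one thread, e.g. by setting the JULIA_NUM_THREADS environment variable before launching; Threads.nthreads() reports how many are available.)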
Now R:
system.time(replicate(50000, rWishart(1, 20, diag(20)) ))
user system elapsed
1.13 0.16 1.2
I also checked without storing the matrices in Julia, and there it was often less than 1 second.
InverseWishart is in fact really slow (Julia 0.7). I quickly made this comparison and was surprised by the huge difference in time. The initial work in the functions (rand(), M'M) can be ignored; the roughly doubled speed comes almost entirely from the Wishart sampling itself. M'M is just a symmetric positive definite matrix.
import Random
using BenchmarkTools
using Distributions
using LinearAlgebra

function sampleIW(nu)
    M = rand(nu, nu)
    M = M'M
    IW = rand(InverseWishart(nu, convert(Array, Symmetric(M))))
    return IW
end

function sampleWinv(nu)
    M = rand(nu, nu)
    M = M'M
    wishInv = inv(rand(Wishart(nu, convert(Array, Symmetric(inv(M))))))
    return wishInv
end
Random.seed!(1234)
@benchmark sampleIW(3)
memory estimate: 69.41 KiB
allocs estimate: 559
--------------
minimum time: 386.254 μs (0.00% GC)
median time: 391.637 μs (0.00% GC)
mean time: 409.886 μs (2.36% GC)
maximum time: 39.793 ms (98.80% GC)
--------------
samples: 10000
evals/sample: 1
Random.seed!(1234)
@benchmark sampleWinv(3)
memory estimate: 39.25 KiB
allocs estimate: 298
--------------
minimum time: 201.376 μs (0.00% GC)
median time: 204.766 μs (0.00% GC)
mean time: 216.488 μs (3.10% GC)
maximum time: 38.441 ms (99.27% GC)
--------------
samples: 10000
evals/sample: 1
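(The two functions draw from the same distribution: if W ~ Wishart(ν, Ψ⁻¹), then W⁻¹ ~ InverseWishart(ν, Ψ), so inverting a Wishart draw made with the inverted scale is a valid, and here faster, way to sample an inverse Wishart.)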
I am not. I just tested on 0.7, assuming the initial question was about versions < 1.0 since it used eye(20). But now I see that DW has updated, so my comparison is not relevant to his problem anymore.