Julia vs. R Speed for sampling the Wishart distribution

Nope, it returns the minimum time (not for a single call in general, but for a loop involving enough calls to get good timing resolution). Generally, this is the right thing to do in order to exclude errors from system jitter: because timing noise is essentially always positive, the minimum is the best estimate of the true cost. A longer version of this explanation is here: [1608.04295] Robust benchmarking in noisy environments

However, in timing functions that involve pseudorandom number generators, where the function execution itself may take different amounts of time in different calls, measuring the average is probably more appropriate.
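For concreteness, here is a minimal sketch of pulling both statistics out of a BenchmarkTools Trial object (the Wishart call is just a placeholder workload):

using BenchmarkTools, Statistics
using Distributions, LinearAlgebra

b = @benchmark rand(Wishart(20, Matrix{Float64}(I, 20, 20)))  # returns a Trial

b.times           # one raw measurement per sample, in nanoseconds
minimum(b.times)  # robust estimate of the core cost; excludes positive jitter
mean(b.times)     # more representative when run time genuinely varies per call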

4 Likes

This would be great. I have used Julia on the back end for some R functions, for example high-dimensional covariance matrix estimation. However, I did not use the Wishart in those functions, and thus Julia provided considerably better performance. Improving the speed would be wonderful, particularly because the Wishart distribution plays an important role in Bayesian statistics, where we might need to sample many thousands of times and do additional work as well, so the time starts to add up.

Cool! Is this solution only for the identity matrix as the scale matrix?

The main issue I have with this view is that much of the jitter comes from garbage collection, which is an unavoidable part of code execution and accounts for a considerable amount of variability. I think a sensible way to model this would be to take the minimum of (computation time minus garbage collection time) as a model of core compute without garbage collection, and then add the mean garbage collection time to that as a model of “typical” garbage collection overhead.
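A minimal sketch of that estimator, using the times and gctimes fields of a BenchmarkTools Trial (the estimator itself is just my reading of the proposal above, not an existing BenchmarkTools feature):

using BenchmarkTools, Statistics
using Distributions, LinearAlgebra

b = @benchmark rand(Wishart(20, Matrix{Float64}(I, 20, 20)))

core = minimum(b.times .- b.gctimes)  # core compute with GC excluded
gc   = mean(b.gctimes)                # "typical" GC overhead per sample
core + gc                             # model of a typical call including GC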

3 Likes

My randwishart function accepts multiples of the identity matrix, passing the multiple to syrk!:

julia> (2.5I).λ
2.5

help?> BLAS.syrk!
  syrk!(uplo, trans, alpha, A, beta, C)


  Rank-k update of the symmetric matrix C as alpha*A*transpose(A) + beta*C or alpha*transpose(A)*A + beta*C according to trans.
  Only the uplo triangle of C is used. Returns C.

My randinvwishart function however only accepts UniformScaling{Bool}, meaning it only accepts the identity and zero matrix.
If you’d like to extend these, here is the wikipedia page on Bartlett’s decomposition.
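For anyone who wants to extend it, here is a rough sketch of what a Bartlett-based sampler could look like for a UniformScaling scale (the function name and layout are mine, not the actual randwishart code, and it assumes ν ≥ p):

using LinearAlgebra, Distributions

function randwishart_sketch(ν::Integer, s::UniformScaling, p::Integer)
    # Bartlett decomposition: A is lower triangular with sqrt(Chisq(ν - i + 1))
    # on the diagonal and standard normals below it; then W = λ * A * A'.
    A = zeros(p, p)
    for i in 1:p
        A[i, i] = sqrt(rand(Chisq(ν - i + 1)))
        for j in 1:i-1
            A[i, j] = randn()
        end
    end
    W = zeros(p, p)
    BLAS.syrk!('L', 'N', Float64(s.λ), A, 0.0, W)  # lower triangle of λ*A*A'
    return Symmetric(W, :L)
end

Here randwishart_sketch(20, 2.5I, 20) would correspond to a scale matrix of 2.5 * I.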

I was simply generating the random matrix A. You’ll need L too, and ideally, only calculate L once! If you’re writing a Gibbs sampler, and someone passes in L as a prior, calculate it when initializing the sampler, and store it. No need to factor a matrix 10000 times!
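As a minimal sketch of that idea (type and function names are hypothetical, just to show the shape):

using LinearAlgebra

struct SamplerState
    ν::Int
    L::LowerTriangular{Float64, Matrix{Float64}}  # cached Cholesky factor of the prior scale
end

# factor the prior scale once, when the sampler is initialized
init_sampler(ν, Σ) = SamplerState(ν, cholesky(Symmetric(Σ)).L)

# inside the Gibbs loop, reuse state.L (e.g. B = state.L * A; W = B * B')
# rather than calling cholesky(Σ) on every iteration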

Multiple dispatch can make the generation efficient. I’d keep the special uniform scaling definitions, because I think that’d be a common use case, and I * A still allocates:

julia> A = randn(1000,1000);

julia> @benchmark I * $A
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     801.011 μs (0.00% GC)
  median time:      2.537 ms (0.00% GC)
  mean time:        2.133 ms (21.27% GC)
  maximum time:     55.167 ms (95.53% GC)
  --------------
  samples:          2333
  evals/sample:     1

so you would want to write the function to explicitly skip that multiplication, and stick the scale parameter somewhere convenient.
But if you just write one more method with L * A, multiple dispatch will make it efficient when L happens to be diagonal (probably another common use case).
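Something along these lines, with a hypothetical helper name just to illustrate the dispatch (not code from an existing package):

using LinearAlgebra

# UniformScaling scale: skip the multiply entirely and keep λ to pass to syrk! later
apply_scale(s::UniformScaling, A::AbstractMatrix) = (s.λ, A)

# any other factor L: compute L * A, which is cheap when L isa Diagonal
# and still correct for a dense lower-triangular Cholesky factor
apply_scale(L::AbstractMatrix, A::AbstractMatrix) = (one(eltype(A)), L * A)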

I updated Julia and implemented it in parallel. It is fast now, a bit faster than R. I do not think parallelizing it in R would gain much, because of the initial overhead.

Julia

using Distributions; using LinearAlgebra;

A = zeros(20,20,50000)

function parallel_test(nu)
    # draw one Wishart(nu, I) sample as a dense matrix
    rand(Wishart(nu, Matrix{Float64}(I, nu, nu)))
end

@time Threads.@threads for i = 1:50000
    A[:,:,i] = parallel_test(20)
end

1.043198 seconds (3.16 M allocations: 968.912 MiB, 8.96% gc time)

Now R:

system.time(replicate(50000, rWishart(1, 20, diag(20)) ))

 user  system elapsed 
   1.13    0.16    1.2

I also checked without storing the matrices in Julia, and then it was often less than 1 second.

2 Likes

InverseWishart is in fact really slow (Julia 0.7). I quickly made this comparison and was surprised by the huge difference in time. The initial setup (rand(), M'M) in the functions can be ignored. The roughly doubled speed comes almost entirely from sampling from the Wishart rather than the InverseWishart; M'M is just a symmetric positive-definite matrix.

import Random
using BenchmarkTools
using Distributions

# sample directly from the InverseWishart
function sampleIW(nu)
    M = rand(nu, nu)
    M = M'M
    IW = rand(InverseWishart(nu, convert(Array, Symmetric(M))))
    return IW
end

# sample from the Wishart with the inverted scale, then invert the draw
function sampleWinv(nu)
    M = rand(nu, nu)
    M = M'M
    wishInv = inv(rand(Wishart(nu, convert(Array, Symmetric(inv(M))))))
    return wishInv
end

Random.seed!(1234)
@benchmark sampleIW(3)

  memory estimate:  69.41 KiB
  allocs estimate:  559
  --------------
  minimum time:     386.254 μs (0.00% GC)
  median time:      391.637 μs (0.00% GC)
  mean time:        409.886 μs (2.36% GC)
  maximum time:     39.793 ms (98.80% GC)
  --------------
  samples:          10000
  evals/sample:     1

Random.seed!(1234)
@benchmark sampleWinv(3)

  memory estimate:  39.25 KiB
  allocs estimate:  298
  --------------
  minimum time:     201.376 μs (0.00% GC)
  median time:      204.766 μs (0.00% GC)
  mean time:        216.488 μs (3.10% GC)
  maximum time:     38.441 ms (99.27% GC)
  --------------
  samples:          10000
  evals/sample:     1


Is there a reason you’re using a 2-year-old version of the language?

I am not. I just tested on 0.7, assuming that the initial question related to a version < 1.0, since it used “eye(20)”. But now I see that DW has updated, so my comparison is no longer relevant to his problem.

1 Like