Currently sampling with replacement in Julia seems to be significantly slower than in R (tested on Julia 0.6.0, 64bit). See e.g.:
> x <- runif(10^7)
> system.time(sample(x, replace=T))
user system elapsed
0.38 0.02 0.39
vs.
julia> x = rand(10^7);
julia> @benchmark rand($x, 10^7)
BenchmarkTools.Trial:
memory estimate: 76.29 MiB
allocs estimate: 3
--------------
minimum time: 648.989 ms (0.03% GC)
median time: 663.557 ms (1.52% GC)
mean time: 675.775 ms (3.53% GC)
maximum time: 724.599 ms (9.70% GC)
--------------
samples: 8
evals/sample: 1
Sampling with replacement is a common operation so I believe it would be nice to speed it up.
I do not have a full solution, but a partial one could be to make function rand!(rng::AbstractRNG, A::AbstractArray, r::AbstractArray)
check if length(r)
is small (so that it fits into 32 bits) and then create RangeGenerator
with a narrower domain. My preliminary tests show that this gives ~15% reduction of execution time.
Another approach could be:
julia> @benchmark $x[rand(1:10^7,10^7)]
BenchmarkTools.Trial:
memory estimate: 152.59 MiB
allocs estimate: 5
--------------
minimum time: 404.157 ms (2.95% GC)
median time: 409.564 ms (3.99% GC)
mean time: 420.688 ms (6.34% GC)
maximum time: 479.673 ms (18.14% GC)
--------------
samples: 12
evals/sample: 1
and
julia> @benchmark $x[rand(Int32(1):Int32(10^7),10^7)]
BenchmarkTools.Trial:
memory estimate: 114.44 MiB
allocs estimate: 5
--------------
minimum time: 329.700 ms (3.44% GC)
median time: 338.070 ms (3.81% GC)
mean time: 345.072 ms (6.22% GC)
maximum time: 404.187 ms (20.81% GC)
--------------
samples: 15
evals/sample: 1
Unfortunately they are more memory hungry (the last one is faster than R), but the question is why rand(x, 10^7)
is so much slower?