I need to pick a random position (not a random element) in an Array. I'll assume a two-dimensional array for this question. The first approach that came to mind was
```julia
function ridx(A)
    return (rand(1:size(A, 1)), rand(1:size(A, 2)))
end
```
But then, learning about CartesianIndex, I found one can also do
```julia
function ridx2(A)
    return rand(CartesianIndices(A))
end
```
which is simpler to understand and works for arrays of any dimension. I also thought it would be faster than ridx, with only one call to rand instead of two. However, the opposite seems to be the case (by a tiny difference, but one that is consistent across several tests).
So my question is, is ridx2 a correct, reasonable and efficient use of rand and CartesianIndices, or am I missing something here?
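For reference, the kind of comparison I mean is roughly this (a minimal sketch, assuming BenchmarkTools.jl and an arbitrary 100×100 test array):

```julia
using BenchmarkTools

ridx(A)  = (rand(1:size(A, 1)), rand(1:size(A, 2)))  # two rand calls, returns a Tuple
ridx2(A) = rand(CartesianIndices(A))                 # one rand call, returns a CartesianIndex

A = zeros(100, 100)   # arbitrary test array
@btime ridx($A)
@btime ridx2($A)
```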
I agree 2 ns is not much (even if I will be using this in a Monte Carlo program where it will be called 10 to 100 million times in a typical run). Still, it doesn't make sense to me that calling rand twice should be faster than calling it once…
Thanks @Eben60, but I need the indices, not the element. This is because I am using the array to represent a lattice, and I need to select a site and later identify its neighbors. So I will need to convert to a tuple or CartesianIndex at some point.
Since linear indices work for arrays of any dimension in Julia, one can simply write rand(eachindex(A)). This gives you a single number as the index. On my PC it is about 2x faster than both the CartesianIndices version and calling rand twice.
I understand the 2D location is required here for accessing the neighbors later, in which case CartesianIndices would be the easiest. But linear indices will be faster in general, and accessing the neighbors is still easy: add or subtract n (the number of rows) for right and left, or add or subtract 1 for below and above.
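For example, something like this (a rough sketch for an interior site; boundary sites would of course need separate treatment):

```julia
A = zeros(10, 10)
n = size(A, 1)            # number of rows = stride between columns (Julia is column-major)

i = rand(eachindex(A))    # a single random linear index (an Int)

# neighbors of an interior site
above = i - 1
below = i + 1
left  = i - n
right = i + n

# the 2D location can always be recovered if needed
ci = CartesianIndices(A)[i]   # CartesianIndex(row, col)
```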
This advice isn't always applicable, especially since @btime only reports the minimum time. If you are performing this in an inner loop, excessive allocations lead to more GC calls, slowing the program down. @benchmark is much more enlightening (looking at the mean or median, for example) when comparing allocating vs non-allocating methods, especially if these two methods are being called many times and not just once.
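For example (a sketch assuming BenchmarkTools.jl; Statistics is needed for median and mean):

```julia
using BenchmarkTools, Statistics

A = zeros(100, 100)

b = @benchmark rand(CartesianIndices($A))
minimum(b)   # what @btime reports
median(b)    # usually more robust when allocations and GC are involved
mean(b)
```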
Thank you @jling, @Seif_Shebl, @Dan and @jmair for the input. The bottom line seems to be: CartesianIndices are easier and clearer, but linear indices are faster. I'll consider whether to reimplement neighbour finding using linear indices (the trickier part is of course the boundaries).
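A first sketch of what I have in mind, assuming (purely for illustration) periodic boundary conditions and not yet benchmarked:

```julia
# Neighbors of linear index i in an n×m lattice, wrapping periodically at the edges.
function neighbors(i, n, m)
    r = mod1(i, n)          # row
    c = (i - 1) ÷ n + 1     # column
    up    = mod1(r - 1, n) + (c - 1) * n
    down  = mod1(r + 1, n) + (c - 1) * n
    left  = r + (mod1(c - 1, m) - 1) * n
    right = r + (mod1(c + 1, m) - 1) * n
    return (up, down, left, right)
end

A = zeros(4, 5)
i = rand(eachindex(A))
neighbors(i, size(A)...)
```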
If cartesian indices are more appropriate for your problem, I would suggest using them… they are also more ergonomic, e.g. you can easily get a vector (rand(CartesianIndices(a), 100), which as noted in a previous post should be faster than individual calls to rand), or fill a preallocated vector (rand!(output, CartesianIndices(a))).
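For example (a quick sketch):

```julia
using Random: rand!

a = zeros(8, 8)

v = rand(CartesianIndices(a), 100)        # Vector{CartesianIndex{2}} of 100 random positions

output = Vector{CartesianIndex{2}}(undef, 100)
rand!(output, CartesianIndices(a))        # refill a preallocated buffer in place
```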
How much slower does your program run with CartesianIndices compared to using linear indices?
If really necessary, you can cook up a rand method for CartesianIndices which should close the performance gap:
```julia
using Random: Random, AbstractRNG, SamplerSimple
import Random: Sampler

# Build one per-axis range sampler up front, so each draw avoids re-deriving the ranges.
@generated function Sampler(::Type{RNG}, xx::CartesianIndices{N}, rep::Random.Repetition) where {N, RNG<:AbstractRNG}
    samplers = [:(Sampler(RNG, axes(xx, $ii))) for ii = 1:N]
    :(SamplerSimple(xx, tuple($(samplers...))))
end

# Draw one index per dimension and return them as a tuple.
@generated function Base.rand(rng::AbstractRNG, sp::SamplerSimple{<:CartesianIndices{N}}) where {N}
    rands = [:(rand(rng, sp.data[$ii])) for ii = 1:N]
    :(tuple($(rands...)))
end
```
But it’s type piracy, so depending on the context, applying this to another type mimicking CartesianIndices could be more appropriate.
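With those definitions, usage would look something like this (a sketch; note that rand then returns a plain tuple of indices rather than a CartesianIndex):

```julia
A = zeros(10, 20)

i, j = rand(CartesianIndices(A))   # now a plain (row, column) tuple
A[i, j]                            # index the lattice as usual
```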
Thank you @rfourquet for your suggestion, I'll look into it. As for how much slower my application is: in my first try it seemed to run about 30% slower. But since using CartesianIndices required several changes, it is hard to know where to put the blame for the performance loss; I may have made a poor coding choice at some point. I tried several bits of code, and picking the random position seemed to be one of the culprits, which is why I asked the question.
I am now trying to introduce CartesianIndices by making a minimum of changes at a time and checking performance at each step, but this will take a while since I have other projects to deal with. At least I have a functioning version now.
My current code (without CartesianIndices) is on GitHub if anyone wants to have a look.