# Vectorize and parallelize wsample function

Hey everyone,

I have a function that in theory should be fully parallelizable as it is independent of the previous input. In the example below i just fill a vector with categorically distributed random variables:

``````using Distributions

#Sampling Categorical data
output = zeros(Int64, 3)
prob = [ .4 .6 ; .5 .5 ; .9 .1]

for i in 1:size(prob,1)
output[i] = wsample(prob[i,:]) #works - and seems to be faster than the method below
#output[i] = rand(Categorical(prob[1,:])) #works
end
``````

Now I would like to vectorize either of the 2 methods above, but the common syntax does not seem to work in this case here:

``````#vectorize via wsample
wsample.( prob )#NOT WORKING
map(x -> wsample(x), prob ) #NOT WORKING

#Vectorize via Distributions
map(x -> rand(Categorical(x)), prob ) #NOT WORKING
rand.( Categorical.( prob ) ) # NOT WORKING
``````

Question 1: Does anyone know a way to use said methods in a vectorized way? I did not see any thread regarding this, so I guess I must do something wrong as this case comes up in lots of algorithms. I know that this is not necessarily faster than a for-loop in a compiled language but I would at least like to benchmark it for my problems.

As a step further, I would also like to use the `pmap` function, but that obviously also does not work with the code I provided.

The first question to ask is: why do you want to “vectorize” the code in this way? Loops in Julia are fast, and if you have a working loop there is no need to change it into a map or broadcast call unless that helps you organize or structure your code more clearly.

Assuming that you do want to use map or broadcast, then you need to tell Julia to treat your `prob` not as a container of individual floats but as a collection of rows. For example, you could broadcast over `eachrow(prob)` rather than `prob`.

However, take a look at the answer by `@stevengj` in Some doubts about types which also applies in your situation. It is common practice in Matlab or numpy to express a collection of vectors as a matrix, but that’s neither ideal nor necessary in Julia. For example, if you were to transpose your `prob` data and treat it as a collection of `SVectors` from StaticArrays, then you could easily broadcast over it:

``````julia> prob = [SVector(.4, .5, .9), SVector(.6, .5, .1)]
2-element Array{SArray{Tuple{3},Float64,1,3},1}:
[0.4, 0.5, 0.9]
[0.6, 0.5, 0.1]

julia> sum.(prob)
2-element Array{Float64,1}:
1.8
1.2000000000000002
``````

There is no difference in memory layout between a matrix where each column is an element or a vector of SVectors representing those columns, but it is much easier to operate on each element if you actually represent your collection of vectors as a collection of vectors.

4 Likes

Just showing some examples of what @rdeits explained.

Make a vector of vectors

``````julia> prob = [[0.4, 0.6], [0.5, 0.5], [0.9, 0.1]]
3-element Array{Array{Float64,1},1}:
[0.4, 0.6]
[0.5, 0.5]
[0.9, 0.1]

julia> wsample.(prob)
3-element Array{Int64,1}:
2
1
1

# benchmark
julia> @btime wsample.(\$prob);
105.158 ns (4 allocations: 208 bytes)
``````

Using `SVector`:

``````julia> prob = [SVector(0.4, 0.6), SVector(0.5, 0.5), SVector(0.9, 0.1)]
3-element Array{SArray{Tuple{2},Float64,1,2},1}:
[0.4, 0.6]
[0.5, 0.5]
[0.9, 0.1]

julia> wsample.(prob)
3-element Array{Int64,1}:
2
1
1

# benchmark (it's slightly faster)
julia> @btime wsample.(\$prob)
77.710 ns (4 allocations: 208 bytes)
3-element Array{Int64,1}:
1
2
1
``````

You can in principle parallelize this with threads, but there is a serious caveat: As far as I know random number generation is currently not thread-safe, so the results are not reliable:

``````julia> s = zeros(3);

s[i] = wsample(prob[i])
end

julia> s
3-element Array{Float64,1}:
2.0
2.0
1.0

# benchmark
\$s[i] = wsample(\$prob[i])
end
472.513 ns (2 allocations: 78 bytes)
``````

Yikes! With 4 threads, this is much slower than serial. The overhead appears to dwarf the advantages of parallelization. I think the overhead would be even much worse for `pmap`.

1 Like

Oh, also, I should mention that if your primary concern is performance, please take a look at https://docs.julialang.org/en/v1/manual/performance-tips/index.html , particularly the part about putting your computation into a function rather than relying on global variables as your example code does.

2 Likes

I will have a look at the `StaticArrays` module as well, thank you for the suggestion!