This is my first time trying parallel processing. I’d like to run replicates of the kmeans algorithm in parallel to find a global minimum.
Just calling kmeans on some data works fine:
using Clustering
n_data_points = 100000
data = rand(4,n_data_points)
n_clusters = 20
R = kmeans(data, n_clusters; maxiter=1000)
I tried using a parallel for loop, but I get this error: BoundsError: attempt to access 0-element Vector{Float64} at index [1].
using Distributed
using SharedArrays
n_reps = 32
cost_R = SharedArray{Float64}(n_reps)
asgn_R = SharedArray{Int64}(n_data_points, n_reps)
addprocs()
@everywhere using Pkg
@everywhere Pkg.activate("..")
@everywhere using Clustering
@sync @distributed for repᵢ ∈ 1:n_reps
    R = kmeans(data, n_clusters; maxiter=1000)
    cost_R[repᵢ] = sum(R.costs)
    asgn_R[:,repᵢ] = assignments(R)
end
If I leave out addprocs() and run the loop, it executes without error.
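One possible explanation, offered as an assumption rather than a confirmed diagnosis: a SharedArray is by default mapped only onto the processes that exist when it is created. In the code above, cost_R and asgn_R are allocated before addprocs(), so the new workers may see them as empty local arrays, which would match the 0-element Vector{Float64} in the error. The sketch below reorders the steps so the workers are added first. The helper name run_replicates, the worker count, and the small problem size are hypothetical choices for illustration, and it assumes Clustering is installed in the environment the workers use.

```julia
using Distributed

# Start the workers BEFORE allocating any SharedArray, so the arrays
# are mapped onto the workers as well as the master process.
addprocs(2)  # worker count chosen arbitrarily for this sketch

# Assumes Clustering is available in the environment the workers load.
@everywhere using Clustering
@everywhere using SharedArrays

# Hypothetical helper: runs n_reps independent kmeans replicates in
# parallel and returns the costs, the assignments, and the index of the
# replicate with the lowest total cost.
function run_replicates(data, n_clusters, n_reps)
    n_points = size(data, 2)

    # Allocated after addprocs(), so every current worker can write to them.
    cost = SharedArray{Float64}(n_reps)
    asgn = SharedArray{Int}(n_points, n_reps)

    @sync @distributed for rep in 1:n_reps
        R = kmeans(data, n_clusters; maxiter=1000)
        cost[rep] = sum(R.costs)          # total within-cluster cost
        asgn[:, rep] = assignments(R)     # cluster index of each point
    end

    return cost, asgn, argmin(cost)
end

cost, asgn, best = run_replicates(rand(4, 1_000), 5, 4)
println("best replicate: ", best, " with cost ", cost[best])
```

Passing data and n_clusters as function arguments means they are captured by the loop closure and sent to the workers, rather than relying on globals being defined on every process.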