Improving code speed by employing parallelism for an asynchronous task

I was trying to use multi-threading to speed up my code. However, I noticed that I get a wrong answer with multi-threading. The reason is that my code updates a value sequentially inside a for loop, so each iteration depends on the previous one. For clarification, I posted the code below:

val = 1
@inbounds for iter2 = 1:length(E)
    nd = unique(ar[E[iter2]])
    fd_flag = flag[nd]
    node = nd[.!fd_flag]
    if length(node) > 0
        sm_mat = view(SV, node, :)
        R = kmeans(sm_mat', 2)
        idx_clus = R.assignments
        idx_new[node] = idx_clus .+ val
        flag[node] .= 1
        val = val + maximum(idx_clus)
    end # end of if
end # end of iter2

I am updating “val” in each iteration based on its previous value, and when I apply multi-threading I don’t get the correct result. Is there any way to apply some kind of parallelism to improve the speed? This is the bottleneck of my code, and because it runs for a large number of iterations, it makes the code slow.
Minor question: Is there any other way to make k-means faster?

First, it is always better to post code that can actually be run.

What you can do is define an array that holds the partial sum computed by each thread, and combine the partial sums at the end. Something like this:

julia> nthreads = Threads.nthreads()
4

julia> val_thread = zeros(nthreads)
4-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0

julia> x = rand(1000);

julia> Threads.@threads for i in 1:length(x)
          id = Threads.threadid()
          val_thread[id] = val_thread[id] + x[i] # whatever
       end

julia> sum(val_thread)
500.0066319867228

julia> sum(x)
500.0066319867229
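
A variation on the same idea, in case it helps: as far as I know, in newer Julia versions tasks can migrate between threads, so indexing an array by Threads.threadid() is not guaranteed to be race-free. A minimal sketch that avoids threadid() is to split the index range into chunks and give each chunk its own task. The name chunked_sum and the ntasks keyword are made up for this illustration.

```julia
# Hypothetical helper (not from the thread): one partial sum per chunk,
# each computed in its own task, then combined at the end.
function chunked_sum(x; ntasks = Threads.nthreads())
    isempty(x) && return zero(eltype(x))
    chunk_len = max(1, cld(length(x), ntasks))
    chunks = Iterators.partition(eachindex(x), chunk_len)
    # Each task only reads its own slice of x, so no state is shared.
    tasks = [Threads.@spawn sum(view(x, c)) for c in chunks]
    return sum(fetch, tasks)
end

x = rand(1000)
chunked_sum(x) ≈ sum(x)  # agrees up to floating-point rounding
```

The result can differ from sum(x) in the last digits, as in the REPL output above, because floating-point addition is performed in a different order.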

There is a slight difference between what I am doing and what your code does. I use those intermediate values to update another variable:

global idx_new[node] = idx_clus .+ val;

Here is a code sample that can be executed:

ar = [[1,2,3,4,5], [2,3,4,5,6,7,8], [4,7,8,9], [9,10], [2,3,4,5]]
SV = rand(10,5)
flag = falses(10)
idx_new = zeros(Int, 10)
val = 1

@inbounds Threads.@threads for iter2 = 1:5
    global flag
    nd = unique(ar[iter2])
    fd_flag = flag[nd]
    node = nd[.!fd_flag]
    if length(node) > 0
        sm_mat = view(SV, node, :)
        R = kmeans(sm_mat', ceil(Int, length(node)/2))
        idx_clus = R.assignments
        global idx_new[node] = idx_clus .+ val
        flag[node] .= 1
        global val = val + maximum(idx_clus)
    end # end of if
end # end of iter2

If you remove Threads.@threads, you will get a different value for val at the end.

I still cannot run your example. Where is the kmeans function defined?

Also, I do not see where the idx_new vector is being used in the loop. It seems that it is just another variable that could be updated at the end.
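
To make the "update at the end" idea concrete, here is a rough sketch under some assumptions. It relies on the observation that which nodes each iteration handles depends only on ar and the iteration order, not on the clustering result, so that part can be done in a cheap sequential pass first. The function label_groups and the stand-in toy_cluster are hypothetical names; toy_cluster only replaces kmeans(...).assignments so that the sketch runs without any package.

```julia
# Hypothetical stand-in for kmeans(...).assignments: labels a row 1 or 2
# depending on whether its first column is below 0.5.
toy_cluster(X) = [v < 0.5 ? 1 : 2 for v in view(X, :, 1)]

function label_groups(ar, SV, cluster)
    n = size(SV, 1)
    flag = falses(n)
    # Phase 1 (sequential, cheap): decide which nodes each iteration owns.
    groups = Vector{Vector{Int}}(undef, length(ar))
    for i in eachindex(ar)
        nd = unique(ar[i])
        groups[i] = nd[.!flag[nd]]
        flag[groups[i]] .= true
    end
    # Phase 2 (parallel): cluster each group; every iteration writes only labels[i].
    labels = Vector{Vector{Int}}(undef, length(ar))
    Threads.@threads for i in eachindex(groups)
        labels[i] = isempty(groups[i]) ? Int[] : cluster(view(SV, groups[i], :))
    end
    # Phase 3 (sequential, cheap): apply the running offset val in order.
    idx_new = zeros(Int, n)
    val = 1
    for i in eachindex(groups)
        isempty(groups[i]) && continue
        idx_new[groups[i]] = labels[i] .+ val
        val += maximum(labels[i])
    end
    return idx_new, val
end
```

With a real clustering routine, something like `m -> kmeans(m', 2).assignments` could be passed as cluster, assuming the Clustering package is loaded. Whether this pays off depends on how expensive each clustering call is compared with the two sequential passes.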

Minor comments: you should avoid using global variables. It is best to wrap everything inside a function, or at least add let ... end around everything and remove those global… statements. This may speed things up, although I imagine that the costly part is inside the kmeans function. Additionally, I would recommend removing @inbounds at this stage of development. That kind of annotation, if it makes any significant difference (it does not always), should be added only once the code is running flawlessly; otherwise you will just miss the opportunity to find bugs.
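
As a small illustration of the two options for avoiding globals (the names running_total and result are made up for this example):

```julia
# Option 1: wrap the loop in a function so all state is local.
function running_total(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x   # updates the local variable, no `global` needed
    end
    return total
end

# Option 2: for script-style code, a let block gives the loop a local scope.
result = let total = 0.0
    for x in (1.0, 2.0, 3.0)
        total += x
    end
    total            # value of the let block
end
```

In both cases the compiler knows the type of total, which is usually what makes this faster than updating a non-constant global.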

I believe that you should be able to run the code by adding this:

using Clustering 

You can see that idx_new is updated in this code; I need it later as input to some other functions. And yes, I don’t use global variables at all in my actual code. I just added them here to make this small example executable. I will keep the advice about @inbounds in mind as well.

It sounds like parallelism is not possible here, since each computation depends on the previous one. In general, for adding in a thread-safe way, take a look at