Improving the code speed by employing parallelism for asynchronous task

Nova · September 21, 2020, 10:01pm

I was trying to use multi-threading to speed up my code. However, I noticed that I get wrong answers when multi-threading is enabled. The reason is that my code updates a value sequentially in a for loop, so each iteration depends on the previous one. For clarification, I have posted the code below:

val = 1
@inbounds for iter2 = 1:length(E)
    nd = unique(ar[E[iter2]])
    fd_flag = flag[nd]
    node = nd[.!fd_flag]
    if length(node) > 0
        sm_mat = view(SV, node, :)
        R = kmeans(sm_mat', 2)
        idx_clus = R.assignments
        idx_new[node] = idx_clus .+ val
        flag[node] .= 1
        val = val + maximum(idx_clus)
    end # end of if
end # end of iter2

I am updating “val” in each iteration based on its previous value, and when I apply multi-threading I don’t get the correct result. Is there any way to apply some kind of parallelism to improve the speed? This is the bottleneck of my code, and because it runs for a large number of iterations, it makes the code slow.
Minor question: Is there any other way to make k-means faster?

lmiq · September 21, 2020, 10:48pm

First, it is always better to post code that can actually be run.

What you can do is define an array that holds the partial sum computed by each thread. Something like this:

julia> nthreads = Threads.nthreads()
4

julia> val_thread = zeros(nthreads)
4-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0

julia> x = rand(1000);

julia> Threads.@threads for i in 1:length(x)
          id = Threads.threadid()
          val_thread[id] = val_thread[id] + x[i] # whatever
       end

julia> sum(val_thread)
500.0066319867228

julia> sum(x)
500.0066319867229
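
Indexing the accumulator by Threads.threadid() relies on every iteration staying on the thread it started on. In more recent Julia versions, where @threads uses dynamic scheduling by default, that pattern is generally discouraged. A minimal sketch of an alternative that keeps the same idea (one partial sum per unit of work, combined at the end) is shown below. The names chunked_sum and nchunks are hypothetical, and only Base functions are used.

```julia
# Hypothetical helper: each task sums its own chunk of `x`,
# so no two tasks ever write to the same location.
function chunked_sum(x; nchunks = Threads.nthreads())
    chunk_len = max(1, cld(length(x), nchunks))
    # Spawn one task per chunk; each task returns its own partial sum.
    tasks = map(Iterators.partition(x, chunk_len)) do chunk
        Threads.@spawn sum(chunk)
    end
    # Combine the partial results serially once every task has finished.
    return sum(fetch, tasks; init = zero(eltype(x)))
end

x = rand(1000)
chunked_sum(x) ≈ sum(x)   # summation order differs, so compare with ≈ rather than ==
```

As in the REPL session above, the threaded result can differ from sum(x) in the last digits because floating-point addition is not associative.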

Nova · September 21, 2020, 11:28pm

What I am doing differs slightly from what your code does. I use those intermediate values to update another variable:

global idx_new[node] = idx_clus .+ val;

Here is a code sample that can be executed:

ar = [[1,2,3,4,5], [2,3,4,5,6,7,8], [4,7,8,9], [9,10], [2,3,4,5]]
SV = rand(10,5)
flag = falses(10)
idx_new = zeros(Int, 10)
val = 1

@inbounds Threads.@threads for iter2 = 1:5
    global flag
    nd = unique(ar[iter2])
    fd_flag = flag[nd]
    node = nd[.!fd_flag]
    if length(node) > 0
        sm_mat = view(SV, node, :)
        R = kmeans(sm_mat', ceil(Int, length(node)/2))
        idx_clus = R.assignments
        global idx_new[node] = idx_clus .+ val
        flag[node] .= 1
        global val = val + maximum(idx_clus)
    end # end of if
end # end of iter2

If you remove Threads.@threads, you will get a different value for val at the end.

lmiq · September 21, 2020, 11:51pm

I still cannot run your example. Where is the kmeans function defined?

Also, I do not see where the idx_new vector is used inside the loop. It seems to be just another variable that could be updated at the end.

Minor comments: you should avoid using global variables. It is best to wrap everything inside a function, or at least add let ... end around everything and remove those global… statements. This may speed things up, although I imagine that the costly part is inside the kmeans function. Additionally, I would recommend removing @inbounds at this stage of development. That kind of flag, if it makes any significant difference (it does not always), should be added once the code is running flawlessly; otherwise you will only miss the opportunity to find bugs.
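
Both suggestions in this post (wrapping the loop in a function, and deferring the idx_new and val updates to the end) can be sketched together as below. This is only a sketch under stated assumptions: it assumes that which nodes each iteration handles depends only on flag and not on the clustering result, and that the clustering step is safe to call from several threads at once. The names assign_clusters, cluster and toy_cluster are hypothetical; cluster stands in for the expensive step and must return one positive integer label per column.

```julia
# Sketch only: `cluster` is a hypothetical stand-in for the expensive step.
# It receives a features-by-points matrix and returns one label per column.
function assign_clusters(ar, SV, cluster)
    npoints = size(SV, 1)
    flag = falses(npoints)
    idx_new = zeros(Int, npoints)

    # Phase 1 (serial, cheap): decide which unflagged nodes each iteration owns.
    groups = Vector{Vector{Int}}()
    for members in ar
        nd = unique(members)
        node = nd[.!flag[nd]]
        if !isempty(node)
            flag[node] .= true
            push!(groups, node)
        end
    end

    # Phase 2 (parallel, expensive): the groups are disjoint, so the calls are independent.
    labels = Vector{Vector{Int}}(undef, length(groups))
    Threads.@threads for g in eachindex(groups)
        labels[g] = cluster(view(SV, groups[g], :)')
    end

    # Phase 3 (serial, cheap): apply the running offset in the original order.
    val = 1
    for (node, lab) in zip(groups, labels)
        idx_new[node] = lab .+ val
        val += maximum(lab)
    end
    return idx_new, val
end

# Deterministic toy clustering, for illustration only: alternate labels 1 and 2.
toy_cluster(X) = [isodd(j) ? 1 : 2 for j in 1:size(X, 2)]

ar = [[1,2,3,4,5], [2,3,4,5,6,7,8], [4,7,8,9], [9,10], [2,3,4,5]]
SV = rand(10, 5)
idx_new, val = assign_clusters(ar, SV, toy_cluster)
```

With Clustering.jl loaded, one could pass something like X -> kmeans(X, 2).assignments in place of toy_cluster, provided kmeans behaves correctly when called concurrently, which is not verified here. Only the clustering calls run in parallel, so any gain depends on that step dominating the run time.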

Nova · September 22, 2020, 2:05am

I believe that you should be able to run the code by adding this:

using Clustering

You can see that idx_new is updated in this code, and I need it as input to some other functions. Yes, I don’t use global variables at all in my actual code; I only added them here to make this small example executable. I will keep the advice about @inbounds in mind as well.

danielw2904 · September 22, 2020, 7:06am

It sounds like parallelism is not possible here, since each computation depends on the previous one. In general, for adding in a thread-safe way, take a look at

https://docs.julialang.org/en/v1/manual/multi-threading/#Atomic-Operations
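
As a minimal sketch of what the linked section describes, the snippet below accumulates into a shared counter with Threads.Atomic and Threads.atomic_add!. The function name atomic_total is hypothetical and the example is deliberately simplified.

```julia
using Base.Threads: Atomic, atomic_add!, @threads

# Sketch: a shared counter that several threads can update safely.
function atomic_total(increments::Vector{Int})
    total = Atomic{Int}(0)
    @threads for i in eachindex(increments)
        atomic_add!(total, increments[i])   # indivisible read-modify-write
    end
    return total[]                          # read the final value
end

atomic_total(fill(2, 1000))   # every increment is counted exactly once
```

Atomics make the final total correct, but the order in which threads apply their updates is not fixed. A running offset such as val, where each iteration needs the value left by the previous one, is therefore not reproduced by this approach alone.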
