What is faster than a Channel for synchronizing between threads?

I have an MWE below, in which I run some work ("Do Some Works") in parallel using threads that are synchronized via channels. However, these channels seem slow, and I am wondering if there is another synchronization primitive that could lead to faster performance?

using Base.Threads
using ThreadPools   # provides @tspawnat

function KSD()
        time_loop = true
        # one unbuffered channel per worker thread (threads 2:nthreads())
        channels = [Channel{Nothing}(0) for i in 1:(nthreads() - 1)]
        thrs = [@tspawnat i begin
                while time_loop
                  take!(channels[threadid() - 1])   # wait for the "start" signal
                  println("Do Some Works From Thread $(threadid())")
                  take!(channels[threadid() - 1])   # wait for the "stop" signal
                end
         end for i = 2:nthreads()]
        for t in 1:1000
          put!.(channels, nothing)                  # release all workers
          println("Do Some Works From Thread $(threadid())")
          put!.(channels, nothing)                  # collect all workers
        end # for t in 1:1000
        time_loop = false
        fetch.(thrs)
        return nothing
end # function KSD()

KSD()

What makes you think that this is (too) slow and further, that channels are the culprit?
Your KSD can deadlock, i.e., when a thread enters the while loop again before time_loop becomes false, it will block on the channel and accordingly fetch never succeeds. In general, coordinating threads via shared variables is very difficult and error-prone … that’s why we have channels.
In any case, you can also try lock for thread coordination, which is somewhat more low-level than channels.


Thanks for your feedback.

I read in a blog post that channels could be very slow.

I don’t have a clue how to apply a lock in this MWE to synchronize the threads (I have only used locks to protect modifications of shared variables). Could you please help me here or at least give me the idea?

Had no particular application in mind, just recalled that locks can be used to implement some higher-level synchronization constructs. Imho channels are much better and easier to use. I see no reason to avoid them. In any case, if parallel performance is the main concern, any synchronization needs to be reduced to a minimum, no matter which construct is used.
I’m also not quite sure what exactly you are trying to do in your MWE, i.e., why do you need control over where a thread runs and to start and stop its work explicitly? Channels are often nice for pub-sub-type concurrency or some simple work stealing, i.e., a producer just publishes work items and several workers take a new one whenever ready (a small sketch follows below). When all tasks need to catch up with each other, something like a barrier might be used (which in turn can be implemented using locks or channels).
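
For illustration, here is a minimal sketch of that producer/worker pattern, using only Base.Threads and a buffered Channel (the function name produce_and_consume and the squaring "work" are made up for this example):

using Base.Threads

function produce_and_consume(nwork = 20)
    jobs    = Channel{Int}(32)      # buffered channel of work items
    results = Channel{Int}(32)
    # one worker task per thread; each keeps taking items until `jobs` is closed
    workers = [Threads.@spawn begin
            for item in jobs                # iterating a Channel blocks until it is closed and drained
                put!(results, item^2)       # "do some work"
            end
        end for _ in 1:nthreads()]
    for i in 1:nwork                        # producer: publish the work items
        put!(jobs, i)
    end
    close(jobs)                             # lets the worker loops terminate
    foreach(wait, workers)
    close(results)
    return sum(collect(results))
end

produce_and_consume()

No shared flag like time_loop is needed here; closing the channel is the stop signal.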


There is a semaphore in Base. Don’t know if it is faster, but I’m also not aware that channels are particularly slow – which version of Julia was that blog talking about? Did it provide any evidence?
Anyway, as has been suggested several times in this forum when discussing parallel processing: First, seriously optimize your single-threaded code. Then, once you understand its limitations/tradeoffs and memory requirements, make it multi-threaded. At all stages benchmark your changes and decisions, i.e., are channels really holding you back here? There are certainly many more people willing to help if you can post a working example of your best efforts on your actual problem (without guessing where the bottlenecks might be).
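
If you want to check whether the channel hand-off itself is the bottleneck, you could time a bare ping-pong between two tasks, e.g. with BenchmarkTools.jl (a rough sketch; the function name channel_roundtrip is made up, and the numbers will of course depend on your machine and Julia version):

using BenchmarkTools
using Base.Threads

function channel_roundtrip(n)
    ping = Channel{Nothing}(0)      # unbuffered, as in the MWE
    pong = Channel{Nothing}(0)
    t = Threads.@spawn for _ in 1:n
        take!(ping)
        put!(pong, nothing)
    end
    for _ in 1:n
        put!(ping, nothing)         # hand over to the other task ...
        take!(pong)                 # ... and wait until it has answered
    end
    wait(t)
end

@btime channel_roundtrip(1000)

Dividing the reported time by 1000 gives a feeling for the per-synchronization overhead, which you can compare against how long one of your actual work items takes.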


I have posted my attempt at an implementation with a semaphore, but it is not working. Could you please guide me?

The do-notation you use with acquire is syntactic sugar, i.e.,

Base.acquire(sem) do
   println("Do Some Works From Thread $(threadid())")
end

is the same as

Base.acquire(() -> println("Do Some Works From Thread $(threadid())"), sem)

and this method of acquire does the following:

  1. Acquires the semaphore.
  2. Calls the function given as the first argument.
  3. Releases the semaphore, no matter whether the function returns normally or throws an error.

(In case you know Python, the do-notation in Julia is often used in a similar fashion to the with resource managers there.)
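
Conceptually (this is just the idea, not the literal Base implementation), the do-notation version behaves like the following hand-written sketch:

using Base.Threads

sem = Base.Semaphore(2)
Base.acquire(sem)                                         # 1. acquire the semaphore
try
    println("Do Some Works From Thread $(threadid())")    # 2. run the work
finally
    Base.release(sem)                                     # 3. always released, even if the work throws
end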

Thus, when using the do-notation, you don’t need to call release explicitly. I.e., you can either write

Base.acquire(sem) do
    println("Do Some Works From Thread $(threadid())")
end
# Note: No explicit release needed

or just use

Base.acquire(sem)
println("Do Some Works From Thread $(threadid())")
Base.release(sem)  # Note: Might not be called if previous line throws an error!

In general, the do-notation is preferred as you cannot forget the release and it also releases when an error is thrown with the semaphore held.

Have not done such low-level parallel programming for quite some time, but apparently semaphores can be used to implement higher-level constructs such as barriers (see The Little Book of Semaphores for details). Overall, getting these things correct is rather difficult, and generally channels are much easier to use and less error-prone. There is also nothing wrong with channels … what makes you still believe that they are slow? Do you have any benchmarks on that?
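
For the curious, here is a minimal sketch of the most basic building block from that book, a one-shot "signal" that lets one task wait for another. Base.release errors if nothing has been acquired, so the single permit is acquired up front; a full barrier additionally needs a counter and a lock (see the book):

using Base.Threads

gate = Base.Semaphore(1)
Base.acquire(gate)                  # take the only permit up front: the gate is "closed"

worker = Threads.@spawn begin
    Base.acquire(gate)              # blocks until the main task releases the permit
    println("Do Some Works From Thread $(threadid())")
    Base.release(gate)
end

sleep(0.1)                          # pretend the main task does some setup first
Base.release(gate)                  # open the gate: the worker may proceed
wait(worker)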


Thank you for your feedback; I followed what you suggested.
However, I don’t know why the results are not as expected?

using Base.Threads
using ThreadPools   # provides @tspawnat

function KSD()
        time_loop = true
        sem = Base.Semaphore(nthreads())
        thrs = [@tspawnat i begin
                while time_loop
                  Base.acquire(sem) do
                    println("Do Some Works From Thread $(threadid())")
                  end
                end
         end for i = 2:nthreads()]
        for t in 1:1000
          Base.acquire(sem) do
             println("Do Some Works From Thread $(threadid())")
          end
        end # for t in 1:1000
        time_loop = false
        fetch.(thrs)
        return nothing
end # function KSD()

KSD()
Do Some Works From Thread 1
Do Some Works From Thread 1
Do Some Works From Thread 1

I have only used channels before, and the blog mentioned that they are slow. I will look it up and share it.