Hi, I have the following code section, a loop with millions of iterations, using multi-threading.
```julia
global result_list = Vector{Union{Nothing,SparseVector{Float64,Int64}}}(nothing, tot_num)
Threads.@threads for element in element_list
    ind = element_to_ind_Dict[element] # get the index
    @show ind
    result = function_of_element(element) # the main calculation
    if result === nothing
        continue
    end
    @show result
    global result_list
    result_list[ind] = result
end
```
I know that in some iterations `function_of_element()` takes much longer than in others. Although there are millions of iterations waiting to be computed, it still looks like I am using only one thread most of the time.
`@threads for` simply divides the serially ordered work into contiguous chunks and assigns those chunks to `nthreads()` tasks. So if chunk N contains most of the expensive iterations, its task will keep running long after the others are done. A fairer alternative is to start up `nthreads()` worker tasks which take work items from a `Channel`. If you know in advance which iterations are most costly, they should probably go into the queue first.
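A sketch of that Channel-based worker pool, reusing the names from the question (`function_of_element`, `element_to_ind_Dict`, `element_list`, and `tot_num` are assumed to be defined as above; writing to distinct indices of `result_list` from different tasks is safe):

```julia
using SparseArrays

function calc_with_channel(element_list, element_to_ind_Dict, tot_num)
    result_list = Vector{Union{Nothing,SparseVector{Float64,Int64}}}(nothing, tot_num)
    # Fill a buffered channel with all work items, then close it so that
    # iterating the channel terminates once it is drained.
    jobs = Channel{eltype(element_list)}(length(element_list))
    foreach(e -> put!(jobs, e), element_list)
    close(jobs)
    @sync for _ in 1:Threads.nthreads()
        Threads.@spawn for element in jobs  # each worker pulls the next item as soon as it is free
            ind = element_to_ind_Dict[element]
            result = function_of_element(element)
            result === nothing && continue
            result_list[ind] = result
        end
    end
    return result_list
end
```

Because each worker grabs a new item only when it finishes the previous one, a few expensive iterations can no longer pin all the remaining work to a single thread.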
“Dynamic scheduling” in this context means that once a task has been defined, different portions may execute on different threads (sequentially, unless it has sub-tasks). The logic for building tasks in base/threadingconstructs.jl seems to have been the same for quite a while.
The current situation is apparently an “implementation detail”, and therefore (!) not well documented. The manual explicitly says that the @threads body “must not make any assumptions about the distribution of iterations to tasks …,” which leads some of us to prefer other constructs.
```julia
using ThreadsX
using SparseArrays  # for SparseVector

function calc_result_list(element_list, element_to_ind_Dict, tot_num)
    result_list = Vector{Union{Nothing,SparseVector{Float64,Int64}}}(nothing, tot_num)
    ThreadsX.foreach(element_list, basesize=1) do element
        ind = element_to_ind_Dict[element] # get the index
        @show ind
        result = function_of_element(element) # the main calculation
        if result === nothing
            return
        end
        @show result
        result_list[ind] = result
    end
    return result_list
end
```
ThreadsX allows you to pass a `basesize`. Using `basesize=1` means each task works on one iteration at a time, rather than on large chunks.
The above implementation also avoids globals.
Re: other constructs
In addition to Chris Elrod’s example above, there are some good ones in the Transducers/FLoops packages and no end of (potentially dangerous) things one can do oneself with @spawn.
The task structure of `@threads :dynamic` and `@threads :static` is actually the same, i.e. both create O(nthreads()) tasks corresponding to contiguous regions of the given (potentially large) iteration range. The only difference is that tasks can migrate between threads (“are non-sticky”) in the former case while they can’t in the latter. Compare this to `@sync for ... @spawn ...`, which creates one task per loop iteration and gives a form of load balancing through Julia’s task scheduler. In pictures:
Note that neither `:static` nor `:dynamic` gives load balancing (as `@spawn` does). Also note that the task→thread mapping isn’t fixed for `:dynamic` but is for `:static`. However, when we sort by workload we see that eventually they do the same thing. So, to summarize: which Julia thread executes which chunk is decided dynamically for `:dynamic`, but the chunks themselves are the same as for `:static`.
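For completeness, here is the one-task-per-iteration `@spawn` pattern applied to the original problem (a sketch, reusing the names from the question; reasonable when each iteration is expensive enough to amortize the per-task overhead):

```julia
using SparseArrays

# One task per iteration: the scheduler load-balances tasks across threads,
# at the cost of spawning overhead for every single iteration.
function calc_with_spawn(element_list, element_to_ind_Dict, tot_num)
    result_list = Vector{Union{Nothing,SparseVector{Float64,Int64}}}(nothing, tot_num)
    @sync for element in element_list
        Threads.@spawn begin
            ind = element_to_ind_Dict[element]
            result = function_of_element(element)
            result !== nothing && (result_list[ind] = result)
        end
    end
    return result_list
end
```

With millions of cheap iterations this spawns too many tasks; in that regime, chunking with a tuned `basesize` (as in the ThreadsX example above) is usually the better trade-off.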
(Pluto notebook: load_balancing.jl (45.8 KB) - Be aware though that I use hacky/unsafe threadid() pattern here for simplicity.)
You might want to check out these comments by @tkf: