One trick to get to background-only threads

tro3 · January 10, 2020, 12:00am

Currently in the v1.3(.1) threading implementation, tasks are assigned to a random thread, and that thread might be the primary thread. This can be problematic in a couple of use cases, for example a GUI that suddenly slows down because a heavy-computation task just got assigned to the same thread. I’ve found that any attempts at active thread management get impeded by the primary-thread assignments.

I have not found a way to get Threads.@spawn off of the primary, but I did find a way to do it for Threads.@threads (which uses a very different implementation than @spawn).

@threads divvies up the loop it was given by creating a function that handles a portion of the iterations based on the running threadid. The C-level code it calls (jl_threading_run) takes that function and runs it once on each thread. This does not provide any real-time management, but it does keep each thread working on exactly one thing at a time.
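The split itself can be written out as a plain function. This is only a sketch of the index arithmetic as I read it from the v1.3 source; static_chunk is a hypothetical name, not something in Base.

```julia
# Hypothetical helper: the iterations thread `tid` (of `nt` threads) would take
# from a loop of length `n` under the static split described above.
function static_chunk(n::Int, nt::Int, tid::Int)
    len, rem = divrem(n, nt)
    if len == 0                      # fewer iterations than threads
        tid > rem && return 1:0      # this thread gets nothing
        len, rem = 1, 0
    end
    f = 1 + (tid - 1) * len
    l = f + len - 1
    if rem > 0                       # the first `rem` threads take one extra iteration
        if tid <= rem
            f += tid - 1
            l += tid
        else
            f += rem
            l += rem
        end
    end
    return f:l
end

# Ten iterations over four threads
chunks = [static_chunk(10, 4, tid) for tid in 1:4]
```

Each thread gets one contiguous block, decided up front, which is why a block that happens to contain the slow iterations cannot be rebalanced later.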

Using this mechanism, then, all one has to do is change the generated function so that it handles no iterations on the main thread and splits all of them among the other threads. The following is a modification of the Threads._threadsfor function that the macro calls:

function _bgthreadsfor(iter,lbody)
    lidx = iter.args[1]         # index
    range = iter.args[2]
    quote
        local threadsfor_fun
        let range = $(esc(range))
        function threadsfor_fun(onethread=false)
            r = range # Load into local variable
            lenr = length(r)
            # divide loop iterations among threads
            if onethread
                tid = 2                                   # Fix - act as the first worker slot so f:l spans 1:lenr
                len, rem = lenr, 0
            else
                tid = Threads.threadid()
                tid == 1 && return                        # Mod - Keep execution off of the main thread
                len, rem = divrem(lenr, Threads.nthreads()-1)  # Mod - Divide iterations over one less thread
            end
            # not enough iterations for all the background threads?
            if len == 0
                if (tid-1) > rem                          # Fix - worker slot is tid-1, so no iteration is skipped
                    return
                end
                len, rem = 1, 0
            end
            # compute this thread's iterations
            f = 1 + ((tid-2) * len)                       # Mod
            l = f + len-1                                 # Mod
            # distribute remaining iterations evenly
            if rem > 0
                if (tid-1) <= rem                         # Mod
                    f = f + tid-2                         # Mod
                    l = l + tid-1                         # Mod
                else
                    f = f + rem
                    l = l + rem
                end
            end
            # run this thread's iterations
            for i = f:l
                local $(esc(lidx)) = Base.unsafe_getindex(r,i)
                $(esc(lbody))
            end
        end
        end
        if Threads.threadid() != 1
            # only thread 1 can enter/exit _threadedregion
            Base.invokelatest(threadsfor_fun, true)
        else
            ccall(:jl_threading_run, Cvoid, (Any,), threadsfor_fun)
        end
        nothing
    end
end
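As a sanity check on the shifted index math, here is a standalone sketch of the partition the macro is aiming for: thread 1 gets nothing, and threads 2 through nt cover every iteration exactly once. bg_chunk and covers are hypothetical names used only for this check.

```julia
# Hypothetical helper: iterations assigned to thread `tid` when thread 1 is
# kept idle and threads 2..nt share a loop of length `n`.
function bg_chunk(n::Int, nt::Int, tid::Int)
    (tid == 1 || nt < 2) && return 1:0   # primary thread (or no background threads)
    slot = tid - 1                       # worker slot 1..nt-1
    len, rem = divrem(n, nt - 1)
    if len == 0                          # fewer iterations than background threads
        slot > rem && return 1:0
        len, rem = 1, 0
    end
    f = 1 + (slot - 1) * len
    l = f + len - 1
    if rem > 0                           # the first `rem` slots take one extra iteration
        if slot <= rem
            f += slot - 1
            l += slot
        else
            f += rem
            l += rem
        end
    end
    return f:l
end

# True when the chunks of all threads, taken in order, reproduce 1:n exactly.
covers(n, nt) = vcat([collect(bg_chunk(n, nt, t)) for t in 1:nt]...) == collect(1:n)
```

The short-loop case (fewer iterations than background threads) is worth checking in particular, since the empty-chunk test has to compare the worker slot, not the raw thread id, against the remainder.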

Then modify the @threads macro to call it:

macro bgthreads(args...)
    na = length(args)
    if na != 1
        throw(ArgumentError("wrong number of arguments in @bgthreads"))
    end
    ex = args[1]
    if !isa(ex, Expr)
        throw(ArgumentError("need an expression argument to @bgthreads"))
    end
    if ex.head === :for
        return _bgthreadsfor(ex.args[1], ex.args[2])  # Mod
    else
        throw(ArgumentError("unrecognized argument to @bgthreads"))
    end
end

To test this, I bolted on some time-logging to the tasks in my current project. There are 3,421 queries, and total running time is just under 30 seconds. The following shows the number of tasks running on each thread at 1-second intervals, using the normal @threads:

Thread 1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
Thread 2: 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 3: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 4: 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 5: 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 6: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 7: 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 8: 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
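A grid like the one above could be produced along these lines. This is a hypothetical sketch, not the logging code actually used: it assumes each task records its thread id and its start and stop times (in seconds from the start of the run), and the names TaskLog and activity are made up for illustration.

```julia
# Hypothetical record of one finished task.
struct TaskLog
    tid::Int          # thread the task ran on
    start::Float64    # seconds since the run began
    stop::Float64
end

# Count how many logged tasks were running on each thread at t = 0, 1, 2, ... seconds.
# Rows are threads, columns are the sample times.
function activity(logs, nthreads::Int, nsecs::Int)
    counts = zeros(Int, nthreads, nsecs)
    for rec in logs, s in 1:nsecs
        t = s - 1
        if rec.start <= t < rec.stop
            counts[rec.tid, s] += 1
        end
    end
    return counts
end

# Made-up sample data, two threads sampled over four seconds
logs = [TaskLog(1, 0.0, 2.5), TaskLog(2, 0.5, 1.5), TaskLog(2, 0.9, 3.0)]
grid = activity(logs, 2, 4)
```

A count above 1 in a cell means more than one task was live on that thread at the sample time, which can happen with @spawn but not with @threads.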

That same set of queries run under @bgthreads:

Thread 1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 2: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
Thread 3: 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 4: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 5: 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 6: 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 7: 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Thread 8: 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Note that Thread 1 is untouched.

Also note, though, that the running time is not reduced. In this use case, 10% of the queries take 90% of the time, and clearly the Thread 2 batch (which used to be the Thread 1 batch) has some of the long ones. So here, @spawn would be better, but it forces a return to Thread 1:

Thread 1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
Thread 2: 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
Thread 3: 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
Thread 4: 3 1 1 1 2 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
Thread 5: 3 6 6 6 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
Thread 6: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0
Thread 7: 2 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
Thread 8: 1 5 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 0 0 0 0 0 0 0

and the assignment is random, differing from one run to the next:

Thread 1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
Thread 2: 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
Thread 3: 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
Thread 4: 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 2 1 1 1 1 1 0 0 0 0 0 0 0 0 0
Thread 5: 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
Thread 6: 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 2 1 0 0 0 0 0 0 0 0 0
Thread 7: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5 1 1 1 1 0 0 0 0 0 0 0 0 0
Thread 8: 4 4 4 4 4 4 4 4 4 2 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
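The reason a static split does not help with a skewed workload, while dynamic assignment does, shows up in a small simulation. The numbers below are made up for illustration (they are not the measurements above), and static_makespan and dynamic_makespan are hypothetical helpers.

```julia
# Made-up workload: 100 tasks, the first 10 cost 81 units each and the rest 1 unit,
# so 10% of the tasks carry 90% of the total cost (810 of 900).
costs = vcat(fill(81, 10), fill(1, 90))

# Static split: contiguous equal-sized blocks, one per worker, fixed up front.
function static_makespan(costs, nworkers)
    blocks = Iterators.partition(costs, cld(length(costs), nworkers))
    return maximum(sum, blocks)
end

# Dynamic (greedy) assignment: each task goes to whichever worker frees up first.
function dynamic_makespan(costs, nworkers)
    busy_until = zeros(Int, nworkers)
    for c in costs
        w = argmin(busy_until)
        busy_until[w] += c
    end
    return maximum(busy_until)
end

# Seven background workers, as with 8 threads and the primary kept idle
static_time  = static_makespan(costs, 7)
dynamic_time = dynamic_makespan(costs, 7)
```

With the static split, the worker that draws the block of heavy tasks sets the total running time no matter how many other workers sit idle. The greedy version spreads the heavy tasks out, which is the behavior one would want from background-only dynamic scheduling.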

The ideal case would be a Threads.@spawnat, to allow active management of task assignment. Hopefully that arrives in the future.

xiaodai · August 17, 2020, 6:45am

Can this be put into a package somewhere?

tro3 · August 21, 2020, 1:43am

Sorry for the slow response. Julia has stopped sending me emails for some reason.

In any case - your wish is my command: GitHub - tro3/ThreadPools.jl: Improved thread management for background and nonuniform tasks in Julia. Docs at https://tro3.github.io/ThreadPools.jl

xiaodai · August 21, 2020, 3:22am

Thanks. I found it and I have been using it to do something silly
