Conditional Multithreading

Is there a way to enable/disable calls to Threads.@threads?

I’m developing an optimization solver. When the problem is sufficiently small, multi-threading hurts performance and incurs allocations (my code doesn’t allocate at all until I wrap it in @threads). However, as the problem size grows and the individual operations being threaded become more computationally intensive (boosting the arithmetic intensity), I get some good performance increases.

I’d like to be able to enable/disable multi-threading as a user-specified option. I can’t just restart Julia with JULIA_NUM_THREADS=1, since @threads still allocates memory and is slower than if I had left it out. Can I do this using metaprogramming?

I know I can obviously just create duplicates of my functions and wrap some with @threads, but I’d rather avoid this if possible.

FYI: nearly all of my functions are trivially parallelizable, like this

function foo(vals, vars)
    for k in eachindex(vals)
        vals[k] = somefunction(vars[k])
    end
end
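
A rough way to check the allocation claim on a small problem is sketched below. The names `cheap`, `fill_serial!`, and `fill_threaded!` are made up for illustration, and the numbers printed will depend on the Julia version and the number of threads.

```julia
# Hypothetical stand-in for a cheap per-element operation.
cheap(x) = 2x + 1

# Plain loop, no threading.
function fill_serial!(vals, vars)
    for k in eachindex(vals, vars)
        vals[k] = cheap(vars[k])
    end
    return vals
end

# Same loop wrapped in Threads.@threads.
function fill_threaded!(vals, vars)
    Threads.@threads for k in eachindex(vals, vars)
        vals[k] = cheap(vars[k])
    end
    return vals
end

vars = rand(16)
vals_serial = similar(vars)
vals_threaded = similar(vars)

# Warm up both versions so compilation is not counted.
fill_serial!(vals_serial, vars)
fill_threaded!(vals_threaded, vars)

# Compare the bytes allocated by one call of each version.
serial_bytes = @allocated fill_serial!(vals_serial, vars)
threaded_bytes = @allocated fill_threaded!(vals_threaded, vars)
println("serial: ", serial_bytes, " bytes, threaded: ", threaded_bytes, " bytes")
```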

There was a PR to remove the overhead of threads when a single thread is used, but it was closed.


see also

Awesome, thanks for linking those.

So, to be clear, is there not an easy way to do this at run-time?

For example, I can do this:

function run_kernel(vals,A,b,parallel=true)
    if parallel
        Threads.@threads for k in eachindex(vals)
            vals[k] = mykernel(A,b)
        end
    else
        for k in eachindex(vals)
            vals[k] = mykernel(A,b)
        end
    end
end

Is there not a way to do this without copying the code like that?

This sounds like it’d be better to dispatch on the problem size than on nthreads() == 1. If that’s the case, it’s better to use a threaded map that supports specifying a base case size. It’s kind of a plug, but Transducers.jl has one. It would be something like:

xf = Map() do k
    vals[k] = mykernel(A, vars[k])
    nothing
end

foldl(right, xf, eachindex(vals))  # sequential
reduce(right, xf, eachindex(vals))  # parallel
reduce(right, xf, eachindex(vals), basesize=10)  # parallelize if length(vals) > 10

(I used a side effect in Map, which is a bit nasty. It’d be better to use collect and tcollect, but they allocate.)
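
For comparison, the same size-based switch can be sketched with Base only. This is not how Transducers.jl does it: `foreach_index` and its `basesize` keyword are hypothetical names, and `mykernel` is a placeholder. The two loops still exist, but only once, inside the helper.

```julia
# Hypothetical helper: call f(k) for every k in idx, and use threads only
# when idx is longer than `basesize` and more than one thread is available.
function foreach_index(f, idx; basesize::Integer = 1_000)
    if length(idx) > basesize && Threads.nthreads() > 1
        Threads.@threads for k in idx
            f(k)
        end
    else
        for k in idx
            f(k)
        end
    end
    return nothing
end

# Placeholder kernel, for illustration only.
mykernel(A, v) = sum(A) * v

A = ones(3, 3)
vars = collect(1.0:8.0)
vals = zeros(length(vars))

# The loop body is written once; the helper picks the execution mode.
foreach_index(eachindex(vals); basesize = 4) do k
    vals[k] = mykernel(A, vars[k])
end
```

Whether passing the body as a closure stays allocation-free in a real solver would need to be measured.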


Is there not a way to do this without copying the code like that?

I have an @onthreads macro in ParallelProcessingTools.jl that does this for you. It also pins the tasks to threads, though, a legacy from pre-partr times (I’ll change that in the future, or at least make it more flexible). One of the motivations for @onthreads was easy testing of how code scales with the number of threads.
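
To illustrate the general idea of a macro that removes the duplication, here is a minimal sketch. It is not the @onthreads implementation: `@maybe_threads` is a made-up name and `mykernel` is a placeholder. The macro emits both the threaded and the plain loop and chooses between them with a run-time flag.

```julia
# Hypothetical macro: expands `@maybe_threads flag for ... end` into an
# if/else that runs the loop under Threads.@threads only when `flag` is true.
macro maybe_threads(flag, loop)
    loop isa Expr && loop.head === :for ||
        throw(ArgumentError("@maybe_threads expects a `for` loop"))
    # The whole expression is escaped so that Threads.@threads receives the
    # original `for` loop and all names resolve in the caller's scope.
    return esc(quote
        if $flag
            Threads.@threads $loop
        else
            $loop
        end
    end)
end

# Placeholder kernel, for illustration only.
mykernel(A, b) = sum(A * b)

function run_kernel!(vals, A, b, parallel = true)
    @maybe_threads parallel for k in eachindex(vals)
        vals[k] = mykernel(A, b)
    end
    return vals
end

A = [1.0 2.0; 3.0 4.0]
b = [1.0, 1.0]
threaded_result = run_kernel!(zeros(4), A, b, true)
serial_result = run_kernel!(zeros(4), A, b, false)
```

Since the flag is checked at run time, both branches are compiled, but the loop body appears only once in the source.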
