Multithreading when tasks vary in length (and need temporary arrays)

I am running simulations on my personal computer. The compute time of each draw/specification can vary significantly and unpredictably. I believe Threads.@threads assigns loop iterations to threads up front, regardless of how long each will take. Is there a better, dynamic way to hand out new tasks to threads as they become available?

Also, some simulations require temporary arrays. I have been allocating new arrays for each new task, but pre-allocating temporary arrays for each thread would seem better. I was considering using something like Threads.threadid(), but I read it is now discouraged because thread IDs can change unpredictably. Is there a better approach?


In general, I would use Threads.@spawn instead of Threads.@threads on later Julia versions. It gives the scheduler more flexibility, and can handle things like uneven task length pretty well.
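For instance, a minimal sketch of dynamic scheduling with Threads.@spawn (the work function `simulate` is a placeholder standing in for your simulation): each item becomes its own task, and the scheduler hands tasks to threads as they free up, so uneven runtimes balance out automatically.

```julia
using Base.Threads

# Placeholder work function whose runtime varies with the input.
simulate(x) = sum(abs2, randn(100x))

inputs = 1:20

# @spawn creates one task per work item; the scheduler assigns each task
# to a free thread as one becomes available -- no static partitioning.
tasks = [Threads.@spawn simulate(x) for x in inputs]
results = fetch.(tasks)  # wait for all tasks and collect their results
```

Spawning one task per item is fine when each item does enough work to amortize the (small) task overhead; for very cheap items you would batch them first.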

One way to handle array reuse (assuming the arrays have a fixed size) would be to spawn a fixed set of workers and give each worker its own array. The key point is that each array is owned and modified by a single Julia Task, which makes the approach thread-safe. A rough outline might look like this:

# Represents the input needed for a simulation
struct SimulationInput
  ...
end

channel = Channel{SimulationInput}(32)

function producer(channel)
  for _ in 1:n_simulations
    simulation_input = generate_simulation_input()
    put!(channel, simulation_input)
  end
  close(channel)  # lets the workers' for loops terminate
end

function worker(channel, buffer_size)
  buffer = zeros(buffer_size)  # owned by this task only, reused across simulations
  for simulation_input in channel
    run_simulation(simulation_input, buffer)
  end
end

Threads.@spawn producer(channel)
workers = [Threads.@spawn worker(channel, 1000) for i in 1:8]
wait.(workers)
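If the simulations also produce results you need to collect, one option (not shown in the thread; the trivial "simulation" below is a stand-in) is a second channel that the workers write into:

```julia
using Base.Threads

# Inputs flow through `jobs`, outputs through `results`.
jobs    = Channel{Int}(32)
results = Channel{Float64}(32)

function worker(jobs, results, buffer_size)
    buffer = zeros(buffer_size)      # per-worker scratch space, reused
    for x in jobs
        buffer .= x                  # stand-in for the real simulation
        put!(results, sum(buffer))
    end
end

n_jobs  = 100
workers = [Threads.@spawn worker(jobs, results, 10) for _ in 1:4]

Threads.@spawn begin
    foreach(x -> put!(jobs, x), 1:n_jobs)
    close(jobs)                      # signals the workers to stop
end

out = [take!(results) for _ in 1:n_jobs]  # collect exactly n_jobs results
wait.(workers)
close(results)
```

Note that the results arrive in whatever order the workers finish; if order matters, send an index along with each job and write into a pre-allocated output array instead.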

Also have a look at OhMyThreads.jl (github.com/JuliaFolds2/OhMyThreads.jl), which has some nice multithreading constructs and very good documentation with a set of examples.


Thanks a lot for the suggestions. Threads.@spawn is exactly what I was looking for and it worked perfectly. To understand better how it works, I also found these notes very useful. For the allocation problem, I haven’t gone far into asynchronous programming yet, to keep the changes to my old code as light as possible. Instead, I chose to split the work into batches of N draws each (where N > total number of threads available) and pre-allocate N versions of the temp arrays needed. Then I compute each batch one after the other.
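Concretely, that batching approach might look like the following sketch (the buffer size, batch size, and the in-loop "simulation" are placeholders): N buffers are allocated once, and within a batch, task j exclusively uses buffer j.

```julia
using Base.Threads

n_draws = 1000
N       = 4 * Threads.nthreads()        # batch size > number of threads
buffers = [zeros(100) for _ in 1:N]     # one temp array per slot, reused
results = zeros(n_draws)

for batch in Iterators.partition(1:n_draws, N)
    tasks = [Threads.@spawn begin
                 buf = buffers[j]       # slot j is used by one task only
                 buf .= randn(100)      # stand-in for the real simulation
                 sum(abs2, buf)
             end for (j, draw) in enumerate(batch)]
    results[batch] .= fetch.(tasks)
end
```

The fetch.(tasks) at the end of each batch acts as the barrier between batches, so no buffer is ever shared between two live tasks.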


The nice thing about the channel solution is that the number of batches, i.e., work items, and the number of workers, i.e., tasks, can be chosen independently of each other. Since each worker runs sequentially, you only need one buffer per worker, which can then be reused across batches. For compute-bound tasks you probably want something like N > # workers ≈ # cores. Then, preallocate # workers many temp buffers and play with the number of batches to optimize the trade-off between per-batch overhead and load balancing.
To get an idea of what can be done with channels, I got a lot out of the talk by Rob Pike on Go Concurrency Patterns.


FWIW, there’s also ChunkSplitters.jl for this.
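A minimal sketch of that, assuming ChunkSplitters' `chunks` iterator with its keyword API (check the package docs for the current signature): the data is split into one index range per task, so each task gets a contiguous slice to reduce.

```julia
using ChunkSplitters, Base.Threads

xs = randn(10_000)
nchunks = Threads.nthreads()

partials = zeros(nchunks)
tasks = map(enumerate(chunks(xs; n = nchunks))) do (i, idxs)
    Threads.@spawn begin
        partials[i] = sum(@view xs[idxs])  # each task writes its own slot
    end
end
wait.(tasks)
total = sum(partials)
```

Since each task writes only to its own slot of `partials`, no locking is needed; for workloads with very uneven per-element cost, you would use more chunks than threads to regain load balancing.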
