Multithreading when tasks vary in length (and need temporary arrays)

I am running simulations on my personal computer. The compute time of each draw/specification can vary significantly and unpredictably. I believe Threads.@threads assigns loop iterations to threads up front, regardless of how long each will take. Is there a better, dynamic way to hand out new tasks to threads as they become available?

Also, some simulations require temporary arrays. I have been allocating new arrays for each new task, but pre-allocating temporary arrays for each thread would seem better. I was considering using something like Threads.threadid(), but I read it is now discouraged because thread IDs can change unpredictably. Is there a better approach?


In general, I would use Threads.@spawn instead of Threads.@threads on later Julia versions. It gives the scheduler more flexibility, and can handle things like uneven task length pretty well.
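For instance, a minimal sketch of dynamic scheduling with Threads.@spawn (the work function `simulate` is a placeholder standing in for your simulation): each item becomes its own task, and the scheduler hands tasks to threads as they free up, so uneven runtimes balance out automatically.

```julia
using Base.Threads

# Placeholder work function whose runtime varies with the input.
simulate(x) = sum(abs2, randn(100x))

inputs = 1:20

# @spawn creates one task per work item; the scheduler assigns each task
# to a free thread as one becomes available -- no static partitioning.
tasks = [Threads.@spawn simulate(x) for x in inputs]
results = fetch.(tasks)  # wait for all tasks and collect their results
```

Spawning one task per item is fine when each item does enough work to amortize the (small) task overhead; for very cheap items you would batch them first.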

One way to handle array reuse (assuming the arrays have a fixed size) would be to spawn a fixed set of workers and give each worker its own array. The key point is that each array is owned and modified by a single Julia Task, which makes the approach thread-safe. A rough outline might look like this:

# Represents the input needed for a simulation
struct SimulationInput
  ...
end

channel = Channel{SimulationInput}(32)

function producer(channel)
  for _ in 1:n_simulations
    simulation_input = generate_simulation_input()
    put!(channel, simulation_input)
  end
  close(channel)  # lets the workers' for loops terminate
end

function worker(channel, buffer_size)
  buffer = zeros(buffer_size)  # owned by this task only, reused across simulations
  for simulation_input in channel
    run_simulation(simulation_input, buffer)
  end
end

Threads.@spawn producer(channel)
workers = [Threads.@spawn worker(channel, 1000) for i in 1:8]
wait.(workers)
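If the simulations also produce results you need to collect, one option (not shown in the thread; the trivial "simulation" below is a stand-in) is a second channel that the workers write into:

```julia
using Base.Threads

# Inputs flow through `jobs`, outputs through `results`.
jobs    = Channel{Int}(32)
results = Channel{Float64}(32)

function worker(jobs, results, buffer_size)
    buffer = zeros(buffer_size)      # per-worker scratch space, reused
    for x in jobs
        buffer .= x                  # stand-in for the real simulation
        put!(results, sum(buffer))
    end
end

n_jobs  = 100
workers = [Threads.@spawn worker(jobs, results, 10) for _ in 1:4]

Threads.@spawn begin
    foreach(x -> put!(jobs, x), 1:n_jobs)
    close(jobs)                      # signals the workers to stop
end

out = [take!(results) for _ in 1:n_jobs]  # collect exactly n_jobs results
wait.(workers)
close(results)
```

Note that the results arrive in whatever order the workers finish; if order matters, send an index along with each job and write into a pre-allocated output array instead.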

Also have a look at OhMyThreads.jl (github.com/JuliaFolds2/OhMyThreads.jl), which has some nice multithreading constructs and very good documentation with a set of examples.


Thanks a lot for the suggestions. Threads.@spawn is exactly what I was looking for and it worked perfectly. To understand better how it works, I also found these notes very useful. For the allocation problem, I haven’t gone far into asynchronous programming yet, to keep the changes to my old code as light as possible. Instead, I chose to split the work into batches of N draws each (where N > total number of threads available) and pre-allocate N versions of the temp arrays needed. Then I compute each batch one after the other.
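Concretely, that batching approach might look like the following sketch (the buffer size, batch size, and the in-loop "simulation" are placeholders): N buffers are allocated once, and within a batch, task j exclusively uses buffer j.

```julia
using Base.Threads

n_draws = 1000
N       = 4 * Threads.nthreads()        # batch size > number of threads
buffers = [zeros(100) for _ in 1:N]     # one temp array per slot, reused
results = zeros(n_draws)

for batch in Iterators.partition(1:n_draws, N)
    tasks = [Threads.@spawn begin
                 buf = buffers[j]       # slot j is used by one task only
                 buf .= randn(100)      # stand-in for the real simulation
                 sum(abs2, buf)
             end for (j, draw) in enumerate(batch)]
    results[batch] .= fetch.(tasks)
end
```

The fetch.(tasks) at the end of each batch acts as the barrier between batches, so no buffer is ever shared between two live tasks.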


The nice thing about the channel solution is that the number of batches, i.e., work items, and the number of workers, i.e., tasks, can be chosen independently of each other. Since each worker runs sequentially, you only need one buffer per worker, which can then be reused across batches. For compute-bound tasks you probably want something like N > # workers ≈ # cores. Then, preallocate # workers many temp buffers and play with the number of batches to optimize the trade-off between per-batch overhead and load balancing.
To get an idea of what can be done with channels, I got a lot out of the talk by Rob Pike on Go Concurrency Patterns.


FWIW, there’s also ChunkSplitters.jl for this.
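A minimal sketch of that, assuming ChunkSplitters' `chunks` iterator with its keyword API (check the package docs for the current signature): the data is split into one index range per task, so each task gets a contiguous slice to reduce.

```julia
using ChunkSplitters, Base.Threads

xs = randn(10_000)
nchunks = Threads.nthreads()

partials = zeros(nchunks)
tasks = map(enumerate(chunks(xs; n = nchunks))) do (i, idxs)
    Threads.@spawn begin
        partials[i] = sum(@view xs[idxs])  # each task writes its own slot
    end
end
wait.(tasks)
total = sum(partials)
```

Since each task writes only to its own slot of `partials`, no locking is needed; for workloads with very uneven per-element cost, you would use more chunks than threads to regain load balancing.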
