Hi everyone,
I am writing a finite element solver. Since the solver takes a lot of time on a single core, I am trying to parallelize it. The solver method makes several function calls that share a cache. This was done to keep the solver from allocating and to reduce the time spent in garbage collection.
To parallelize this, I define an array containing one cache struct per thread. To prevent race conditions, I also track which caches are currently in use with a boolean array, free_cache.
The code will look something like this:
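```julia
using Base.Threads: @spawn, nthreads

# cld (not div) so there are at most nthreads() chunks, i.e. never more tasks than caches
chunks = Iterators.partition(eachindex(A), cld(length(A), nthreads()))
cache = [Cache{Float64, Int64}() for _ in 1:nthreads()]  # one cache per thread
free_cache = fill(true, nthreads())                      # true = cache is free
tasks = map(chunks) do chunk
    @spawn do_something(A, chunk, cache, free_cache)
end
result = maximum(fetch.(tasks))
```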
And the function's definition looks like this:
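```julia
function do_something(A, chunk, cache, free_cache)
    free_idx = findfirst(free_cache)   # index of the first free cache
    free_cache[free_idx] = false       # mark it as in use
    # ... SOME COMPUTATION using cache[free_idx], producing `result` ...
    free_cache[free_idx] = true        # mark it as free again
    return result
end
```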
Upon testing, this approach produces the correct result. I wanted to ask whether this is good practice and whether any issues can occur with this approach.
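For comparison, here is a minimal sketch of a variant that needs no free_cache array at all: because the chunks are created up front and there are at most as many chunks as caches, each chunk can simply be zipped with its own cache, so no two tasks ever touch the same one. The `Cache` struct, its one-element buffer, the body of `do_something` and the `solve` wrapper are hypothetical stand-ins for the solver's real types and computation.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-in for Cache{Float64, Int64}: a reusable scratch buffer.
struct Cache
    buf::Vector{Float64}
end
Cache(n::Integer) = Cache(Vector{Float64}(undef, n))

# Hypothetical kernel: uses the cache as scratch space and returns max(2 .* A[chunk]).
function do_something(A, chunk, cache::Cache)
    m = -Inf
    for i in chunk
        cache.buf[1] = 2 * A[i]   # write into the preallocated buffer instead of allocating
        m = max(m, cache.buf[1])
    end
    return m
end

function solve(A)
    # cld gives at most nthreads() chunks
    chunks = collect(Iterators.partition(eachindex(A), cld(length(A), nthreads())))
    caches = [Cache(1) for _ in 1:length(chunks)]  # exactly one cache per chunk
    # Each task gets its own (chunk, cache) pair, so no bookkeeping is needed
    tasks = map(zip(chunks, caches)) do (chunk, cache)
        @spawn do_something(A, chunk, cache)
    end
    return maximum(fetch.(tasks))
end
```

Usage: `solve(collect(1.0:10.0))` returns `20.0`. The ownership of each cache is decided before any task starts, so there is no check-then-set step for tasks to race on.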
Do you mean you want to reuse the caches later with a new set of tasks? You could, for example, put the caches into a Channel and have each task take! one cache from the channel at the start instead of instantiating a new one, then put! it back into the Channel when it finishes. A Channel is thread-safe.
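A minimal sketch of that Channel-based pool, assuming a hypothetical `Cache` type and a toy computation in place of the real solver kernel. `make_pool`, `do_something` and `solve` are illustrative names, not part of any library. `take!` blocks until a cache is available, and the `finally` clause returns the cache to the pool even if the computation throws.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-in for the solver's cache type.
struct Cache
    buf::Vector{Float64}
end
Cache(n::Integer) = Cache(Vector{Float64}(undef, n))

# Build a pool: a buffered Channel pre-filled with n caches.
function make_pool(n::Integer)
    pool = Channel{Cache}(n)          # capacity n, so the put!s below never block
    for _ in 1:n
        put!(pool, Cache(1))
    end
    return pool
end

function do_something(A, chunk, pool::Channel{Cache})
    cache = take!(pool)               # blocks until some cache is free
    try
        m = -Inf
        for i in chunk
            cache.buf[1] = 2 * A[i]   # scratch work in the borrowed cache
            m = max(m, cache.buf[1])
        end
        return m
    finally
        put!(pool, cache)             # always hand the cache back
    end
end

function solve(A, pool::Channel{Cache})
    chunks = Iterators.partition(eachindex(A), cld(length(A), nthreads()))
    tasks = map(chunks) do chunk
        @spawn do_something(A, chunk, pool)
    end
    return maximum(fetch.(tasks))
end
```

Because the pool outlives any single call, the same caches can be reused across many `solve` calls, e.g. `pool = make_pool(nthreads())` once and then `solve(A, pool)` repeatedly. If there are more tasks than caches, the extra tasks simply wait in `take!` instead of failing.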