Hello everyone,

This is my first post (although not my first encounter with Julia), so please bear with me if I am doing something wrong.

Okay, so here is my problem. I have working code that solves a set of coupled differential equations. For small system sizes it does so very efficiently, but for large systems the number of derivatives to compute grows very large, which creates the need for parallelization. Fortunately, each element of the set of equations can be computed independently of the others. Essentially, my code spends most of its time in the following snippet:

```
function computeStep!(Derivative :: MyStruct1, Parameters :: MyStruct2) :: Nothing
    # Each entry of Kernel1 is computed independently of all others
    for Iterator in Parameters.Iterators1
        Derivative.Kernel1[Iterator] = computeKernel1(Iterator, Parameters)
    end
    # Same for Kernel2, using its own iterator set
    for Iterator in Parameters.Iterators2
        Derivative.Kernel2[Iterator] = computeKernel2(Iterator, Parameters)
    end
    return nothing
end
```

The Kernels are then updated using the derivatives struct and standard RK4. I have implemented a parallelization using MPI.jl (which works on my remote machine and should also work across multiple nodes), but since the Kernels are memory hungry, distributed memory seems infeasible. Instead I was tempted to use Julia's multithreading via the `@threads` macro and only use MPI between different nodes (making use of the shared memory within each node). However, the multithreaded loops run roughly a factor of 1.5 slower than the single-threaded loops, and memory consumption grows exceedingly large. Even if I wrap the two Kernel loops into separate functions, the issue does not disappear. I also tried a more fine-grained split of the iterator sets, such that each thread operates only on some part of the Kernel, also without any success.
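To make the threaded variant concrete, here is a minimal sketch of what it looks like. The struct definitions, the `Vector{Int}` iterator sets, and the `sin`/`cos` kernels are hypothetical placeholders so that the snippet runs on its own; the real `MyStruct1`, `MyStruct2`, `computeKernel1` and `computeKernel2` are more involved.

```julia
using Base.Threads

# Hypothetical stand-ins for the real structs (assumed fields, not the actual ones)
struct MyStruct2
    Iterators1 :: Vector{Int}
    Iterators2 :: Vector{Int}
end

struct MyStruct1
    Kernel1 :: Vector{Float64}
    Kernel2 :: Vector{Float64}
end

# Placeholder kernels; the real ones are far more expensive
computeKernel1(Iterator :: Int, Parameters :: MyStruct2) :: Float64 = sin(Iterator)
computeKernel2(Iterator :: Int, Parameters :: MyStruct2) :: Float64 = cos(Iterator)

function computeStepThreaded!(Derivative :: MyStruct1, Parameters :: MyStruct2) :: Nothing
    # Each thread writes to distinct indices, so no locking is needed
    @threads for Iterator in Parameters.Iterators1
        Derivative.Kernel1[Iterator] = computeKernel1(Iterator, Parameters)
    end
    @threads for Iterator in Parameters.Iterators2
        Derivative.Kernel2[Iterator] = computeKernel2(Iterator, Parameters)
    end
    return nothing
end
```

The number of threads is fixed at startup through the `JULIA_NUM_THREADS` environment variable and can be checked with `Threads.nthreads()`.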

My current Julia version is 1.0. For benchmarking I used the `@benchmark` macro from BenchmarkTools.jl.

Any ideas what is going wrong here?

EDIT: The workaround seems to be to use FastClosures.jl. Could anyone give an example of how to use it for the code above when the loops are decorated with `Threads.@threads`?
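For context, here is my current guess, as a sketch only. As far as I understand, the `@closure` macro from FastClosures.jl wraps an anonymous function in a `let` block so that captured variables are rebound locally. Since `Threads.@threads` builds its closure internally, I assume the analogous approach is a manual `let` block around the loops, which needs no extra package. The structs and kernels below are hypothetical placeholders, and I have not verified that this recovers the performance; it should only matter if a captured variable is reassigned somewhere in the enclosing function.

```julia
using Base.Threads

# Hypothetical stand-ins for the real structs (assumed fields, not the actual ones)
struct MyStruct2
    Iterators1 :: Vector{Int}
    Iterators2 :: Vector{Int}
end

struct MyStruct1
    Kernel1 :: Vector{Float64}
    Kernel2 :: Vector{Float64}
end

# Placeholder kernels
computeKernel1(Iterator :: Int, Parameters :: MyStruct2) :: Float64 = sin(Iterator)
computeKernel2(Iterator :: Int, Parameters :: MyStruct2) :: Float64 = cos(Iterator)

function computeStepLet!(Derivative :: MyStruct1, Parameters :: MyStruct2) :: Nothing
    # Rebind the captured variables in a let block (the manual version of the
    # let-wrapping idea) so the closures created by @threads see fresh local bindings
    let Derivative = Derivative, Parameters = Parameters
        @threads for Iterator in Parameters.Iterators1
            Derivative.Kernel1[Iterator] = computeKernel1(Iterator, Parameters)
        end
        @threads for Iterator in Parameters.Iterators2
            Derivative.Kernel2[Iterator] = computeKernel2(Iterator, Parameters)
        end
    end
    return nothing
end
```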