How to improve performance in a function that repeatedly defines and multiplies matrices

Globals are only problematic if you access them repeatedly. Note that my code only ever accesses them once, at the moment I call calc. calc itself does not access globals, and GHT and GFC shouldn't either. That is the very important difference!
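To make the difference concrete, here is a minimal sketch. The names `weights`, `sum_global` and `sum_arg` are made up for illustration and are not from your code. The first function reads a non-constant global inside its loop, the second receives the same data as an argument, so the global is only touched once at the call site.

```julia
# Hypothetical non-constant global. Its binding could be reassigned to any type,
# so code that reads it directly cannot be specialized on its type.
weights = rand(1000)

# Reads the global on every loop iteration: type-unstable and slow.
function sum_global()
    s = 0.0
    for i in eachindex(weights)
        s += weights[i]
    end
    return s
end

# Receives the data as an argument: the function is compiled for the
# concrete type of `w`, and no global is accessed inside it.
function sum_arg(w)
    s = 0.0
    for i in eachindex(w)
        s += w[i]
    end
    return s
end

# The global is accessed exactly once, here at the call site.
total = sum_arg(weights)
```

Both functions return the same value, but only `sum_arg` lets the compiler generate a tight loop. Timing the two (for example with BenchmarkTools.jl) should show the gap.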

@threads is not the only way to get thread-based parallelism and is really only a convenience tool. In your other thread, I explained how to use Threads.@spawn together with ChunkSplitters.jl to split the workload into chunks and preallocate a workspace for each chunk.
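As a rough sketch of that pattern: the example below uses `Iterators.partition` from Base as a stand-in for ChunkSplitters.jl, so it does not depend on a particular version of that package. `work!` and `process_chunked` are hypothetical names, and the workspace size of 4 is arbitrary.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical per-item work: fill the preallocated workspace, then reduce it.
function work!(buf, x)
    buf .= x
    return sum(abs2, buf)
end

function process_chunked(xs; nchunks = nthreads())
    # Split the index range into roughly `nchunks` contiguous pieces.
    chunksize = max(1, cld(length(xs), nchunks))
    tasks = map(Iterators.partition(eachindex(xs), chunksize)) do idxs
        @spawn begin
            buf = zeros(4)   # one workspace per chunk, allocated once per task
            acc = 0.0
            for i in idxs
                acc += work!(buf, xs[i])
            end
            acc              # value returned by the task
        end
    end
    # Collect the partial results from all tasks.
    return sum(fetch, tasks; init = 0.0)
end

result = process_chunked(collect(1.0:10.0); nchunks = 3)
```

Each task owns its workspace, so there is no sharing between tasks and no dependence on `threadid()`. The number of chunks is a tuning knob: more chunks than threads can help with load balancing when the items take uneven time.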

Addendum: If you use functionality from LinearAlgebra.jl that calls into BLAS, that opens another can of worms regarding threads…
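One common starting point, when you already parallelize with Julia tasks yourself, is to limit BLAS to a single thread so the two thread pools do not compete for the same cores. Whether that is the best setting depends on your BLAS backend and matrix sizes, so treat this as something to benchmark rather than a rule:

```julia
using LinearAlgebra

# Check how many threads BLAS is currently using.
previous = BLAS.get_num_threads()

# Restrict BLAS to one thread while the outer loop is parallelized with Julia tasks.
BLAS.set_num_threads(1)
```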