I’ve been running into issues with WignerSymbols.jl, where one of the necessary operations is calculating the prime factorization of certain numbers. To speed this up you’d want to cache any found prime factors, and this cache is exactly what makes it thread unsafe.
As far as I can tell, there are 2 options. Either work with a lock, but that would slow down the existing code, even in situations where it was already called in a threadsafe way. The other way would be to work with thread-local caches (someone tried this local caching by xzackli · Pull Request #13 · Jutho/WignerSymbols.jl · GitHub but memory consumption gets out of hand).
Ideally you would have some array type, for which getindex is fast, and as long as this index is < length(array), it is garantueed to be threadsafe. Modifying the array is allowed to be slower (getting a lock or something). I think a normal array does not have this property, because push! may cause the entire array to be re-allocated somewhere, making concurrent getindex’s unsafe
Thread safe does not necessarily imply faster performance, e.g. threads accessing common data can also cause cache thrashing (invalidating cache lines), worst case making your threaded code slower than sequential.
You could call specialized threadsafe scaling parallel data structures, such as Intel’s TBB, from Julia
A simpler, but potentially unsafe, solution would be a C/C++ fixed size array with compare and swap (in C++ std library).
I think, but can be incorrect, that the cache/shared data in your factorization problem is quite small, in which case a process based solution could be faster (message passing), given that a copy of factors (unless we’re talking huge numbers) is not that expensive. That would avoid threadsafety issues, and leverage Julia’s more mature multiprocessing paradigm.
I guess you want a thread-safe dict? (that is, key-value hash table). C++ has a library called parallel_hashmap, which is essentially a fixed-size vector of hashmap and each hashmap has an individual lock. You can consider make a tuple of LRUcache to achieve similar effects.
I’m terribly sorry, I apparently missed these replies.
it’s not entirely a matter of speeding up the code such that these wignersymbols are calculated in parallel, but it’s rather that other code that really has to be parallelized should be able to interface with wignersymbols in a thread safe way. All ideas we could come up with were threadsafe at the cost of performance in the case where no multithreading gets used.
In the end the solution was to use a list of lists, that can only grow. https://github.com/Jutho/WignerSymbols.jl/blob/master/src/growinglist.jl This has the advantage that as long as the looked up index is smaller then the size of the cached primefactors, we don’t have to acquire a lock to look up the index.
The common usecase is to often reuse the same primefactors, so in almost all cases we don’t have to acquire any locks, and it’s as fast as the single-threaded original code. If we do have to add new factors, then we are guaranteed that concurrent accesses are still threadsafe!