Thread-local storage initialization

I have a problem with a couple (possibly three) C/C++ modules that I interface with Julia. These modules rely on thread-local state that must be initialized with thread-specific data before any code inside them is called. The thread-local state contains things such as caches and data that is replicated on a per-thread basis. Storing that state in global variables is either impossible, because some third-party code cannot handle the synchronization requirements, or undesirable, because it creates unnecessary contention in a hot loop.

Ideally, I would like to initialize this thread-local state for each thread when the module is being loaded. However, as some of the other threads may be running other jobs at that moment, there does not appear to be an easy way to do that.

The alternative would be to check at each entry point of the module whether initialization has already happened and, if not, to call the initialization routine. The problem is that this approach is fragile (we are possibly talking about dozens of public functions in such a module, every one of which needs the check). And even though the check is cheap, since branch prediction is nearly guaranteed to succeed, it can still add significant overhead to short functions.
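
A minimal C++ sketch of that per-entry-point check, assuming the module's state can be modeled as a `thread_local` struct carrying an `initialized` flag. `ThreadState`, `init_thread_state`, `ensure_initialized`, `module_set` and `module_get` are hypothetical names, not part of any real module:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the module's per-thread caches.
struct ThreadState {
    bool initialized = false;
    std::vector<double> cache;
};

// One instance per thread; each starts out with initialized == false.
thread_local ThreadState tls_state;

// Hypothetical per-thread initialization routine.
static void init_thread_state() {
    tls_state.cache.assign(16, 0.0);
    tls_state.initialized = true;
}

// The guard every public entry point must call first. After the first
// call on a given thread, the branch is never taken again on that thread.
static inline ThreadState& ensure_initialized() {
    if (!tls_state.initialized) init_thread_state();
    return tls_state;
}

// Two hypothetical public entry points; each one needs the guard.
extern "C" void module_set(int i, double v) {
    ThreadState& s = ensure_initialized();
    assert(i >= 0 && static_cast<std::size_t>(i) < s.cache.size());
    s.cache[static_cast<std::size_t>(i)] = v;
}

extern "C" double module_get(int i) {
    ThreadState& s = ensure_initialized();
    assert(i >= 0 && static_cast<std::size_t>(i) < s.cache.size());
    return s.cache[static_cast<std::size_t>(i)];
}
```

Each call pays one thread-local load plus a well-predicted branch; the fragility is that forgetting `ensure_initialized()` in a single entry point leaves that code path reading state that was never set up.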

This is why I’d rather initialize the module when it is loaded, but I currently see no clean way to do that. (There is a hackish way: send a signal to each thread and do the initialization in the signal handler. But not only is this fragile, it also limits what can be done during initialization; in particular, there is no safe way to call back into Julia code.)

Any suggestions?

I don’t see what the thread activity has to do with this: you can initialize the thread-local state on the main thread at module-initialization time. Typically, this is done by putting the thread-local data in an array of length Threads.nthreads(), and each thread then accesses the data at index Threads.threadid().
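
For comparison, a rough C++ analogue of that array-of-slots pattern, assuming the per-thread state is ordinary heap data that any thread may construct. `Scratch` and `PerThreadBuffers` are hypothetical names, and the explicit `tid` argument stands in for `Threads.threadid()`:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-thread scratch buffer.
struct Scratch {
    std::vector<double> buf;
};

class PerThreadBuffers {
public:
    // All slots are created and initialized up front by the calling
    // (main) thread; no worker has to run any setup code.
    PerThreadBuffers(std::size_t nthreads, std::size_t bufsize)
        : slots_(nthreads) {
        for (auto& s : slots_) s.buf.assign(bufsize, 0.0);
    }

    // Each worker passes its own index (the analogue of Threads.threadid()).
    Scratch& get(std::size_t tid) {
        assert(tid < slots_.size());
        return slots_[tid];
    }

    std::size_t size() const { return slots_.size(); }

private:
    std::vector<Scratch> slots_;
};
```

The main thread does all of the setup, so nothing needs to run on the workers beforehand. This works only because the slots are plain objects, not `__thread`/`thread_local` variables.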

For example, see how Julia implements thread-local RNGs or thread-safe buffers for matrix multiplication.

The problem I am dealing with is that the thread-local state exists in the form of C/C++ thread-local variables, e.g. variables declared with __thread or thread_local. They can only be initialized from within the thread (portably, that is).
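
A small C++ sketch of why such variables cannot be set up from the loading thread: every thread gets its own copy of a `thread_local` variable, so an assignment made on one thread is invisible to all others. `tls_value` and `value_seen_by_new_thread` are illustrative names only:

```cpp
#include <cassert>
#include <thread>

// Each thread gets its own copy, initialized to 0 when the thread starts.
thread_local int tls_value = 0;

// Sets the calling thread's copy, then reports what a freshly started
// thread sees for the same variable.
int value_seen_by_new_thread(int main_value) {
    tls_value = main_value;  // affects only the calling thread's copy
    int seen = -1;
    std::thread t([&seen] { seen = tls_value; });  // reads the new thread's copy
    t.join();  // join() makes the write to `seen` visible here
    return seen;
}
```

Calling `value_seen_by_new_thread(42)` returns `0`: the new thread reads its own, freshly initialized copy, while the caller's copy holds `42`.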