I have a problem with a couple of (possibly three) C/C++ modules that I interface with Julia. These modules rely on thread-local state that must be initialized with thread-specific data before any code inside them is called. The thread-local state contains things such as caches and data that is replicated on a per-thread basis. Storing that state in global variables is not an option, either because some third-party code is not equipped to handle the synchronization requirements or because it would create unnecessary contention in a hot loop.
Ideally, I would like to initialize this thread-local state for each thread when the module is being loaded. However, as some of the other threads may be running other jobs at that moment, there does not appear to be an easy way to do that.
The alternative would be to check at each entry point of the module whether initialization has already happened on the current thread and, if not, to call the initialization routine. That approach has two problems. It is fragile: we are possibly talking about dozens of public functions in such a module, every one of which needs the check. And although the check is cheap, since branch prediction is nearly guaranteed to succeed, it can still add significant overhead for short functions.
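To make the per-entry-point check concrete, here is a minimal sketch in C11. Every name in it (`module_tls_t`, `module_thread_init`, `module_state`, `module_store`, `module_lookup`) is hypothetical and only stands in for the real modules, and the "cache" is a placeholder for whatever per-thread data they actually keep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread state; the real modules would keep their
 * caches and replicated data here. */
typedef struct {
    bool    ready;      /* true once this thread has been initialized */
    double *cache;      /* placeholder for a per-thread cache */
    size_t  cache_len;
} module_tls_t;

/* One zero-initialized instance per thread (C11 thread storage). */
static _Thread_local module_tls_t tls;

/* Hypothetical initialization routine; runs at most once per thread.
 * Freeing the cache at thread exit is omitted from this sketch. */
static void module_thread_init(module_tls_t *s)
{
    s->cache_len = 64;
    s->cache = calloc(s->cache_len, sizeof *s->cache);
    assert(s->cache != NULL);
    s->ready = true;
}

/* The guard that every public function has to go through. After the
 * first call on a thread, the branch is always not taken. */
static inline module_tls_t *module_state(void)
{
    if (!tls.ready)
        module_thread_init(&tls);
    return &tls;
}

/* Two hypothetical entry points; each one must remember the guard. */
void module_store(size_t i, double v)
{
    module_tls_t *s = module_state();
    s->cache[i % s->cache_len] = v;
}

double module_lookup(size_t i)
{
    module_tls_t *s = module_state();
    return s->cache[i % s->cache_len];
}
```

Both drawbacks show up even in this small sketch: forgetting the `module_state()` call in a single public function leaves that function reading uninitialized state on a fresh thread, and for a function as short as `module_lookup` the guard is a sizable share of the total work.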
This is why I’d rather initialize the module upon loading, but I currently see no clean way to do that. (There is a hackish way of sending a signal to each thread and doing the initialization in the signal handler, but not only is this fragile, it limits what can be done during initialization. In particular, there is no safe way to call back to Julia code.)
Any suggestions?