Effect of `JULIA_DEFINE_FAST_TLS` and how to handle it in a shared library or Fortran

According to the "Embedding Julia" section of the Julia manual, one should define JULIA_DEFINE_FAST_TLS in any program that embeds calls to Julia. The only documentation for this that I could find is the following inline comment (line break added for readability):

JULIA_DEFINE_FAST_TLS // only define this once, in an executable
                      // (not in a shared library) if you want fast code.

With libtrixi we are developing a C library with Fortran wrappers to embed Trixi.jl simulations in a Fortran framework. My questions are thus,

  1. What is the effect of using this macro in the source code (or not) in terms of performance? What does it mean to have “fast code”? Are there areas where it is crucial to have this, or, alternatively, where its effects are negligible?
  2. How can I use/activate this in a shared library?
  3. How can I use this from Fortran?

Right now, linking to libtrixi.so does not require linking to Julia directly, so asking everyone who wishes to use libtrixi to put this definition in their code would impose the additional overhead of including the Julia headers. Is there an alternative, e.g., a function call that can be made once to initialize something?


From a discussion on the Julia Slack workspace with @gbaraldi and Cody Tapscott, I got the following information that I would like to preserve and share with others who face similar questions:

  • The macro JULIA_DEFINE_FAST_TLS is only required on non-Darwin, non-Windows systems.
  • It provides a more sophisticated (read: faster) implementation of “thread-local storage” (TLS).
  • If activated on these systems, some/most of Julia’s internals (such as allocations, GC in general, task management, etc.) work faster when Julia runs multithreaded or is called from a multithreaded context.
  • One restriction of the TLS model used (several are available; the fastest but most restricted one is used) is that the location of the thread storage must be known statically at link time. That is why the definition cannot live in a shared library but only in the main executable (directly, or via a static library).
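
To make the last point more concrete, here is a minimal sketch of a thread-local variable that uses the most restrictive TLS model. It is not Julia's actual implementation; the names `fast_counter`, `worker`, and `run_demo` are made up for illustration, and it assumes Linux with GCC or Clang and pthreads.

```c
/* Illustrative sketch only: this is NOT Julia's implementation. */
#include <assert.h>
#include <pthread.h>

/* Thread-local counter using the "local-exec" TLS model. Its offset from the
 * thread pointer is fixed when the executable is linked, which makes access
 * cheap but means the definition cannot be placed in a shared library. */
static __attribute__((tls_model("local-exec"))) __thread int fast_counter = 0;

/* Each thread increments its own copy and reports the value it ends up with. */
static void *worker(void *arg)
{
    int *result = arg;
    for (int i = 0; i < 1000; i++)
        fast_counter++;
    *result = fast_counter;
    return NULL;
}

/* Runs two threads; returns 1 if every thread saw an independent counter. */
int run_demo(void)
{
    pthread_t t1, t2;
    int r1 = 0, r2 = 0;
    int rc;

    rc = pthread_create(&t1, NULL, worker, &r1);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, &r2);
    assert(rc == 0);
    (void)rc; /* silences the unused warning when asserts are compiled out */

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* The calling thread never touched its own copy, so it is still 0. */
    return r1 == 1000 && r2 == 1000 && fast_counter == 0;
}
```

Calling `run_demo()` from the `main` of an ordinary executable should work. Building the same file as a shared object (`-shared -fPIC`) is expected to fail at link time, because the local-exec model has no fixed offset to resolve against there.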

For libtrixi, we solved this issue by creating a file, tls.c, that contains only the macro call, and then providing it as an object library via CMake.
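
A file of this kind could look roughly like the sketch below. It follows the pattern from the Julia embedding manual and is not copied from libtrixi, so the actual tls.c may differ. It needs the Julia headers on the include path when it is compiled.

```c
/* tls.c: meant to be compiled into the final executable,
 * never into the shared library itself. */
#include <julia.h>

/* Must appear exactly once in the whole program. */
JULIA_DEFINE_FAST_TLS
```

With CMake, such a file can be exposed as an object library (`add_library(<name> OBJECT tls.c)`), so that downstream executables link the object without their own sources having to include the Julia headers.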

For completeness, please also note that @jameson mentioned that this will not be necessary after Julia v1.10 and that it might be removed from public headers.

cc @Benedict