Different behaviours on Linux and macOS with Julia embedded in C++

embedding
linux

#1

Hello everyone,
I am currently embedding Julia inside some cross-platform C++ shared libraries that will be loaded with dlopen() by an existing application. I am seeing different behaviour on macOS and Linux. On macOS my shared libraries load correctly, while on Linux I get errors from dlopen() about not finding Julia libraries other than libjulia.so (I explain this in more detail with the example below). I think the problem lies in how linking and -Wl,--export-dynamic work on Linux, but I am not sure.

Here is the example:
Consider a simple test executable, DummyJuliaExe, that loads a shared library linked against Julia, called libTestCJulia, and calls its function testFunction().
The two source files, DummyJuliaExe.cpp (the executable) and testCJulia.cpp (the .so), are in the same folder.

The DummyJuliaExe.cpp source code is as follows; it is compiled with g++ using the command:
g++ -o DummyJuliaExe DummyJuliaExe.cpp -std=c++11 -fPIC -Wl,-rpath,'.' -ldl

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(void)
{
    void *handle;
    char *error;

    handle = dlopen("libTestCJulia.so", RTLD_LAZY);
    if (!handle) {
        fprintf (stderr, "%s\n", dlerror());
        exit(1);
    }

    dlerror();

    typedef void (*function_call_t)();
    function_call_t testFunction = (function_call_t) dlsym(handle, "testFunction");

    if ((error = dlerror()) != NULL)  {
        fprintf (stderr, "%s\n", error);
        exit(1);
    }

    testFunction();

    dlclose(handle);

    return 0;
}

The libTestCJulia.so library is built from the testCJulia.cpp source below, using g++, with these two commands (change the path to Julia to match your machine):

g++ -c testCJulia.cpp -std=c++11 -fPIC -I'/home/francesco/Sources/julia-native/usr/include/julia'
g++ -shared -o libTestCJulia.so testCJulia.o -std=c++11 -fPIC -L'/home/francesco/Sources/julia-native/usr/lib' -Wl,--export-dynamic -Wl,-rpath,'/home/francesco/Sources/julia-native/usr/lib' -Wl,-rpath,'/home/francesco/Sources/julia-native/usr/lib/julia' -ljulia

testCJulia.cpp:

#include <julia.h>

extern "C" void testFunction()
{
    jl_init();

    jl_eval_string("println(sqrt(2))");

    jl_atexit_hook(0);
}

When running the executable, ./DummyJuliaExe, on Linux I get this error:

fatal: error thrown and no exception handler available.
InitError(mod=:Sys, error=ErrorException("could not load symbol "jl_cpu_threads":
./DummyJuliaExe: undefined symbol: jl_cpu_threads"))
rec_backtrace at /home/francesco/Sources/julia-native/src/stackwalk.c:94
record_backtrace at /home/francesco/Sources/julia-native/src/task.c:246 [inlined]
jl_throw at /home/francesco/Sources/julia-native/src/task.c:577
jl_errorf at /home/francesco/Sources/julia-native/src/rtutils.c:77
jl_dlerror at /home/francesco/Sources/julia-native/src/dlload.c:74 [inlined]
jl_dlsym at /home/francesco/Sources/julia-native/src/dlload.c:228
jlplt_jl_cpu_threads_15689 at /home/francesco/Sources/julia-native/usr/lib/julia/sys.so (unknown line)
__init__ at ./sysinfo.jl:104
jl_apply_generic at /home/francesco/Sources/julia-native/src/gf.c:2184
jl_apply at /home/francesco/Sources/julia-native/src/julia.h:1537 [inlined]
jl_module_run_initializer at /home/francesco/Sources/julia-native/src/toplevel.c:90
_julia_init at /home/francesco/Sources/julia-native/src/init.c:813
julia_init at /home/francesco/Sources/julia-native/src/task.c:302
jl_init_with_image at /home/francesco/Sources/julia-native/src/jlapi.c:53
jl_init at /home/francesco/Sources/julia-native/src/jlapi.c:81
testFunction at ./libTestCJulia.so (unknown line)
main at ./DummyJuliaExe (unknown line)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
_start at ./DummyJuliaExe (unknown line)

It looks like the libTestCJulia.so shared library can’t find all the other Julia libraries it needs to link against. Thus, it cannot resolve the jl_cpu_threads symbol.
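A possible way to check this hypothesis, as a Linux/glibc-only diagnostic sketch: the error text says the lookup of jl_cpu_threads failed in ./DummyJuliaExe, which suggests the symbol is searched for in the global namespace rather than inside libjulia itself. The helpers below (visible_globally, visible_in_loaded_lib and report_symbol are hypothetical names) compare those two lookups.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical diagnostic helpers (Linux/glibc; g++ defines _GNU_SOURCE,
// which RTLD_DEFAULT requires).

// True if `sym` is visible through the default lookup scope: the executable,
// its dependencies, and any library dlopen()ed with RTLD_GLOBAL.
static bool visible_globally(const char *sym)
{
    return dlsym(RTLD_DEFAULT, sym) != nullptr;
}

// True if `sym` is found inside `lib`, provided `lib` is already loaded.
// RTLD_NOLOAD means nothing new is mapped into the process.
static bool visible_in_loaded_lib(const char *lib, const char *sym)
{
    void *handle = dlopen(lib, RTLD_LAZY | RTLD_NOLOAD);
    if (!handle)
        return false;
    bool found = dlsym(handle, sym) != nullptr;
    dlclose(handle); // balances the reference taken by the dlopen() above
    return found;
}

// Prints both results side by side for one symbol.
static void report_symbol(const char *lib, const char *sym)
{
    std::printf("%s: global scope=%s, in %s=%s\n", sym,
                visible_globally(sym) ? "yes" : "no", lib,
                visible_in_loaded_lib(lib, sym) ? "yes" : "no");
}
```

For example, calling report_symbol("libjulia.so.1", "jl_cpu_threads") at the top of testFunction(), before jl_init(), would be expected to print "global scope=no" and "in libjulia.so.1=yes" if libjulia's symbols are present but were loaded with RTLD_LOCAL. That expectation is untested, and "libjulia.so.1" is an assumed SONAME for a Julia 1.0 build; readelf -d on the library shows the real one.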

On the other hand, with the same source code compiled on macOS (and the same Julia version, 1.0.2), Julia boots correctly inside the shared library and I get the correct output, 1.4142135623730951. The compiler and linker flags on macOS are the same, except that -Wl,--export-dynamic is removed. The Julia build I am using is plain 1.0.2 with JL_THREADS=0 in Make.user. I get the same errors with the official binary release, which has threading enabled.

The Linux distro is Manjaro 18.0.0 with the 4.14.83-1-MANJARO kernel. The macOS version is High Sierra 10.13.6.

I don’t know whether this is just a matter of compiler and linker flags needing to be different on Linux, or a Julia bug.


#2

Try

handle = dlopen("libTestCJulia.so", RTLD_NOW | RTLD_GLOBAL);

This was in 0.6; I don’t know if things have changed. There may be other issues as well, but that was one thing I had to do after building. I ended up using /share/julia/build_sysimg.jl to build my .so. You might want to look in there to see if there are any other flags you need.


#3

Thanks for the reply. It does work with handle = dlopen("libTestCJulia.so", RTLD_NOW | RTLD_GLOBAL);. The problem is that what I posted here is just an example that loads the .so the same way as the already compiled application I am building my .so for. This means I don’t have access to that application’s source code to modify it. To be clearer: I am linking my compiled .so against Julia, while the application I am targeting doesn’t know anything about Julia. All the Julia handling is done through my compiled .so. What bugs me, though, is that I don’t understand why the same thing works on macOS and not on Linux. I am probably missing some flags…
Anyway, if I understood correctly, isn’t build_sysimg.jl just a way to precompile Julia code into Julia’s sys.so file?


#4

OK, good that your symbols are found. It sounds like we have a similar situation. I wrote a plugin for Maya, and I didn’t have control over the dlopen flags either, as the plugin was autoloaded. The fix was to write a “loader” plugin whose job was to dlopen the Julia-based plugin with RTLD_GLOBAL. Then there was some messy business with looking up C++ mangled methods so the loader could call init on the loaded plugin.
I used this as a reference:

This is a little brittle. Julia devs, is there any reason the RTLD_GLOBAL namespace is required?


#5

As for the second part of your question: I used build_sysimg.jl to precompile my Julia code into a .so. This .so also contained a @Base.ccallable function, which was my main Julia hook. So in the end I had three .so’s: the loader, the plugin, and the precompiled Julia .so.


#6

I understand. I would prefer not to go this route, since my code already works perfectly on macOS by simply loading a single .so, and I would like to do the same on Linux. I feel like there must be some linker flags that would help here. I mean, why would dlopen without the RTLD_GLOBAL namespace work on macOS and not on Linux when loading the .so’s?
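One flag-level idea that might avoid the extra loader on Linux, offered as an untested sketch: the glibc dlopen(3) man page documents that RTLD_NOLOAD combined with RTLD_GLOBAL can promote an already loaded library from RTLD_LOCAL to the global scope. Calling this from inside libTestCJulia.so before jl_init() would, in principle, make libjulia’s symbols visible to lookups in the global namespace. The helper names promote_to_global and promote_library_containing are hypothetical, the approach relies on glibc-specific behaviour, and it has not been verified against Julia 1.0.2.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper: re-open a library that is ALREADY loaded with
// RTLD_NOLOAD | RTLD_GLOBAL. RTLD_NOLOAD maps nothing new; dlopen() returns
// a handle only if the library is resident. glibc documents that this can
// promote a library previously loaded with RTLD_LOCAL to the global scope.
// The handle is intentionally not dlclose()d, so the extra reference keeps
// the library loaded for the rest of the process.
static bool promote_to_global(const char *name)
{
    void *handle = dlopen(name, RTLD_LAZY | RTLD_NOLOAD | RTLD_GLOBAL);
    if (!handle) {
        const char *err = dlerror(); // may be NULL if the library is simply not loaded
        std::fprintf(stderr, "promote_to_global(%s): %s\n", name,
                     err ? err : "library not loaded");
        return false;
    }
    return true;
}

// Hypothetical helper: find the shared object containing `addr` (for
// example the address of jl_init) with dladdr(), then promote it by path.
// This avoids hard-coding libjulia's SONAME.
static bool promote_library_containing(const void *addr)
{
    Dl_info info;
    if (addr == nullptr || dladdr(addr, &info) == 0 || info.dli_fname == nullptr)
        return false;
    return promote_to_global(info.dli_fname);
}
```

In testCJulia.cpp this would be a call such as promote_library_containing(reinterpret_cast<const void *>(&jl_init)); placed just before jl_init(), with dlfcn.h included and -ldl added to the link line on glibc versions older than 2.34.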


#7

As I understand it, build_sysimg.jl is moderately broken on >= 0.7. See https://github.com/JuliaLang/julia/pull/27629 for a proposed fix, but note the last comment:

I think we should just remove contrib/build_sysimg.jl and tell users to use PackageCompiler.jl instead. All in favor?


#8

What build_sysimg.jl is trying to do suits our toolchain a bit better. When I looked at PackageCompiler, it seemed like I’d need to bake in which C compiler I used at “install” time. The compiler/version we use changes depending on what software we are building, so I can’t enforce “system gcc” on everyone.


#9

Be sure to comment on the issue then.


#10

I don’t know about macOS, but the documentation https://docs.julialang.org/en/latest/manual/embedding/ says:

Currently, dynamically linking with the libjulia shared library requires passing the RTLD_GLOBAL option. In Python, this looks like:

>>> julia=CDLL('./libjulia.dylib',RTLD_GLOBAL)

It mentions .dylib, which seems to imply that this applies to macOS as well (although the documentation may simply be older than the implementation).


#11

Thanks for the clarification. I have also found this on the documentation:

https://docs.julialang.org/en/v1/stdlib/Libdl/index.html :

On MacOS the default dlopen flags are RTLD_LAZY|RTLD_DEEPBIND|RTLD_GLOBAL while on other platforms the defaults are RTLD_LAZY|RTLD_DEEPBIND|RTLD_LOCAL.

Could it then mean that the problem I am experiencing is in Julia’s internal call to dlopen to load other Julia .so’s, whose flags on Linux appear to default to RTLD_LAZY|RTLD_DEEPBIND|RTLD_LOCAL, while on macOS they default to RTLD_LAZY|RTLD_DEEPBIND|RTLD_GLOBAL?

EDIT: I have looked at the source code of dlload.c, Libdl.jl and julia.h, but I can’t find where these flags are set for macOS and Linux so that I could change them. Can someone point me in the right direction?


#12

That’s Libdl.jl documentation so I don’t think that’s relevant for using dlopen from dlfcn.h.

I know it’s ugly but maybe you can pre-load libjulia using LD_PRELOAD environment variable when invoking the host executable?

Alternatively, and I’m not sure it works, how about creating a shim library libShimTestCJulia.so that dlopens the actual library libTestCJulia.so with RTLD_GLOBAL?
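Such a shim could be sketched roughly as follows, assuming the host looks up testFunction exactly as DummyJuliaExe does. The helper resolve_global and the file name shim.cpp are hypothetical. The shim would be built on its own, without linking against Julia (for example g++ -shared -fPIC -o libShimTestCJulia.so shim.cpp -ldl), and libTestCJulia.so must be findable from the shim, for instance through LD_LIBRARY_PATH or an rpath on the shim. The host is then pointed at libShimTestCJulia.so instead of libTestCJulia.so.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper: open `lib` with RTLD_NOW | RTLD_GLOBAL, so that it and
// the dependencies it pulls in (e.g. libjulia) join the global symbol scope,
// then resolve `sym` in it. Returns nullptr and prints dlerror() on failure.
// The handle is intentionally never dlclose()d: the library must stay loaded.
static void *resolve_global(const char *lib, const char *sym)
{
    void *handle = dlopen(lib, RTLD_NOW | RTLD_GLOBAL);
    if (!handle) {
        std::fprintf(stderr, "shim: %s\n", dlerror());
        return nullptr;
    }
    dlerror(); // clear any stale error before dlsym()
    void *fn = dlsym(handle, sym);
    if (const char *err = dlerror()) {
        std::fprintf(stderr, "shim: %s\n", err);
        return nullptr;
    }
    return fn;
}

// Exported under the same name the host application looks up; it forwards
// to the real testFunction() inside libTestCJulia.so.
extern "C" void testFunction()
{
    using function_call_t = void (*)();
    // Resolved once, on the first call.
    static function_call_t real = reinterpret_cast<function_call_t>(
        resolve_global("libTestCJulia.so", "testFunction"));
    if (real)
        real();
}
```

Because dlsym() on a specific handle searches that library first, the lookup finds the real testFunction in libTestCJulia.so rather than the shim’s own forwarding function.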


#13

This is exactly what I ended up doing, and it works. It is not the most elegant solution, but at least it is something working both on MacOS and Linux.


#14

Nice. Good to know that it works.