Hi, I’d like to hear people’s thoughts on using ccall( (:function, "library"), ...) as opposed to ccall( :function, ...) (see "Calling C and Fortran Code" in the Julia manual), and on whether the former variant should be discouraged or the underlying code changed.
The problem is that naming the shared-object library in the call is very un-C-like and prevents one from using profilers and other tools that rely on LD_PRELOAD to inject code. When writing C or C++, the library that provides a function is not specified in the source code. Shared objects are dynamically linked into the program at runtime, and the function for a particular call is found by searching through the exported functions of all loaded shared objects. This is equivalent to the Julia ccall( :function, ...) variant. If the user has specified a shared object with the LD_PRELOAD environment variable at runtime, that library’s functions go to the top of the search list, potentially overriding functions of the same name in other shared objects. This is a common technique in profiling tools that, say, wrap code around malloc to track memory allocations.
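To make the two variants concrete, here is a minimal sketch using strlen from the C library. The library name "libc.so.6" is my assumption for a glibc-based Linux system; it is only there for illustration and would differ on other platforms.

```julia
# Variant 1: no library named. The symbol is looked up among the libraries
# already loaded into the process, so a library injected with LD_PRELOAD
# that exports `strlen` would be found first (as described above).
n_default = ccall(:strlen, Csize_t, (Cstring,), "hello")

# Variant 2: library named explicitly. Julia loads that library itself and
# looks the symbol up there, which is the form that sidesteps LD_PRELOAD.
# Assumption: glibc Linux, where the C library is named "libc.so.6".
n_explicit = ccall((:strlen, "libc.so.6"), Csize_t, (Cstring,), "hello")

println("default lookup:  ", n_default)
println("explicit lookup: ", n_explicit)
```

Both calls return the same value in a normal run; the difference only shows up when a preloaded library exports a function with the same name.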
Use of LD_PRELOAD works with Julia so long as you do not specify the library in ccall. See this gist for a simple example.
However, the Julia documentation encourages specifying the library with ccall( (:function, "library"), ...) and so tools that rely on LD_PRELOAD do not work. I find that many Julia packages that call C/C++ functions use this variant of ccall. My specific problem is that I want to use a tool called Darshan to profile HDF5 I/O from my Julia application. Darshan uses LD_PRELOAD to wrap MPI-IO and HDF5 C functions to add counters and timers. To make this work, I have my own versions of MPI.jl and HDF5.jl where I’ve changed the ccall statements to use the non-explicit-library variant (I’ve submitted PRs to both packages … for MPI.jl all ccall statements were already prefixed by a macro and the authors have a PR to change that macro to remove the library argument from the ccall).
I do understand the rationale behind ccall( (:function, "library"), ...): it loads the shared object for you. I’m wondering whether that can be changed so that the library is loaded with the correct options to make LD_PRELOAD work (e.g. on Linux one needs to do Libdl.dlopen(shared_object_file, RTLD_GLOBAL)). Alternatively, the documentation could be changed to encourage the non-explicit-library variant, or at least to carry a warning that the explicit-library variant has some consequences.
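As a rough sketch of what I mean by loading the library with RTLD_GLOBAL and then using the library-less ccall, something like the following should work. I am using "libc.so.6" as a stand-in for shared_object_file (an assumption that holds on glibc Linux); the C library is already loaded in every process, so this only illustrates the pattern, and a real package would pass the path of its own library instead.

```julia
using Libdl

# Assumption: glibc Linux. "libc.so.6" stands in for the package's own
# shared object; substitute the real path in practice.
shared_object_file = "libc.so.6"

# RTLD_GLOBAL makes the library's exported symbols available for symbol
# resolution across the whole process, not only through this handle.
handle = Libdl.dlopen(shared_object_file, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)

# With the symbols globally visible, the library-less ccall can find them,
# and a preloaded library that exports the same name can take precedence.
len = ccall(:strlen, Csize_t, (Cstring,), "hello")
println("strlen via global lookup: ", len)
```

The point of the design is that the package still loads its library for the user, but the call sites no longer pin the lookup to one specific shared object.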
I thought I’d bring this up here before making an issue or a PR to see if anyone else has thought about this concern.
Thanks! – Adam