Calling code from a library (.so): works on one system, fails on another


#1

I am trying to call a function from a compiled library (a .so file). As a simplified example:

 handle = ccall((:foo, "path/to/my_lib1.so"), Ptr{Void}, (UInt32, Ptr{UInt8}), 100, "path/to/my_lib2.so")

On one system (64-bit Ubuntu), it works exactly as expected.

On another system (64-bit CentOS with generally outdated packages and a 2.6.32 kernel), I get an error:

Library failed to open: path/to/my_lib2.so
Could not find func_bar: Exception in my_function: general

(Note that the function it cannot find, func_bar, is not the one called directly by ccall; it comes from the library passed as the argument, my_lib2.so.)

Does anyone have any idea what could cause this difference? (Unfortunately, I do not have the source for either of the .so files.) I don’t have root on the CentOS system, so I can’t simply upgrade everything, but I could potentially upgrade specific packages (probably not the kernel). I could also ask for the .so files to be recompiled with different flags or similar.

Does anyone have suggestions on how I might debug this?

Any help would be much appreciated. I need this to work. Thanks.


#2

To be clear, the two systems have nearly identical directory structures, and the path to the libraries is correct on both. (I’ve triple-checked.)


#3

Dynamically linked libraries may depend on other dynamic libraries. These dependencies are resolved by the dynamic linker when the library is loaded, and the library’s internal function references are updated to point to the correct functions in the target libraries. However, such a dependency may fail to resolve, e.g. if the target library cannot be found, and an error is then typically raised only when a function that depends on it is actually called.

This seems to be happening in your case.
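Because such an error may only surface when a function is actually called, one way to expose it earlier is to load the library yourself with the RTLD_NOW flag, which asks the dynamic linker to resolve every undefined symbol at load time. The sketch below assumes Julia 0.7 or later (where Libdl is a standard library); try_load is a hypothetical helper name. A failure here is a strong hint but not proof of a broken setup, since a library may legitimately expect some symbols to be supplied by whoever loads it.

```julia
using Libdl  # standard library since Julia 0.7

# Hypothetical helper: open `path` with RTLD_NOW, so that every undefined
# symbol has to be resolved immediately, and return the loader's error text
# instead of throwing.
function try_load(path::AbstractString)
    try
        handle = Libdl.dlopen(path, Libdl.RTLD_NOW)
        Libdl.dlclose(handle)
        return (true, "")
    catch err
        # The message includes the text reported by the dynamic linker,
        # e.g. a missing dependency or an unresolved symbol.
        return (false, sprint(showerror, err))
    end
end
```

For example, `ok, msg = try_load("path/to/my_lib2.so"); ok || println(msg)` would print the dynamic linker’s own explanation instead of a generic "failed to open".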

One way to tell the dynamic linker where to find other libraries is the environment variable LD_LIBRARY_PATH, which is essentially the counterpart of the executable search path PATH.
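To illustrate the analogy with PATH, here is a minimal sketch (find_in_ld_library_path is a hypothetical helper, not part of Julia) that splits a colon-separated LD_LIBRARY_PATH value and reports which of its directories contain a given file. It only mimics one step of the real lookup: the dynamic linker also consults RPATH/RUNPATH entries embedded in the library, /etc/ld.so.cache and the default system directories.

```julia
# Hypothetical helper: split an LD_LIBRARY_PATH-style string (colon-separated,
# like PATH) and return the full paths in those directories that name `libname`.
# Empty entries are simply skipped in this sketch.
function find_in_ld_library_path(libname::AbstractString;
                                 ldpath::AbstractString = get(ENV, "LD_LIBRARY_PATH", ""))
    dirs = filter(!isempty, split(ldpath, ':'))
    candidates = [joinpath(d, libname) for d in dirs]
    return filter(isfile, candidates)
end
```

Calling `find_in_ld_library_path("my_lib2.so")` in a session started with the variable set shows whether the directory you added is actually being picked up.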

To check which other libraries your library depends on, type

ldd /path/to/my_lib1.so

in the shell. This should show at least one line with an unresolved dependency, marked "not found". You can also run the same command from within the Julia REPL: type a semicolon ; to enter shell mode, then type the command exactly as above.

For example, for libopenblas, with all dependencies resolved correctly:

ldd /usr/lib/libopenblas.so.0
	linux-vdso.so.1 =>  (0x00007ffcb52ca000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd472e24000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd472c06000)
	libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007fd4728d5000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd47250e000)
	/lib64/ld-linux-x86-64.so.2 (0x00005590dab09000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fd4722ce000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd4720b5000)
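When the list is long, the unresolved entries can also be picked out programmatically. This is a minimal sketch with a hypothetical helper name, unresolved_deps, that assumes the usual ldd line format shown above, where a missing dependency appears as `libname => not found`.

```julia
# Hypothetical helper: scan `ldd` output and return the names of the
# dependencies that the dynamic linker reported as "not found".
function unresolved_deps(ldd_output::AbstractString)
    names = String[]
    for line in eachline(IOBuffer(ldd_output))
        occursin("not found", line) || continue
        # Such lines look like "\tlibfoo.so.1 => not found"; keep the part before "=>".
        push!(names, String(strip(first(split(line, "=>")))))
    end
    return names
end
```

Usage from Julia: `unresolved_deps(read(`ldd /path/to/my_lib1.so`, String))`.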

Then locate the missing library manually (e.g. by typing locate my_lib2) and set LD_LIBRARY_PATH to the directory containing it before starting Julia; in bash, don’t forget to export the variable. You can also do this in one line by prefixing the command with the variable assignment, which sets it for that command only:

LD_LIBRARY_PATH=/path/to/my_lib julia
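The variable has to be set before Julia starts because, on glibc-based Linux, the dynamic linker reads LD_LIBRARY_PATH when the process starts; assigning `ENV["LD_LIBRARY_PATH"]` inside a running session does not change where it searches. If you want to launch such a session from Julia itself, a sketch could look like this (julia_with_ld_path is a hypothetical helper; `addenv` requires Julia 1.6 or later).

```julia
# Hypothetical helper: build a command that starts a *new* Julia process with
# `dir` prepended to LD_LIBRARY_PATH, keeping any existing entries after it.
function julia_with_ld_path(dir::AbstractString, args::AbstractString...)
    old = get(ENV, "LD_LIBRARY_PATH", "")
    newpath = isempty(old) ? dir : string(dir, ":", old)
    # Base.julia_cmd() is the command that launched the current Julia.
    return addenv(`$(Base.julia_cmd()) $args`, "LD_LIBRARY_PATH" => newpath)
end
```

For example, `run(julia_with_ld_path("/path/to/my_lib"))` opens an interactive child session with the modified search path.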

How to best resolve this issue permanently depends on your specific case, and how and by whom those libraries were built.


#4

Thanks for the detailed reply. It was helpful.

In this case, if you look at the ccall, the full path to my_lib2.so is passed to the first library. I suspect it calls dlopen internally, which should use the full path provided. In any case, unfortunately, changing LD_LIBRARY_PATH did not fix anything.
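Since my_lib1 only reports "Library failed to open" without a reason, one diagnostic is to open my_lib2.so directly from Julia, roughly as my_lib1 presumably does with dlopen, and read the dynamic linker’s own message, then check whether func_bar can be looked up. This is a sketch under those assumptions; probe_library and has_symbol are hypothetical helper names, and it targets Julia 1.x (where the question’s `Ptr{Void}` is spelled `Ptr{Cvoid}`).

```julia
using Libdl  # standard library since Julia 0.7

# Return true if `sym` can be looked up in the library behind `handle`.
function has_symbol(handle::Ptr{Cvoid}, sym::Symbol)
    try
        Libdl.dlsym(handle, sym)
        return true
    catch
        return false
    end
end

# Hypothetical diagnostic: open `path` directly and report the loader's
# message (if loading fails) and which of `syms` are present.
function probe_library(path::AbstractString, syms::Vector{Symbol})
    handle = try
        Libdl.dlopen(path)
    catch err
        return (loaded = false, message = sprint(showerror, err), found = Dict{Symbol,Bool}())
    end
    found = Dict(s => has_symbol(handle, s) for s in syms)
    Libdl.dlclose(handle)
    return (loaded = true, message = "", found = found)
end
```

Usage: `probe_library("path/to/my_lib2.so", [:func_bar])`. If `loaded` is false, `message` usually names the missing file or symbol that is behind the failure.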

I did get it to work, after much effort. I believe the problem was caused by an old glibc, or possibly other outdated system libraries, on the system where the problem occurred. I ended up installing Gentoo Prefix (RAP version), which is a self-contained, prefixed root directory with, importantly in this case, up-to-date libraries. I then recompiled Julia to link against the libraries inside the Gentoo Prefix environment (which are significantly newer than those in the system’s normal library directories) and rebuilt all the packages. Finally, I ran Julia within the prefixed environment, and it worked! (Some of those steps were probably unnecessary, but that’s what I did.)
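One way to test the old-glibc hypothesis without rebuilding anything is to compare the highest glibc symbol version a library references with the glibc version actually running. The sketch below assumes the library’s dynamic symbol table is dumped with `objdump -T` (which lists version tags such as GLIBC_2.14) and that the system uses glibc, where `gnu_get_libc_version` exists; both helper names are hypothetical, and the comparison is a heuristic rather than a definitive test.

```julia
# Highest GLIBC_x.y symbol version mentioned in `text` (e.g. the output of
# `objdump -T some_lib.so`); returns `nothing` if there is none.
# Tags without a number, such as GLIBC_PRIVATE, are ignored.
function required_glibc(text::AbstractString)
    vs = [VersionNumber(String(m.captures[1])) for m in eachmatch(r"GLIBC_(\d+(?:\.\d+)*)", text)]
    return isempty(vs) ? nothing : maximum(vs)
end

# Version of the glibc the current process runs against (glibc systems only).
running_glibc() = VersionNumber(unsafe_string(ccall(:gnu_get_libc_version, Cstring, ())))
```

If `required_glibc(read(`objdump -T path/to/my_lib2.so`, String))` is greater than `running_glibc()`, the library was probably built against a newer glibc than the system provides, which would be consistent with the fix described above.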

If anyone understands exactly what might have been the problem, I’d still be very curious to hear.