Hi, I’ve written a wrapper for a C library (GitHub - dforero0896/libpowspec: Library based on Cheng Zhao's powspec code), available at GitHub - dforero0896/Powspec.jl: Julia bindings to Cheng Zhao's powspec C code. While my initial tests worked, now that I am actually using the tools in my daily research I find that calling the same function twice with the same data does not work.
Some details: I turned a C program into a library that I could compile and then call externally. To do this I wrote a function that accepts argc and argv (which sounded really simple at the time, since I wouldn’t have to modify much of the code’s internals). Everything seems to work fine when calling a function (say power_spectrum) once: line 90 in src/box_periodic.jl works and auto_output is properly defined. On the second call, the same variable is no longer properly defined, and calling the external library yields an error complaining that the “command line argument” is not understood. The first time around, the command to be passed is auto_output = "--auto=[test1.dat, test1.dat]", and the second time it has somehow turned into auto_output = "--auto\0[test1.dat\0 test1.dat\0". Does anyone have any idea what is happening here?
Thanks in advance
The command line parser is probably mutating the string you send to it, similar to what is happening here:
julia> x = "a=b"
"a=b"
julia> unsafe_store!(pointer(x), '\0', 2)
Ptr{UInt8} @0x00007fde495a0330
julia> x
"a\0b"
But I create the string inside the function call, so I would expect that any changes made to it during the first call wouldn’t matter afterwards. I mean, the string is defined as
auto_output = "--auto=[test1.dat, test1.dat]"
@show auto_output
and then the @show line shows something different each time, even before calling the C function.
The issue seems to be solved by changing
auto_output_ptr = Cstring(pointer(auto_output))
to
auto_output_ptr = Cstring(pointer(deepcopy(auto_output)))
though I still don’t understand why.
You have to make sure the string object is rooted, e.g. with GC.@preserve, so that it doesn’t get garbage collected before you are done with the pointer.
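A minimal sketch of what rooting looks like. Here strlen from the C standard library is only a stand-in for the real C function, and the variable names are illustrative:

```julia
s = "--auto=[test1.dat, test1.dat]"

# GC.@preserve keeps `s` alive for the whole block, so the raw pointer taken
# from it stays valid while the C code reads through it.
n = GC.@preserve s begin
    p = pointer(s)                              # raw Ptr{UInt8}, invisible to the GC
    ccall(:strlen, Csize_t, (Ptr{UInt8},), p)   # stand-in: C only reads the bytes
end

println(Int(n) == sizeof(s))
```

The pointer is created and used entirely inside the preserved block, so it never outlives the object it points into.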
But why would I get the same pointer in two different function calls if the string object is created inside each of them? My guess would have been that they are different objects, completely independent of each other.
Strings are semantically immutable in Julia, so there is no reason to create new copies every time the function is run. You can see that it is at the same memory address in every call:
julia> function f()
           x = "abc"
           println(pointer(x))
           return x
       end
f (generic function with 1 method)
julia> f();
Ptr{UInt8} @0x00007f0862c0eb38
julia> f();
Ptr{UInt8} @0x00007f0862c0eb38
Now if someone goes behind the compiler’s back and modifies the memory storing the string anyway, e.g. with unsafe_store! (which is named unsafe for a reason) or a ccall (which is always unsafe), you will see that modification in future calls:
julia> function g()
           x = "abc"
           println(x)
           unsafe_store!(pointer(x), 'x', 1)
           return x
       end
g (generic function with 1 method)
julia> g()
abc
"xbc"
julia> g()
xbc
"xbc"
That’s why it helps to send a copy of the string in your ccall. The argument parser mutates that copy, but the next time you call the function you make a fresh copy of the uncorrupted original string.
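A small sketch of that difference, using a byte buffer as the copy and unsafe_store! as a stand-in for the C argument parser (the function name h is only for illustration):

```julia
function h()
    x = "a=b"
    buf = push!(Vector{UInt8}(x), 0x00)   # fresh, mutable, NUL-terminated copy of the bytes
    # Stand-in for the parser: overwrite '=' with NUL, but only in the copy.
    GC.@preserve buf unsafe_store!(pointer(buf), 0x00, 2)
    return x, buf
end

first_call = h()
second_call = h()
println(first_call[1], " ", second_call[1])
```

Both calls see the original string unchanged, because only the freshly allocated buffer was ever written to.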
Since strings are supposed to be immutable in Julia, I wouldn’t pass a String to a function that mutates the contents, since you don’t know what that will break even if you make a copy. (E.g. there is no semantic requirement that copy or deepcopy actually copies immutable data AFAIK.) Safer to just copy the data into a Vector{UInt8}.
Unfortunately, Vector{UInt8}(mystring) is not enough here, because you want to (a) include the NUL terminator and (b) make sure that mystring does not have an embedded NUL. The latter is handled by Base.unsafe_convert(Cstring, mystring), so what you can do is:
# get the NUL-terminated data from `s` as a `Vector{UInt8}` including the
# NUL terminator, while checking that `s` does not contain NUL.
cstring_vector(s::String) = GC.@preserve s begin
    p = Base.unsafe_convert(Cstring, s) # throws if s contains NUL
    copy(unsafe_wrap(Array, Ptr{UInt8}(p), sizeof(s)+1)) # copy the data + NUL
end
which gives:
julia> cstring_vector("foo")
4-element Vector{UInt8}:
0x66
0x6f
0x6f
0x00
julia> cstring_vector("fo\0o")
ERROR: ArgumentError: embedded NULs are not allowed in C strings: "fo\0o"
It might be nice if there were some function in Base to do this kind of copying more conveniently.
PS. And as I said before, passing around all of these raw pointers without rooting the underlying objects is just asking for segmentation faults if you get unlucky with garbage collection. You really need to read the documentation of GC.@preserve.
PPS. Cstring(pointer(mystring)) is unsafe, independent of rooting issues, because it doesn’t check whether mystring contains embedded NUL (\0) characters and hence can’t be treated as a NUL-terminated string. Use Base.unsafe_convert(Cstring, mystring) instead if you must do this.
I see, but does that mean that I should change the C library signature too? So far my C function receives a char *argv[] and that doesn’t seem to work with your proposed solution.
No. A Vector{UInt8} produces the same kind of pointer as a String, because it has the same underlying data format.
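A rough sketch of how an argv array could be assembled from such buffers. The helper make_argv and the argument values are hypothetical, and strlen is only a stand-in for the library call; for a C parameter of type char *argv[], the matching ccall argument type would be Ptr{Ptr{UInt8}}:

```julia
# Hypothetical helper: one mutable, NUL-terminated byte buffer per argument,
# plus the array of pointers that plays the role of `char *argv[]`.
function make_argv(args::Vector{String})
    bufs = map(args) do a
        0x00 in codeunits(a) && throw(ArgumentError("embedded NUL in $(repr(a))"))
        push!(Vector{UInt8}(a), 0x00)   # copy of the bytes plus the terminator
    end
    ptrs = Ptr{UInt8}[pointer(b) for b in bufs]
    return bufs, ptrs
end

args = ["powspec", "--auto=[test1.dat, test1.dat]"]
bufs, ptrs = make_argv(args)
argc = Cint(length(ptrs))

# `bufs` must stay rooted for as long as C may touch the pointers in `ptrs`.
# strlen is a stand-in; the real call would pass `argc` and `ptrs` instead.
lens = GC.@preserve bufs [Int(ccall(:strlen, Csize_t, (Ptr{UInt8},), p)) for p in ptrs]
println(lens == sizeof.(args))
```

Because the buffers are rebuilt from the original strings on every call, whatever the C parser writes into them cannot leak into the next call.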
I see, but then I should change the way I call it? I did
pk = GC.@preserve argv_vec ccall((:compute_pk, "$(ENV["LIBPOWSPEC_PATH"])/libpowspec_f.so"), Ptr{PK}, (Ref{CATA}, Cint, Cint, Ptr{Cint}, Cint, Ptr{Vector{UInt8}}), cat, save_auto & save_cross, false, int_cache, argc, argv)
but the string read is not correct.
It works now. I still have some more issues with calling the C library, but I am not sure whether I should open a new thread instead. To summarize, I can now call the C function repeatedly, apparently without issues. However, a short time afterwards the code segfaults at random points, such as in a pure Julia function that never segfaulted before, or at a savefig call. I suspect that the GC is trying to collect some memory that the C library has already freed. Could that be a possibility?