This is an awful lot of trouble to go to just to call the strtok function. Is that just a proxy for something more complex? If you literally just need strtok, then it’s not very hard to write a Julia version of the function that handles Unicode correctly and works on any string type.
There is, however, no reason why passing an array of bytes to C would only handle ASCII strings. UTF-8 is also represented by arrays of bytes, and as long as the separators are ASCII, it will work just fine. If you need to handle non-ASCII separators, on the other hand, you’re better off implementing this in Julia.
Regarding the correct use of ccall, you do not need a global to prevent deallocation. Yes, a global will not be deallocated. But a local variable that is used after the ccall will also not be deallocated. Or if you wrap a local in GC.@preserve then it also will not be deallocated. You’re now doing all three which is significant overkill.
But again, relying on C for simple string splitting is probably not the way to go.
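To illustrate the point about a Julia version, here is a minimal sketch of strtok-like splitting in pure Julia. The name tokenize is hypothetical, not something from Base; it only wraps split, treating each character of delims as a separator and dropping empty tokens the way strtok skips runs of delimiters.

```julia
# Hypothetical helper: strtok-like tokenization without leaving Julia.
# Each character in `delims` is a separator, as in C's strtok.
function tokenize(str::AbstractString, delims::AbstractString)
    # collect(delims) yields a Vector{Char}; passing the String itself
    # would make split treat it as one multi-character separator.
    return split(str, collect(delims); keepempty=false)
end

tokens = tokenize("a,,b;c", ",;")          # consecutive separators are skipped
unicode_tokens = tokenize("α→β→γ", "→")    # non-ASCII separators work too
```

The result is a vector of SubString values that point into the original string, so nothing is mutated and no pointer lifetime has to be managed.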
My hunch is that it will be ok. @yuyichao is the expert, though. strtok seems a little special in terms of mutation… I wonder how many C library functions do that…
This issue seems to be an artifact of passing a Julia string directly to C so that both sides share the same memory. This kind of thing has to be done cautiously to avoid incompatibility problems. Making a copy as a byte array seems to be a logical choice.
Thanks @yuyichao, @StefanKarpinski, and @tk3369. I learned a lot and now I have a better understanding to continue building TextUserInterfaces.jl, whose purpose is to provide a “Julian” API to the ncurses library.
Can I use the method below instead of the Vector of bytes approach?
That’s what the documentation says.
Core.String — Method.
String(s::AbstractString)
Convert a string to a contiguous byte array representation encoded as UTF-8 bytes. This representation is often appropriate for passing strings to C.
Thank you for your active participation in this thread.
Well, if you just want a way to show yourself how not to write correct code, or if you are not actually looking for advice on how to correctly interface with external C code, then go ahead with whatever you have, and you will likely not see crazy behavior in tests for a while.
If you want to write any usable code, though, then the errors you need to fix are not just about making it nuclear-proof.
This is not that complicated. In order to safely pass a pointer to C, you need to make sure that memory won’t be reclaimed by Julia’s GC in the meantime. There are a few ways to ensure that the memory of a value is not reclaimed:
Have some local usage of the value after the ccall;
Explicitly tell Julia not to reclaim it by using GC.@preserve;
Have a global variable that references the value during the ccall;
Allow Julia to convert to pointer for you by using Cstring or Ref with autoconversion.
That’s about all there is to it. If, for example, you do p = pointer(obj) and there are no further usages of obj then it’s entirely possible that obj will be reclaimed before you can do anything with p and therefore p will be pointing to memory that is already being used for something else. If there’s a usage of obj (not a usage of p—that doesn’t help) after you pass p to ccall then you don’t have to do anything else, the usage will keep obj alive. If there is no usage of obj after ccall and you want to keep it alive, you can do something like this inside of your function:
GC.@preserve obj begin
    p = pointer(obj)
    # use p in ccall
end
That just tells the compiler that it’s not safe to reclaim the value obj during the block.
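As a concrete sketch of two of the options listed above, the following compares automatic conversion through Cstring with a manual pointer guarded by GC.@preserve. The function names are made up for illustration, and it assumes the C library's strlen is resolvable by bare symbol name in the running process (typical on Linux and macOS; other platforms may need an explicit library name).

```julia
# Option A: let ccall do the conversion. Declaring the argument as Cstring
# keeps `s` rooted for the duration of the call.
c_strlen_auto(s::String) = Int(ccall(:strlen, Csize_t, (Cstring,), s))

# Option B: take the pointer yourself. Nothing uses `s` after the ccall,
# so the GC.@preserve block is what guarantees it stays alive.
function c_strlen_manual(s::String)
    n = GC.@preserve s begin
        p = pointer(s)  # Ptr{UInt8} to the string's NUL-terminated bytes
        ccall(:strlen, Csize_t, (Ptr{UInt8},), p)
    end
    return Int(n)
end

len_auto = c_strlen_auto("hello")
len_manual = c_strlen_manual("héllo")  # strlen counts bytes, not characters
```

Both variants only read the string, which is why passing a String is acceptable here.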
The only other issue that’s going on here is that Julia’s String type is immutable. Yes, it lives in memory somewhere so you could potentially modify that memory, but that is something that might break at any point in the future, so don’t rely on it. Therefore, if you want to pass some memory to ccall that it’s going to modify, you should not use String to do that. You can use a mutable Vector{UInt8} instead. The same rules apply to the Vector{UInt8} as anything else: you need to make sure there’s a usage of it after the ccall or you have to use GC.@preserve to keep it alive during the call. But if you’re relying on C to modify the memory, then presumably you’re going to look at that memory again, so it’s very likely that this is unnecessary since you will access or return the object after the ccall.
I know how the C function works internally. I’ve been working with C for 20 years or so. In C there is no immutable or mutable distinction for strings. I just didn’t know exactly how Julia works internally on such things.
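Putting those pieces together for strtok specifically, a rough sketch might look like the following. The function name c_tokens is hypothetical, and the code assumes strtok can be found by bare symbol name in the running process. Because strtok keeps an internal pointer into the buffer between calls, and the later calls pass C_NULL rather than the buffer, the whole loop sits inside GC.@preserve.

```julia
# Hypothetical wrapper: tokenize via C's strtok on a mutable byte buffer.
function c_tokens(str::AbstractString, delims::AbstractString)
    # Mutable copy of the UTF-8 bytes; the original String is never touched.
    buf = collect(codeunits(String(str)))
    push!(buf, 0x00)  # strtok expects a NUL-terminated buffer
    tokens = String[]
    GC.@preserve buf begin
        # First call hands over the buffer; strtok overwrites separators
        # with NUL bytes and remembers its position internally.
        p = ccall(:strtok, Ptr{UInt8}, (Ptr{UInt8}, Cstring), buf, delims)
        while p != C_NULL
            push!(tokens, unsafe_string(p))  # copy the token out of buf
            # Later calls pass NULL to continue scanning the same buffer.
            p = ccall(:strtok, Ptr{UInt8}, (Ptr{UInt8}, Cstring), C_NULL, delims)
        end
    end
    return tokens
end

parts = c_tokens("one two,three", " ,")
```

Note that strtok uses hidden global state, so this sketch is not safe to call from several tasks or threads at once.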
Things that you will not find in the manual are working code for every C function out there. You shouldn’t even expect to find all combinations of cases covered in the docs either (that’s really for tutorials, and it’s basically impossible to cover all cases anyway).
What you should expect to find in the docs are the basic rules that you need, as well as pointers to how to find them. In this case, the few GC-related things that you need to know are:
1. strtok returns and keeps your pointer: This is not the job of Julia’s docs.
2. When that happens, you need to make sure the object doesn’t get freed by Julia: This is documented in the C interface page as clearly as I think it could be.
3. You can use GC.@preserve to preserve the object. This is documented. It’s not linked very clearly from the C interface page and that can certainly be improved.
So if you have read the docs, it should be very clear that you are missing something (i.e. 2 above is clear). Due to the missing pointer, it might not be as clear how you can fix it. In general though, this is exactly why a forum is needed. Even though GC.@preserve is one of the tools you can use, this is really a complex case for which the correct solution is context dependent, and that is definitely not going to be covered (for all cases) in the docs.
In other words, when you have a real example that requires putting multiple pieces together, the forum is the right place to go, since teaching the user how to solve complex problems isn’t what the manual is for.
Well, I have not checked, but unless there’s some compiler special case, I’m pretty sure that if you pass a string literal as the first argument to strtok in C you’ll also get a compiler warning and a runtime segfault. The conditions under which this happens and the effects when it does are of course not the same, but the point is that such a property definitely exists in C (const objects) and the underlying deal is not that much different.
From a user’s point of view (my own), I think we can improve the documentation about Julia immutable types being changed by C code. If I did not miss anything, the docs are not clear on this.