Calling the C function strtok in Julia

This is an awful lot of trouble to go to just to call the strtok function. Is that just a proxy for something more complex? If you literally just need strtok, then it’s not very hard to write a Julia version of the function that handles Unicode correctly and works on any string type.
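As a rough sketch of the pure-Julia route, Base's `split` already covers the usual strtok use case. The sample strings and separator sets below are only illustrative.

```julia
# Separators are given as a collection of characters, like strtok's delimiter set.
seps = [' ', ',', '.', '-']

# keepempty=false drops the empty pieces between adjacent separators,
# which mirrors how strtok skips runs of delimiters.
tokens = split("- This, a sample string.", seps; keepempty=false)

for t in tokens
    println("Token : $t")
end

# Non-ASCII text and non-ASCII separators are handled the same way.
greek = split("α·β, γ", ['·', ',', ' ']; keepempty=false)
```

The results are `SubString`s that point into the original string, so nothing is mutated or copied per token.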

There is, however, no reason why passing an array of bytes to C would only handle ASCII strings. UTF-8 is also represented by arrays of bytes, and as long as the separators are ASCII, it will work just fine. If you need to handle Unicode separators, on the other hand, you’re better off implementing this in Julia.
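A small illustration of why ASCII separators are safe on UTF-8 bytes: every byte of a multi-byte character has its high bit set, so it can never equal an ASCII separator byte. The sample string here is made up for the demonstration.

```julia
s = "héllo,wörld"
bytes = codeunits(s)            # read-only view of the UTF-8 bytes

comma = UInt8(',')              # 0x2c, an ASCII separator

# Bytes that belong to multi-byte characters are all >= 0x80,
# so none of them can be mistaken for an ASCII separator.
multibyte = filter(b -> b >= 0x80, collect(bytes))

# The only byte equal to the separator is the real comma.
hits = count(b -> b == comma, bytes)
```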

Regarding the correct use of ccall, you do not need a global to prevent deallocation. Yes, a global will not be deallocated. But a local variable that is used after the ccall will also not be deallocated. Or if you wrap a local in GC.@preserve then it also will not be deallocated. You’re now doing all three which is significant overkill.

But again, relying on C for simple string splitting is probably not the way to go.

My hunch is that it will be ok. @yuyichao is the expert though :wink: The strtok seems a little special in terms of mutation… I wonder how many clib functions do that…

This issue seems to be an artifact of passing a Julia string directly to C, so that both sides work on the same memory. This kind of thing has to be done cautiously to avoid incompatibility problems. Making a copy as a byte array seems to be a logical choice.

You should just not use a String as the input type. If you really want to use it, just let the user pass in a vector of bytes instead.

That’s fine. It doesn’t spare you from any GC issues though.

Thanks @yuyichao, @StefanKarpinski, and @tk3369. I learned a lot and now have a better understanding to continue building TextUserInterfaces.jl, which aims to provide a “Julian” API to the ncurses library :slight_smile:

Can I use the method below instead of the vector of bytes thing?
That’s what the documentation says.

Core.String — Method.

String(s::AbstractString)

Convert a string to a contiguous byte array representation encoded as UTF-8 bytes. This representation is often appropriate for passing strings to C.

Thank you for your active participation in this thread.

best regards

Michael

I’m trying to understand it better. Is there a GC issue only because the caller is in the global scope? What if we write it in a function?

function testme()
    p = strtok("- This, a sample string.", " ,.-")
    while p ≠ ""
        println("Token : $p")
        p = strtok(nothing, " ,.-")
    end
end

No, this function gives me the same result. No problems with GC.
I started this thread because I wanted to experiment with ccall…

You cannot use a String, period. So a String constructor does not help.
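To make the distinction concrete, here is a minimal sketch of the case where a String is fine: a C function that only reads its argument. It assumes the C library's `atoi` can be resolved by `ccall` in the running process, which is typical but not guaranteed on every platform. For a function like strtok that writes into its argument, this does not apply.

```julia
sub = SubString("42 apples", 1, 2)   # an AbstractString that is not a String
s = String(sub)                      # contiguous UTF-8 bytes, but still immutable

# Read-only C functions are fine: the Cstring argument type lets ccall do the
# pointer conversion and keeps `s` alive for the duration of the call.
n = ccall(:atoi, Cint, (Cstring,), s)
```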

The GC issue is less important in global scope if anything.

As I said, the code is wrong but it will not crash all the time.

Hmm ok. I don’t want a nuclear power plant safe method. It’s only for testing…

Well, if you just want to show yourself how not to write correct code, or if you are not actually looking for advice on how to correctly interface with external C code, then go ahead with whatever you have, and you will likely not see crazy behavior in tests for a while.

If you want to write any usable code, though, then the errors you need to fix are not just a matter of being nuclear-proof.

If the documentation showed safe suggestions on such topics, then one would not need to ask either.

This is not that complicated. In order to safely pass a pointer to C, you need to make sure that memory won’t be reclaimed by Julia’s GC in the meantime. There are a few ways to ensure that the memory of a value is not reclaimed:

  1. Have some local usage of the value after the ccall;
  2. Explicitly tell Julia not to reclaim it by using GC.@preserve;
  3. Have a global variable that references the value during the ccall;
  4. Allow Julia to convert to a pointer for you by using Cstring or Ref with autoconversion.

That’s about all there is to it. If, for example, you do p = pointer(obj) and there are no further usages of obj then it’s entirely possible that obj will be reclaimed before you can do anything with p and therefore p will be pointing to memory that is already being used for something else. If there’s a usage of obj (not a usage of p—that doesn’t help) after you pass p to ccall then you don’t have to do anything else, the usage will keep obj alive. If there is no usage of obj after ccall and you want to keep it alive, you can do something like this inside of your function:

GC.@preserve obj begin
    p = pointer(obj)
    # use p in ccall
end

That just tells the compiler that it’s not safe to reclaim the value obj during the block.
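A runnable version of that pattern might look like the following sketch. The helper name `c_strlen` is hypothetical, and it assumes the C library's `strlen` can be resolved by `ccall` in the running process.

```julia
# Hypothetical helper: length of a NUL-terminated byte buffer via C's strlen.
function c_strlen(buf::Vector{UInt8})
    GC.@preserve buf begin
        p = pointer(buf)    # raw pointer; on its own it does not keep buf alive
        ccall(:strlen, Csize_t, (Ptr{UInt8},), p)
    end
end

# Mutable, NUL-terminated copy of the string's bytes.
buf = push!(collect(codeunits("hello")), 0x00)
len = c_strlen(buf)
```

`GC.@preserve` returns the value of its block, so the helper returns whatever `strlen` reports.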

The only other issue that’s going on here is that Julia’s String type is immutable. Yes, it lives in memory somewhere so you could potentially modify that memory, but that is something that might break at any point in the future, so don’t rely on it. Therefore, if you want to pass some memory to ccall that it’s going to modify, you should not use String to do that. You can use a mutable Vector{UInt8} instead. The same rules apply to the Vector{UInt8} as anything else: you need to make sure there’s a usage of it after the ccall or you have to use GC.@preserve to keep it alive during the call. But if you’re relying on C to modify the memory, then presumably you’re going to look at that memory again, so it’s very likely that this is unnecessary since you will access or return the object after the ccall.
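Putting these rules together for strtok, one possible sketch is shown below. The function name `c_strtok_all` is made up for illustration, and the code assumes the C library's `strtok` is visible to `ccall` in the running process. Because strtok keeps a pointer into the buffer between calls, the whole loop sits inside `GC.@preserve`, and every token is copied out while the buffer is still alive.

```julia
# Hypothetical helper: tokenize `str` with C's strtok, working on a mutable copy.
function c_strtok_all(str::AbstractString, delims::String)
    buf = collect(codeunits(str))   # fresh, mutable Vector{UInt8}
    push!(buf, 0x00)                # strtok expects a NUL-terminated buffer
    tokens = String[]
    GC.@preserve buf begin
        # First call passes the buffer; strtok overwrites delimiters with NULs.
        p = ccall(:strtok, Ptr{UInt8}, (Ptr{UInt8}, Cstring), buf, delims)
        while p != C_NULL
            push!(tokens, unsafe_string(p))   # copy the token into a Julia String
            # Later calls pass NULL so strtok continues in the same buffer.
            p = ccall(:strtok, Ptr{UInt8}, (Ptr{UInt8}, Cstring), C_NULL, delims)
        end
    end
    return tokens
end

for t in c_strtok_all("- This, a sample string.", " ,.-")
    println("Token : $t")
end
```

Note that strtok stores its position in hidden static state, so this sketch is not safe to call from several threads or to interleave with another tokenization.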

And to add to this: this function is a good example of complex C interfacing. It shows how to deal with special C lifetimes from Julia code. So, if you

  1. Are looking for a simple example, find something else.
  2. Want to use what strtok does in Julia, write some Julia code to do that instead.
  3. Want to learn more about the complex case, then follow the advice.

I know how the C function works internally. I’ve been working with C for 20 years or so. In C there is no mutable versus immutable distinction for strings like this. I just didn’t know exactly how Julia works internally on such things.

What do you think is missing from the doc?

Things that you will not find in the manual are working code for every C function out there. You shouldn’t even expect to find all combinations of cases covered in the doc either (that’s really for tutorials, and it’s basically impossible to cover all cases anyway).

What you should expect to find in the doc are the basic rules that you need, as well as pointers to where to find them. In this case, the few GC-related things that you need to know are:

  1. strtok returns and keeps your pointer: documenting this is not the job of Julia’s docs.
  2. When that happens, you need to make sure the object doesn’t get freed by Julia: this is documented in the C interface page as clearly as I think it could be.
  3. You can use GC.@preserve to preserve the object. This is documented. It’s not linked very clearly from the C interface page and that can certainly be improved.

So if you have read the doc, it should be very clear that you are missing something (i.e. 2 above is clear). Due to the missing pointer, it might not be as clear how you can fix it. In general though, this is exactly why a forum is needed. Even though GC.@preserve is one of the tools you can use, this is really a complex case for which the correct solution is context dependent, and that is definitely not going to be covered (for all cases) in the doc.

In other words, when you have a real example that requires putting multiple pieces together, the forum is the right place to go, since teaching the user how to solve complex problems isn’t what the manual is for.

Well, I have not checked, but unless there’s some compiler special case I’m pretty sure that if you pass a string literal as the first argument to strtok in C you’ll also get a compiler warning and a runtime segfault. The condition under which this happens, or the effect when it happens, is of course not the same, but the point is that such a property definitely exists in C (const objects) and the underlying deal is not that much different.

Yes, you should avoid this with such functions. Also, in C you should always make a copy of the string you want to process.

@StefanKarpinski and @yuyichao

Thanks for all the explanation.

From a user’s point of view (my own :slight_smile:), I think we can improve the documentation about Julia immutable types being changed by C code. If I did not miss anything, the docs are not clear on this.

Maybe a small warning at the end of this section:

https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/#man-bits-types-1

telling the user that they must take care with these things.

The same can happen if I pass a pointer to an immutable structure to C and the C function then modifies it, right?
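For the struct case, one way to stay on the safe side is to hand C a `Ref` box rather than a pointer to the immutable value itself. The sketch below is only an illustration: the `Point` type is hypothetical, and `memcpy` stands in for any C function that writes through its pointer argument, assuming it can be resolved by `ccall` in the running process.

```julia
# Hypothetical immutable struct with a C-compatible layout.
struct Point
    x::Int32
    y::Int32
end

a = Point(1, 2)
box = Ref(a)                 # mutable box holding a copy of `a`
src = Ref(Point(10, 20))

# memcpy writes through its first pointer; with Ref{Point} as the argument
# type, ccall passes a pointer to the box contents and keeps both boxes alive.
ccall(:memcpy, Ptr{Cvoid}, (Ref{Point}, Ref{Point}, Csize_t),
      box, src, sizeof(Point))

modified = box[]             # the value C wrote into the box
```

Here the write lands in the box, and the original binding `a` keeps its value.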