Calling the C function strtok in Julia

Hello,

I just wrapped the C function strtok in Julia for practice. I wanted to try it myself with only the help of the documentation.
Can I leave this approach as it is, or are there other possibilities? I didn’t know how to handle the NULL in a better way.

The original example comes from http://www.cplusplus.com/reference/cstring/strtok/

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

In Julia it looks like this:

function myStrTok(str::String, delim::String)
	t = ccall((:strtok, LIB), Cstring, (Cstring,Cstring), str, delim)
	return unsafe_string(t)
end

function myStrTok(delim::String)
	t = ccall((:strtok, LIB), Cstring, (Ptr{Nothing}, Cstring), C_NULL, delim)
	(t != C_NULL) ? (return unsafe_string(t)) : return ""
end

stok = "- This, a sample string."
p = myStrTok(stok, " ,.-")
while(p != "")
	global p
	println("Token : $p")
	p=myStrTok(" ,.-")
end

best regards

Michael

It seems fine. However, if you want to replicate the C behavior (passing a NULL pointer), you can do:

function strtok(str::Union{Nothing,String}, delim::String)
    ptr     = str == nothing ? Cstring(C_NULL) : convert(Cstring, pointer(str))
    ptr_tok = ccall((:strtok, "libc"), Cstring, (Cstring, Cstring), ptr, delim)
    tok     = ptr_tok != C_NULL ? unsafe_string(ptr_tok) : ""
    return tok
end

stok = "- This, a sample string."
p    = strtok(stok, " ,.-")

while(p ≠ "")
	global p
	println("Token : $p")
	p = strtok(nothing, " ,.-")
end
julia> include("strok.jl")
Token : This
Token : a
Token : sample
Token : string

Okay, thanks Ronis_BR :+1:

A nice approach to the solution.

best regards

Michael

  1. This is undefined behavior, since a String in Julia must not be mutated. You must use a Vector or something else mutable instead.
  2. The function returns a pointer into its first argument, so you must GC.@preserve the first argument until you have converted the returned pointer to a string.
  3. This is very bad.

You need to GC.@preserve str at the very least. Otherwise the pointer you get from here can simply be garbage.
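
A minimal sketch of what the points above ask for, assuming strtok can be resolved from the C library already loaded into the Julia process (as on a typical Linux or macOS setup). The name tokenize is hypothetical and not from the thread. It copies the string into a mutable byte buffer, so C never writes into a Julia String, and it keeps that buffer preserved across every strtok call:

```julia
# Hypothetical helper (not from the thread): tokenize `str` with C's strtok
# without ever letting C write into a Julia String.
function tokenize(str::String, delim::String)
    # Copy the bytes into a mutable buffer and append the NUL terminator.
    buf = Vector{UInt8}(codeunits(str))
    push!(buf, 0x00)
    tokens = String[]
    # Keep `buf` alive for as long as any pointer into it is in use,
    # including the position strtok remembers between calls.
    GC.@preserve buf begin
        tok = ccall(:strtok, Ptr{UInt8}, (Ptr{UInt8}, Cstring), pointer(buf), delim)
        while tok != C_NULL
            # unsafe_string copies the token into a fresh Julia String.
            push!(tokens, unsafe_string(tok))
            tok = ccall(:strtok, Ptr{UInt8}, (Ptr{UInt8}, Cstring), C_NULL, delim)
        end
    end
    return tokens
end

println(tokenize("- This, a sample string.", " ,.-"))
```

Collecting all tokens inside one call keeps the whole strtok sequence within a single preserved block. Note that strtok keeps its position in hidden global state, so this sketch should not be called from several tasks or threads at once.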

Yes! Thanks for pointing this out. I was about to write to @mwolff that in this case the user must ensure that str is not deallocated by the GC. It works in the example because the string is stored in a global variable, right?

Hmm… such things are missing from the documentation, left out for simplicity’s sake.

@yuyichao, just to improve my understanding: I know that you cannot mutate a string in Julia. Thus, if I pass a string as an argument to a function, it should come back unmodified after the function is called. However, in this example, it is indeed modified:

stok = "- This, a sample string."
println(stok)
p    = strtok(stok, " ,.-")
println(stok)
julia> include("strok.jl")
- This, a sample string.
- This a sample string.

Can you please explain this to me?

It’ll also simply work if the compiler and the GC don’t feel like messing with it. You are merely giving them a license to free it, which may happen at any time.

You can certainly write code that mutates a string in Julia; it’s just that doing so is undefined behavior. You cannot expect Julia to act in any sane way after that point.
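
To illustrate the alternative in plain Julia, here is a small sketch (variable names are only illustrative) that mutates a byte copy instead of the String itself. Vector{UInt8}(codeunits(s)) makes an independent copy, so the original stays intact:

```julia
original = "- This, a sample string."

# Independent, mutable copy of the string's bytes.
buf = Vector{UInt8}(codeunits(original))

# Mutate the copy, never the String.
buf[1] = UInt8('+')

# String(v) takes ownership of the vector it is given, so pass a copy
# if `buf` should remain usable afterwards.
modified = String(copy(buf))

println(original)
println(modified)
```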

Oh I see. Thanks @yuyichao

If you have a little spare time, can you take a look at this modified version? This integration is something I am really interested in:

function strtok(str::Union{Nothing,Vector{String}}, delim::String)
    if str == nothing
        ptr = Cstring(C_NULL)
    else
        if length(str) != 1
            error("The vector `str` must have only one element.")
        end

        GC.@preserve str (ptr = convert(Cstring,pointer(str[1])))
    end

    ptr_tok = ccall((:strtok, "libc"), Cstring, (Cstring, Cstring), ptr, delim)
    tok     = ptr_tok != C_NULL ? unsafe_string(ptr_tok) : ""

    return tok
end

stok = ["- This, a sample string."]
p    = strtok(stok, " ,.-")

while(p ≠ "")
	global p
	println("Token : $p")
	p = strtok(nothing, " ,.-")
end

But according to the documentation, Cstring (with a String) should be used for char* in a C function, and a Vector only if I have to allocate the memory myself, in which case it is Ptr{UInt8}. I think strtok does this internally, because the original string is broken up and new strings are returned. Sorry, I have to translate everything from German to English…

No, a vector of strings doesn’t make any difference… You are still mutating the string.

The GC.@preserve block needs to enclose all uses of the pointer as well, AFAIU.
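
A small sketch of that scoping rule, using a hypothetical helper last_byte that has nothing to do with strtok. The commented-out lines show the too-narrow form, where the preserve ends before the pointer is used:

```julia
# Hypothetical helper, only to illustrate where the preserve block must end.
function last_byte(v::Vector{UInt8})
    isempty(v) && throw(ArgumentError("vector must not be empty"))
    # Too narrow (do not do this): the preserve ends before the pointer is used.
    #   p = GC.@preserve v pointer(v)
    #   return unsafe_load(p, length(v))
    GC.@preserve v begin
        p = pointer(v)                    # take the pointer inside the block
        return unsafe_load(p, length(v))  # and finish using it inside the block
    end
end

println(last_byte(UInt8[1, 2, 3]))
```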

But if I pass the pointer to the function instead of the string, then everything will be OK, right? Like:

strtok(pointer(stok), " ,.-")

No. Passing a pointer simply means the caller is now responsible for making sure the Julia object doesn’t get freed; there’s absolutely no way you can cheat on this. And if you pass a pointer that comes from a string, it’s still wrong.

Now I see. That would answer tons of questions I had about random segmentation faults while creating TextUserInterfaces.jl. Some ncurses functions need a null-terminated string (char*) to create menus. That string must not be deallocated, since it is not copied anywhere; only the pointer is stored.

In this case, what can I do?
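
One possible pattern for this situation, sketched under assumptions: keep the byte buffer reachable from Julia for as long as the C side may hold the pointer. OwnedCString, LIVE_LABELS and register_label are hypothetical names, not part of ncurses or TextUserInterfaces.jl, and no ncurses call is made here.

```julia
# Hypothetical owner of a NUL-terminated byte buffer.
struct OwnedCString
    buf::Vector{UInt8}
end
OwnedCString(s::String) = OwnedCString(push!(Vector{UInt8}(codeunits(s)), 0x00))

# Everything stored here stays reachable, so the GC will not free its buffer.
const LIVE_LABELS = OwnedCString[]

# Returns a pointer that stays valid while the label remains in LIVE_LABELS,
# provided the buffer is never resized.
function register_label(text::String)
    label = OwnedCString(text)
    push!(LIVE_LABELS, label)
    return pointer(label.buf)
end

p = register_label("File")
GC.gc()  # the buffer survives a collection because LIVE_LABELS still holds it
println(unsafe_string(p))
```

In a real wrapper the buffer would more naturally live in a field of the Julia object that represents the menu, and be dropped once the C side has released it.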

Since I need to use a mutable type, I came up with this solution, in which I convert the string to a const UInt8 array. Of course, this will only work for ASCII characters:

const _vstr = UInt8[]

function strtok(str::Union{Nothing,String}, delim::String)
    GC.@preserve _vstr begin
        if str == nothing
            ptr = C_NULL
        else
            # Empty the current string vector.
            empty!(_vstr)

            # Create the new string vector.
            for c in str
                push!(_vstr, UInt8(c))
            end

            # We need a null-terminated string.
            push!(_vstr, '\0')

            # Get the pointer.
            ptr = convert(Ptr{Cvoid}, pointer(_vstr))
        end

        ptr_tok = ccall((:strtok, "libc"), Cstring, (Ptr{Cvoid}, Cstring), ptr, delim)
        tok     = ptr_tok != C_NULL ? unsafe_string(ptr_tok) : ""

        return tok
    end
end

p = strtok("- This, a sample string.", " ,.-")

while(p ≠ "")
	global p
	println("Token : $p")
	p = strtok(nothing, " ,.-")
end

In this case, is everything right? (I know it is ugly, but is it at least right :smiley:)
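
On the ASCII limitation: UInt8(c) converts the character's code point, so it yields a Latin-1 byte for code points up to 0xFF and throws an InexactError above that. A sketch of a buffer builder that copies the string's UTF-8 bytes instead (to_cbytes is a hypothetical name):

```julia
# Hypothetical helper: NUL-terminated copy of the string's UTF-8 bytes.
to_cbytes(str::String) = push!(Vector{UInt8}(codeunits(str)), 0x00)

println(to_cbytes("é"))   # the two UTF-8 bytes of 'é' plus the terminator
println(UInt8('é'))       # a single byte holding the code point, not UTF-8
```

Since strtok compares single bytes, multi-byte delimiters would still not behave as one character, so this only lifts the limitation for the text being split.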

Why do you need _vstr as a global?

It was the very first thing that came to mind. I cannot declare it as a local inside the function, because it would get deallocated, right? AFAIK, strtok stores the pointer and uses it when called with NULL as input.

@yuyichao btw, I am wondering: if the C function does not change the string and only uses it (ncurses does that), would it then be wrong to pass a pointer to the string?
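
For the narrower case of a C function that only reads the string during the call and does not keep the pointer, a minimal sketch with strlen (c_strlen is a hypothetical name). It relies on ccall converting a String argument declared as Cstring and keeping it alive for the duration of the call, as the Julia manual describes. It does not cover a library that stores the pointer for later use, which is the ncurses situation described earlier in the thread.

```julia
# Read-only use: pass the String itself and let ccall handle the conversion.
# No manual pointer(...) call is involved.
c_strlen(s::String) = Int(ccall(:strlen, Csize_t, (Cstring,), s))

println(c_strlen("sample"))
```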