Formatting/converting Matrix{Char} into Vector{String}

Hello, I am writing some simple wrappers around an old C library.
I am new to Julia and absolutely new to C, so it might be something obvious.
I have managed to get the majority of the functions working, but I am unsure how to handle the returned Cchar matrices.
Here is an example:

##C-call:
#    int list_of_signals(int *, int *, char *, int *)
#    char list[24*N]
#    int ierr, id, ndim,  status
#    status = listas_(&id, &ndim, &num_of_sigs, list, &ierr)

The function is supposed to return the first num_of_sigs signal names in the list corresponding to a given id.

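function list(id::Int, num_of_sigs::Int)

	ier = Ref{Cint}(0)                     # error code, written by the library
	ID = Ref{Cint}(id)
	NSIG = Ref{Cint}(num_of_sigs)
	# n_dim (the buffer capacity) is not defined in this snippet; it must exist elsewhere
	list = Array{Cchar}(undef, 24, n_dim)  # one 24-byte column per signal name

	status = ccall(("list_of_signals", path_to_lib), Cint,
		(Ref{Cint}, Ref{Cint}, Ptr{Cchar}, Ptr{Cint}),
		ID, NSIG, list, ier)
	return list, NSIG[], status, ier[]
end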

The function returns without an error, and the output (after some manipulation) looks something like this:

permutedims(Char.(list))=
40×24 Matrix{Char}:
 'S'  'I'  'G'  'N'  '1'  '\0'  '\0'  '\0'  …  '\0'  '\0'  '\0'  '\0'  '\0'  '\0'
 'S'  'I'  'G'  'N'  '1'  '0'   '\0'  '\0'     '\0'  '\0'  '\0'  '\0'  '\0'  '\0'
 'S'  'I'  'G'  'N'  '1'  '1'   '\0'  '\0'     '\0'  '\0'  '\0'  '\0'  '\0'  '\0'
...etc

I want to convert this Matrix{Char} into a Vector{String}.
I managed to do it in a relatively ugly way:

    list_out = Vector{String}(undef, num_of_sigs)
    charlist = Char.(list)
    for i in 1:num_of_sigs
        list_out[i] = String(charlist[:, i])
    end

This successfully converts it to a vector of strings; however, the null padding is not removed:

String[
"SIGN1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
"SIGN10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
...

I have a feeling that there should be a more elegant solution (either in the way the buffer is initially declared and passed to the ccall, or in how it is converted/formatted later on).

PS: If there are any suggestions or notes on how the wrapper is written - they are very welcome! I am still a bit confused with Ref{T} and Ptr{T} and was just using Ref=inputs, Ptr=outputs as a rule of thumb, but I am unsure if this is the way to go.

Cheers and thanks for your help!

If the strings are oriented along the columns of charlist, which is what you seem to be using, then you can do

1.7.2> join.(eachcol(charlist))
3-element Vector{String}:
 "SIGN1\0\0\0\0\0\0\0\0\0"
 "SIGN10\0\0\0\0\0\0\0\0"
 "SIGN11\0\0\0\0\0\0\0\0"

I’m not sure if there is some better way to chop off the null-characters, but at least you can use replace:

1.7.2> replace.(join.(eachcol(charlist)), '\0'=>"")
3-element Vector{String}:
 "SIGN1"
 "SIGN10"
 "SIGN11"
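
If only the trailing padding should be removed (replace deletes every '\0', wherever it occurs), rstrip is another option. A minimal sketch, assuming the names are padded with NULs on the right as in the example above; the sample data and the variable names are made up for illustration:

```julia
# Build a small padded Matrix{Char}, one signal name per column (illustrative data).
names_in = ["SIGN1", "SIGN10", "SIGN11"]
charlist = fill('\0', 8, length(names_in))
for (j, name) in enumerate(names_in)
    charlist[1:length(name), j] .= collect(name)
end

# Join each column into a String, then strip only the trailing NUL padding.
# rstrip returns a SubString, so String(...) converts it back to a String.
names_out = String.(rstrip.(join.(eachcol(charlist)), '\0'))
```

This still goes through Char, so it is a convenience rather than the fastest route.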

Here’s a version which is quite a bit faster:

function nulljoin(itr)
    io = IOBuffer()
    for c in itr
        c == '\0' && break
        write(io, c)
    end
    return String(take!(io))
end

Then just call

nulljoin.(eachcol(charlist))

Given this, you do not want to convert your 1-byte C char values to Char (4-byte Unicode values) at all. Not only is this inefficient, but it might be totally wrong if the C library is using UTF-8 encoded Unicode and not just ASCII.

Instead, the most efficient thing (assuming each column of list is a NUL-terminated C string) is probably:

strings = GC.@preserve list unsafe_string.(pointer.(eachcol(list)))

(and this is also correct for non-ASCII data if it is UTF-8 encoded).

e.g. using the same data as for your example above:

julia> list = Int8[83 83 83; 73 73 73; 71 71 71; 78 78 78; 49 49 49; 0 48 49; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0; 0 0 0];

julia> strings = GC.@preserve list unsafe_string.(pointer.(eachcol(list)))
3-element Vector{String}:
 "SIGN1"
 "SIGN10"
 "SIGN11"
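
One caveat: unsafe_string(pointer) reads until it finds a NUL byte, so a name that fills its whole column would run on into the next column (or past the end of the array). A bounded variant can be sketched as follows, assuming fixed-width columns that may or may not be NUL-terminated; fixed_width_string is a hypothetical helper name, not part of Julia or of the C library:

```julia
# Hypothetical helper: convert one fixed-width column of C chars to a String,
# stopping at the first NUL byte or at the end of the column, whichever comes first.
function fixed_width_string(col::AbstractVector{<:Union{Int8,UInt8}})
    stop = something(findfirst(iszero, col), length(col) + 1)
    return String(UInt8[c % UInt8 for c in view(col, 1:stop-1)])
end

# Illustrative data: 6-byte columns; the second name fills its column completely.
list = zeros(Int8, 6, 2)
list[1:5, 1] .= codeunits("SIGN1")
list[1:6, 2] .= codeunits("SIGN10")

strings = fixed_width_string.(eachcol(list))
```

This copies the bytes instead of reading through a raw pointer, so no GC.@preserve is needed here.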

What is the signature of the C-function? It might be possible to have ccall/cconvert take care of the string conversion, by using Cstring.

Thank you everyone for providing such a diverse array of answers (I was not expecting this to happen so fast! The Julia community is vigilant :wink: )


Signature of the C function is provided in the text that you have cited.
In particular, the function takes a char buffer of 24*N bytes, where N is the number of words expected in the output.
I was playing around with Cstring, but it always returned only the first signal name from the list (my guess is that it stopped at the null-termination characters that fill the empty space after the first valid signal name). Because of this I turned to the method with a Matrix, where each signal name has its own line. From the C code's standpoint it makes little difference what “shape” the variable has (a string of 24*N bytes or a matrix of 24 columns and N rows), since it is passed via a pointer and the size matches.
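
Both observations can be reproduced without the library. A small sketch, using a made-up flat buffer with 8 bytes per name instead of 24: reading it as a single C string stops at the first NUL, while reshape views the same memory as a matrix with one name per column, without copying:

```julia
width = 8                       # bytes per name (24 in the real library)
buf = zeros(Int8, width * 2)    # flat buffer, as the C function would fill it
buf[1:5] .= codeunits("SIGN1")
buf[width+1:width+6] .= codeunits("SIGN10")

# Treating the whole buffer as one C string yields only the first name.
first_only = GC.@preserve buf unsafe_string(pointer(buf))

# reshape shares memory with buf, so the same bytes can be read column by column.
mat = reshape(buf, width, :)
all_names = GC.@preserve mat unsafe_string.(pointer.(eachcol(mat)))
```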

This is exactly what I was thinking - there should be a better way and thank you very much for showing it! All the output code is ASCII, so there should not have been a problem, but thanks for highlighting that.

Could I ask you to elaborate a bit on the GC.@preserve part? From what I could find, it is related to garbage collection. Why is it used in this particular case? Should it always be used with pointers?

Thanks!

Whenever you call pointer(x) on an object x, you need to make sure that x is not garbage-collected before you are done accessing the pointer. For ordinary objects, Julia tracks this automatically, but pointer(x) is just an integer address and the compiler loses the connection to x. So, you manually call GC.@preserve x ...calculations with pointer... to tell Julia to keep x around until the calculations are complete.

See also the GC.@preserve documentation.
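
The pattern can be sketched like this (the buffer contents are illustrative). Note that arrays passed directly as ccall arguments are kept alive by ccall itself for the duration of the call, so GC.@preserve is only needed when you take a raw pointer yourself:

```julia
buf = Int8[72, 105, 0]          # the bytes of "Hi" followed by a NUL terminator

# pointer(buf) is a raw address; GC.@preserve keeps buf alive while it is used.
s = GC.@preserve buf begin
    p = pointer(buf)
    unsafe_string(p)            # copies the bytes up to the NUL into a new String
end
```

Once unsafe_string has returned, the result is an ordinary String that no longer depends on buf.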
