Does the function String() destroy its argument?

Hi! I am using a C function from an existing shared library to convert a Julian day number into date and time information in the ISO 8601 format (YYYY-MM-DDThh:mm:ssZ). This C function fills a vector of characters, which must then be converted into a string with the String() function. The whole operation works fine except that this final conversion from a vector of characters to a string appears to empty that vector:

(base) michel@MicMac2:~$ JMtk15
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.2 (2022-09-29)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using JMtk15
[ Info: Precompiling JMtk15 [600a0e46-2e83-4a57-a945-993fe116ce31]

julia> jd = 2.4523965833333335e6
2.4523965833333335e6

julia> datetime = zeros(Cuchar, 28)
28-element Vector{UInt8}:
 0x00
 0x00
    ⋮
 0x00
 0x00

julia> status = MtkJulianToDateTime(jd, datetime)
MTK_SUCCESS::MTKt_status = 0x00000000

julia> dt = rstrip(String(datetime), '\0')
"2002-05-02T02:00:00Z"

julia> datetime
UInt8[]

julia> x = rstrip(String(datetime), '\0')
""

where the Julia call to the C function (by the same name) is written as follows:

function MtkJulianToDateTime(jd, datetime)
    ccall((:MtkJulianToDateTime, mtklib),
        MTKt_status,
        (Cdouble, Ptr{Cchar}),
        jd, datetime)
end

Two questions:

  1. Why does the variable datetime appear to be empty once it has been used to create dt?

  2. Is there any way that the Julia function MtkJulianToDateTime could set datetime to the desired string rather than a vector of characters? In other words, could this conversion be made part of the Julia function to avoid having to remember to do it explicitly after the function has returned control?

Thanks for any clarification on these matters.

From the String(v) documentation:

If v is Vector{UInt8} it will be truncated to zero length and
future modification of v cannot affect the contents of the resulting string.
To avoid truncation of Vector{UInt8} data, use String(copy(v))

The basic issue is that the String takes “ownership” of the array's memory, so the array is emptied to make sure you can’t accidentally mutate the string later by editing the array.
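As a small illustration of the documented behavior, here is a sketch with a hand-made byte vector (the variable names are only for illustration). Copying first leaves the vector intact, while passing the vector directly empties it:

```julia
v = UInt8[0x68, 0x69]      # the bytes of "hi"

s = String(copy(v))        # the copy is consumed, v is untouched
n_after_copy = length(v)   # still 2

t = String(v)              # v itself is consumed and truncated to length 0
n_after_take = length(v)
```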

One option for converting NUL-terminated data (from a C string) to a Julia String efficiently is to call

dt = GC.@preserve datetime unsafe_string(pointer(datetime))

(Low-level functions like this mainly show up when you are interacting with external C libraries.)
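To see what that call does without the C library, here is a sketch in which the buffer is filled by hand instead of by MtkJulianToDateTime (the 28-byte size and the timestamp are taken from the session above). unsafe_string copies the bytes up to the first NUL, so the buffer keeps its contents and the string has no trailing NULs:

```julia
buf = zeros(UInt8, 28)                            # same size as in the original post
copyto!(buf, codeunits("2002-05-02T02:00:00Z"))   # stand-in for the C call

# Keep buf alive while the raw pointer is in use
dt = GC.@preserve buf unsafe_string(pointer(buf))
```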

Another option for copy-free String-like “views” of arrays of bytes is the StringViews.jl package.

See also the discussion in JuliaLang/julia issue #32528, “String constructor truncates data”.


Hi @stevengj,

Thanks for explaining the effect of String(v) when v is a Vector{UInt8}: this was as revealing as unexpected.

In practice, I do not need to keep access to the original vector of characters if I can count on the equivalent string to remain available and stable, but I needed to understand what was going on.

The second question is “How to write a Julia wrapper to that C function in such a way that it provides a Ptr{Cchar} as the second (input) argument but returns a Julia string as output?”

I thought of wrapping an ‘internal’ Julia function that deals with the ccall within an ‘external’ Julia function that provides a string interface to the outside world, so to speak:

function MtkJulianToDateTime(juldate, datetime)
    dt = zeros(Cuchar, 28)

    function mtkjuliantodatetime(juldate, dt)
        ccall((:MtkJulianToDateTime, mtklib),
            MTKt_status,
            (Cdouble, Ptr{Cchar}),
            juldate, dt)
    end

    datetime = GC.@preserve dt unsafe_string(pointer(dt))
end

but Julia strings like datetime are immutable, so that would not work either…

  • The next thought was to define datetime in this external Julia function as a mutable struct, though I’m not sure whether that would be wise…

Isn’t there a cleaner approach to write a Julia wrapper to that C function in such a way that it behaves like the original C function, and returns a Julia string rather than a vector of characters?

In any case, thanks again for your explanations.

I’m a bit confused as to why you need nested functions here. Couldn’t you just do the following?

function MtkJulianToDateTime(juldate, datetime)
    dt = zeros(Cuchar, 28)

    ccall((:MtkJulianToDateTime, mtklib),
        MTKt_status,
        (Cdouble, Ptr{Cchar}),
         juldate, dt)

    datetime = GC.@preserve dt unsafe_string(pointer(dt))
end

The above will actually work. However, whatever you are passing in as datetime is being ignored. It might as well be the following.

function MtkJulianToDateTime(juldate)
    dt = zeros(Cuchar, 28)

    ccall((:MtkJulianToDateTime, mtklib),
        MTKt_status,
        (Cdouble, Ptr{Cchar}),
         juldate, dt)

    return GC.@preserve dt unsafe_string(pointer(dt))
end

The following is completely fine, although the resulting string will keep any trailing NUL characters from the 28-byte buffer. The memory allocated by zeros will be owned by the returned String.

function MtkJulianToDateTime(juldate)
    dt = zeros(Cuchar, 28)

    ccall((:MtkJulianToDateTime, mtklib),
        MTKt_status,
        (Cdouble, Ptr{Cchar}),
         juldate, dt)

    return String(dt)
end

If you don’t want the terminal nul bytes, you might want the following:

function MtkJulianToDateTime(juldate)
    dt = zeros(Cuchar, 28)

    ccall((:MtkJulianToDateTime, mtklib),
        MTKt_status,
        (Cdouble, Ptr{Cchar}),
         juldate, dt)

    nul_idx = findfirst(==(0), dt)

    return String(@view(dt[begin:nul_idx-1]))
end
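If the buffer might not contain a NUL at all, findfirst returns nothing and the indexing above would throw. A slightly more defensive helper could look like the sketch below; cbuf_to_string is a hypothetical name, not something provided by Julia or the Toolkit:

```julia
# Hypothetical helper: convert a C-style byte buffer to a String,
# stopping at the first NUL byte if there is one.
function cbuf_to_string(buf::AbstractVector{UInt8})
    nul_idx = findfirst(iszero, buf)
    stop = nul_idx === nothing ? lastindex(buf) : nul_idx - 1
    # Indexing with a range makes a copy, so buf itself is not emptied
    return String(buf[firstindex(buf):stop])
end

padded = UInt8[0x61, 0x62, 0x00, 0x00]   # "ab" followed by NUL padding
full   = UInt8[0x61, 0x62]               # no NUL at all
a = cbuf_to_string(padded)
b = cbuf_to_string(full)
```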

Note that we have Cstring which is meant for this exact case:

help?> Cstring
search: Cstring Cwstring AbstractString escape_string unescape_string

  Cstring

  A C-style string composed of the native character type Cchars. Cstrings are
  NUL-terminated. For C-style strings composed of the native wide character
  type, see Cwstring. For more information about string interopability with C,
  see the manual.

Let’s say we want to wrap the C standard library function strncat:

       char *strncat(char *dest, const char *src, size_t n);

We can do a loose wrapping of it in Julia as follows:

julia> function unsafe_c_strncat(dest, src, n)
           @ccall strncat(dest::Ptr{Cchar}, src::Cstring, n::Csize_t)::Cstring
       end
unsafe_c_strncat (generic function with 1 method)

One might be tempted to do the following, but this will probably lead to a segmentation fault eventually, since we have not actually allocated memory to hold " dave".

julia> hello = "hello"; dave = " dave";

julia> unsafe_c_strncat(hello, dave, 5) |> unsafe_string
"hello dave"

The following kind of works, but may be surprising to you:

julia> hello = "hello\0\0\0\0\0"; dave = " dave";

julia> result = unsafe_c_strncat(hello, dave, 5)
Cstring(0x00007f63381bc6b8)

julia> result |> unsafe_string
"hello dave"

julia> hello
"hello dave"

# note that result and hello actually point to the same memory
julia> pointer(result)
Ptr{Int8} @0x00007f63381bc6b8

julia> pointer(hello)
Ptr{UInt8} @0x00007f63381bc6b8

# The returned string from unsafe_string has its own allocated memory
julia> result |> unsafe_string |> pointer 
Ptr{UInt8} @0x00007f6337e98318

Here are safe versions of the call.

julia> function safe_c_strncat(
           a::AbstractString,
           b::AbstractString,
           n = ncodeunits(b)
       )
           v = zeros(UInt8, ncodeunits(a) + n + 1)
           iob = IOBuffer(v; write = true)
           write(iob, a)
           unsafe_c_strncat(v, b, n)
           return @view(v[1:end-1]) |> String
       end
safe_c_strncat (generic function with 2 methods)

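julia> function safe_c_strncat2(
           a::AbstractString,
           b::AbstractString,
           n = ncodeunits(b)
       )
           v = Vector{UInt8}(a)
           resize!(v, ncodeunits(a) + n + 1)
           # resize! leaves the new bytes uninitialized; zero them so that
           # strncat finds the NUL terminator right after the contents of a
           v[ncodeunits(a)+1:end] .= 0x00
           unsafe_c_strncat(v, b, n)
           return @view(v[1:end-1]) |> String
       end
safe_c_strncat2 (generic function with 2 methods)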

julia> safe_c_strncat("hello", " dave", 2)
"hello d"

julia> safe_c_strncat2("hello", " dave", 4)
"hello dav"

Hi @mkitti,

Thanks for your extensive input. There are indeed more straightforward ways to convert a Julian day number into a string date in ISO format, including coding the transformation directly in Julia, or using a date manipulation package.

My purpose is different and more constrained: NASA JPL has developed a Toolkit to manipulate the data files generated by the MISR instrument on the Terra platform (available from GitHub - nasa/MISR-Toolkit: an API facilitating the access of MISR standard product files), and a large number of users are accustomed to using those functions in their own programs. This Toolkit is written in C and JPL has already generated Python and IDL wrappers to facilitate access to these resources from those languages. There is no official plan to do the same for Julia, so I am exploring the feasibility of delivering such a wrapper myself.

In that context, it is critical that the user interface in Julia be as similar (preferably identical) to the interface in C or IDL, so that users can focus on their own goals rather than having to learn yet another API. Hence, I aim to design the Julia wrapper such that the function calls resemble as closely as possible what they would be in those older languages. For instance, in IDL:

IDL> juliandate = 2.4523965833333335e6
IDL> status = MTK_JULIAN_TO_DATETIME(juliandate, datetime)
IDL> print, status
           0
IDL> print, datetime
2002-05-02T00:00:00Z
IDL> 

MtkJulianToDateTime is just one of 100+ functions available in the Toolkit, and most of those are much more involved computationally. The immediate goal is thus to provide the same interface in Julia, relying on the underlying C code to do the heavy lifting. The constraints are

  • the syntax should look the same as above (same arguments in the same order)
  • the return value must be the expected error return code that has a specific meaning (and is well documented) in the context of this Toolkit
  • the variable datetime must be a Julia string

I have updated the Julia wrapper function following your first suggestion, and also inserted a println statement to verify that the translation from a vector of chars to a string took place:

function MtkJulianToDateTime(juldate, datetime)
    dt = zeros(Cuchar, 28)
    status = ccall((:MtkJulianToDateTime, mtklib),
        MTKt_status,
        (Cdouble, Ptr{Cchar}),
        juldate, dt)
    datetime = GC.@preserve dt unsafe_string(pointer(dt))
    println("Inside the function, datetime = ", datetime)
    return status
end

Yet, outside the function, datetime is still the zero-filled vector of characters and not the correct string:

(base) michel@MicMac2:~$ JMtk15
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.2 (2022-09-29)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using JMtk15
[ Info: Precompiling JMtk15 [600a0e46-2e83-4a57-a945-993fe116ce31]

julia> juldate = 2.4523965833333335e6
2.4523965833333335e6

julia> datetime = zeros(Cuchar, 28)
28-element Vector{UInt8}:
 0x00
 0x00
    ⋮
 0x00
 0x00

julia> status = MtkJulianToDateTime(juldate, datetime)
Inside the function, datetime = 2002-05-02T02:00:00Z
MTK_SUCCESS::MTKt_status = 0x00000000

julia> println("Outside the function, datetime = ", datetime)
Outside the function, datetime = UInt8[0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]

So it looks like the datetime inside the function is different from and unrelated to the datetime outside the function, probably because they are not of the same type, or do not in fact point to the same object. My earlier attempt at inserting a function within a function aimed at providing the expected vector of characters to ccall internally, while interfacing with the outside world in terms of regular strings.

Now, given that the translation did take place inside the wrapper function (as demonstrated by the println command), the remaining issue is to convey that string outside the function, in a manner as transparent as possible to the end user of the Julia wrapper for the Toolkit.

All suggestions are welcome.

That’s correct. You are trying to pass by reference, but Julia does not pass by reference. Arguments are passed by sharing. You would need to explicitly pass a reference.

function MtkJulianToDateTime(juldate, datetime::Ref{Union{Vector{Cuchar},String}})
    dt = datetime[]  # the byte buffer currently stored in the reference
    status = ccall((:MtkJulianToDateTime, mtklib),
        MTKt_status,
        (Cdouble, Ptr{Cchar}),
        juldate, dt)
    # Replace the buffer held by the reference with the decoded string
    datetime[] = GC.@preserve dt unsafe_string(pointer(dt))
    println("Inside the function, datetime = ", datetime[])
    return status
end

You would then do the following to pass it.

julia> datetime = Ref{Union{Vector{Cuchar},String}}(zeros(Cuchar, 28))

julia> status = MtkJulianToDateTime(juldate, datetime)

julia> println("Outside the function, datetime = ", datetime[])
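To make the difference between rebinding and mutating concrete, here is a small library-free sketch (the function names rebind, mutate! and set! are made up for illustration). Assigning to an argument name only changes the local binding, whereas writing into a shared array or Ref is visible to the caller:

```julia
function rebind(x)
    x = "new value"      # rebinds the local name only; the caller's object is untouched
    return nothing
end

function mutate!(x::Vector{UInt8})
    x[1] = 0x01          # writes into the array shared with the caller
    return nothing
end

function set!(r::Ref{String})
    r[] = "new value"    # stores a new String in the caller's reference
    return nothing
end

buf = zeros(UInt8, 3)
rebind(buf)              # buf is still UInt8[0x00, 0x00, 0x00]
mutate!(buf)             # buf is now UInt8[0x01, 0x00, 0x00]

r = Ref("old value")
set!(r)                  # r[] is now "new value"
```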

You might find my StaticStrings.jl package of interest.

julia> function unsafe_c_strncat(dest, src, n)
           @ccall strncat(dest::Ptr{Cchar}, src::Cstring, n::Csize_t)::Cstring
       end
unsafe_c_strncat (generic function with 1 method)

julia> using StaticStrings

julia> ra = Ref(cstatic"hello"11)
Base.RefValue{CStaticString{11}}("hello")

julia> unsafe_c_strncat(ra, " dave", 5);

julia> ra[]
cstatic"hello dave"11

julia> println(ra[])
hello dave

In your case, I think it might look like this.

function MtkJulianToDateTime(juldate, datetime)
    status = ccall((:MtkJulianToDateTime, mtklib),
        MTKt_status,
        (Cdouble, Ptr{Cchar}),
        juldate, datetime)
    println("Inside the function, datetime = ", datetime[])
    return status
end

using StaticStrings
juldate = 2.4523965833333335e6
datetime = Ref(cstatic""28);
MtkJulianToDateTime(juldate, datetime)
println("Outside the function, datetime = ", datetime[])
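If an extra dependency is not wanted, a similar calling pattern can probably be had with a plain Ref{String}, keeping the byte buffer internal to the wrapper. The sketch below is only an illustration: fake_julian_to_datetime! is a hypothetical pure-Julia stand-in for the ccall (it ignores jd and always writes the timestamp from the session above), and julian_to_datetime is a made-up name for the wrapper:

```julia
# Hypothetical stand-in for the C routine: writes a NUL-terminated
# ISO 8601 string into buf and returns a status code (0 meaning success).
function fake_julian_to_datetime!(buf::Vector{UInt8}, jd::Float64)
    text = codeunits("2002-05-02T02:00:00Z")   # fixed text; jd is ignored here
    copyto!(buf, text)
    buf[length(text) + 1] = 0x00
    return 0
end

# Same argument order as the C and IDL interfaces: (juldate, datetime).
# Returns the status code and stores the string in the caller's Ref.
function julian_to_datetime(juldate::Real, datetime::Ref{String})
    buf = zeros(UInt8, 28)                     # internal scratch buffer
    status = fake_julian_to_datetime!(buf, Float64(juldate))
    datetime[] = GC.@preserve buf unsafe_string(pointer(buf))
    return status
end

datetime = Ref("")
status = julian_to_datetime(2.4523965833333335e6, datetime)
```

The caller still has to write datetime[] to read the string, which is the price of emulating an output argument; returning a (status, datetime) tuple would be the more common Julia idiom if the argument list did not have to match the C interface.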

Thanks a lot, @mkitti: that solution works as you suggested:

(base) michel@MicMac2:~$ JMtk15
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.2 (2022-09-29)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using JMtk15

julia> using StaticStrings

julia> juldate = 2.4523965833333335e6
2.4523965833333335e6

julia> datetime = Ref(cstatic""28);

julia> MtkJulianToDateTime(juldate, datetime)
Inside the function, datetime = 2002-05-02T02:00:00Z
MTK_SUCCESS::MTKt_status = 0x00000000

julia> println("Outside the function, datetime = ", datetime[])
Outside the function, datetime = 2002-05-02T02:00:00Z

I now need to study your package in more detail, in particular to tame the syntax, which is unfamiliar to me… I really appreciate your help in this matter.


The first strange syntax here is for Ref. A Ref{T} is a reference to a single Julia value of type T, much like a pointer.

julia> r = Ref(5) # It can be inferred that we are creating some kind of Ref{Int64}
Base.RefValue{Int64}(5)

julia> typeof(r) # Note that the concrete type here is Base.RefValue{Int64}
Base.RefValue{Int64}

julia> r[] # This "dereferences" the reference. It's like putting a * in front of a pointer in C.
5

The second strange syntax, introduced via StaticStrings, is the string macro.

Using cstatic""28 is like calling the macro @cstatic_str("", 28). It is similar to doing the following.

julia> CStaticString{28}("")
cstatic""28

The one important difference is that the String "" is never actually created when using the string macro.

CStaticString itself is just a wrapper around a Tuple of bytes along with a terminal null byte. In your case, it creates a Tuple with 28 bytes. This tuple is analogous to a static byte array in C of length 28.

struct CStaticString{N} <: AbstractStaticString{N}
    data::NTuple{N,UInt8}
    _nul::UInt8
    ...
end

As you can see this struct is immutable. By wrapping it in a Ref, or concretely a Base.RefValue, we can mutate it.

julia> r_str = Ref(cstatic"Hello"28);

julia> r_str |> pointer_from_objref |> Ptr{UInt8} |> unsafe_string
"Hello"

julia> ptr = r_str |> pointer_from_objref |> Ptr{UInt8}
Ptr{UInt8} @0x00007f78ac394250

julia> unsafe_store!(ptr, UInt8('h'))
Ptr{UInt8} @0x00007f78ac394250

julia> r_str |> pointer_from_objref |> Ptr{UInt8} |> unsafe_string
"hello"
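The same idea can be tried without the package. In this sketch, FourBytes is a made-up struct standing in for CStaticString, and the pointer is obtained with Base.unsafe_convert inside GC.@preserve, which is a somewhat more defensive way to get at the memory behind a Ref than calling pointer_from_objref on a temporary:

```julia
# Made-up immutable struct wrapping a fixed-size tuple of bytes
struct FourBytes
    data::NTuple{4,UInt8}
end

r = Ref(FourBytes((0x48, 0x69, 0x00, 0x00)))   # "Hi" followed by NUL padding

# Keep r alive while raw pointers into it are in use
GC.@preserve r begin
    p = convert(Ptr{UInt8}, Base.unsafe_convert(Ptr{FourBytes}, r))
    unsafe_store!(p, UInt8('h'))               # overwrite the first byte in place
    s = unsafe_string(p)                       # read back up to the first NUL
end
```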

Great! Thanks a lot, @mkitti, for this additional explanation.