How to correctly manage memory during ccall?

Hi,

I’m trying to wrap Rcpp C++ code for use from Julia. The original code used R objects created from C++ to pass back arrays created in C++. I don’t understand what I have to do to correctly change the code for Julia without creating segfaults and the like.

I have modified the extern C function to return a pointer to a struct. This struct stores three array pointers and an Int, which is the length of all three arrays.

I can call the compiled library like this, and I do get correct looking values within C++, so I think I’m doing something correct at least. But calling the function multiple times segfaults, as does printing the return tuple, and other things I randomly tried:

result = ccall((:isobands_impl, "isobands.so"),
   Ptr{Tuple{Ptr{Float64},Ptr{Float64},Ptr{Int},Int}},
   (Ptr{Float64}, Int, Ptr{Float64}, Int, Ptr{Float64}, Int, Int, Float64, Float64),
   collect(1:4.0), 4,
   collect(1:5.0), 5,
   vec(rand(4, 5)), 5, 4,
   0.0, 1.0);

Then I can do this

julia> tup = unsafe_load(result)
(Ptr{Float64} @0x00007f974d3b3710, Ptr{Float64} @0x00007f974a8daa80, Ptr{Int64} @0x00007f974a890cc0, 14)

But wrapping the first pointer as an array doesn’t work: the values are garbage. I’ve printed the values from within the C++ function, and they are different there.

julia> unsafe_wrap(Vector{Float64}, tup[1], tup[4], own = false)
14-element Array{Float64,1}:
 6.9311400044737e-310
 6.9311400044737e-310
 6.9311382644069e-310
 6.9311382644085e-310
 6.93113660802377e-310
 6.93113664137636e-310
 0.0
 6.9311366080206e-310
 1.489573e-317
 4.0e-323
 6.93113660596687e-310
 0.0
 0.0
 5.0927898983e-313

Here I’ll paste the last part of the C++ function which returns the data, maybe someone can spot a glaring error there. I’m by no means an expert in any of this, so please be gentle :slight_smile:

// the struct I defined for my return type
struct resultStruct {
  double *x;
  double *y;
  int *id;
  int len;
};



// end of the function I'm calling
tuple<vector<double>, vector<double>, vector<int> > result = ib.collect();

  vector<double> res_x = get<0>(result);
  vector<double> res_y = get<1>(result);
  vector<int> res_id = get<2>(result);


  // I checked that these contain the correct data

  // cout << "xs: " << res_x.size() << endl;
  // for (auto i: res_x) {
  //   cout << i << endl;
  // }

  // cout << "ys: " << res_y.size() << endl;
  // for (auto i: res_y) {
  //   cout << i << endl;
  // }

  // cout << "id :" << res_id.size() << endl;
  // for (auto i: res_id) {
  //   cout << i << endl;
  // }

  int len = res_x.size();

  struct resultStruct* returnvalue;
  returnvalue->x = res_x.data();
  returnvalue->y = res_y.data();
  returnvalue->id = res_id.data();
  returnvalue->len = len;

  return returnvalue;

I’m not sure if this is the issue, but note that Julia’s Int is not the same as int in C. Int is 64-bit on 64-bit systems, while C’s int is usually 32-bit even there. If you want the same integer type as C, use the Cint alias, which is defined to match whatever C’s int type is. In your signature that means Ptr{Cint} instead of Ptr{Int} for the id array, and Cint instead of Int for the length. It’s quite unlikely for double to be anything besides Float64, but Cdouble is also provided in case we ever get Julia running on a system where that’s not true.
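
To make the size mismatch concrete, here is a small C++ sketch. The helper names are made up for illustration. It reads the first eight bytes of an int buffer as one 64-bit integer, which is roughly what happens when Julia reads C int data through a Ptr{Int} on a 64-bit system.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, for illustration only: read the first 8 bytes of an
// `int` buffer as one 64-bit integer. The caller must pass a buffer that is
// at least 8 bytes long.
std::int64_t read_as_int64(const int* ids) {
  std::int64_t value = 0;
  std::memcpy(&value, ids, sizeof(value));  // memcpy sidesteps aliasing issues
  return value;
}

// True when C's `int` is narrower than a 64-bit integer, i.e. when a 64-bit
// Julia `Int` and C's `int` do not line up element by element.
constexpr bool int_is_narrower_than_int64() {
  return sizeof(int) < sizeof(std::int64_t);
}
```

When int is narrower than 64 bits, read_as_int64 on a buffer starting with 1, 2 packs both elements into a single value, so the first element no longer reads as 1.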

Memory management in C++ is indeed a bit of a pain as compared to a language like Julia; and it gets even more complex when trying to merge the two together. :slight_smile:

First things first: let’s build up a good mental model for memory ownership. There are two options. Either the memory is owned by Julia, which means Julia allocates it, passes it as a pointer to the C++ code, and later frees it. Or the memory is owned by the C++ code, which means the C++ code allocates it and passes it back to Julia, and Julia later hands it back to the C++ code to be freed explicitly. As it stands, your code is doing neither of these things. :wink:

Let’s start at the end and work our way backwards. You have a resultStruct * that you’re returning. Now, a pointer is something that “points” to a memory address, so the first thing we should do is look to see where you are setting what the pointer points to. In this case, you’re not setting it at all, so when you do something like returnvalue->x = res_x.data(), the returnvalue->x means:

  • dereference returnvalue
  • go to the x slot offset within a resultStruct
  • set its value to the return value of res_x.data()

If you’ve never initialized the value of returnvalue, this means that you’re zooming off to a random memory address. This may or may not segfault. If it doesn’t segfault, it means that returnvalue happens to hold an address that is mapped into your program, but whatever you write there can be overwritten by some other piece of code at any time, and you may be overwriting someone else’s data in turn.

You need to explicitly allocate memory for this object by calling either malloc() or new. Example:

struct resultStruct* returnvalue = (resultStruct*) malloc(sizeof(resultStruct));

This will allocate a chunk of memory and store the result into returnvalue; so that issue is gone. Next, let’s look at what we’re feeding into returnvalue: when you call res_x.data(), you’re getting a pointer to an internal buffer of a vector<double>, but when that vector<double> falls out of scope, its memory will become invalid; it is freed and can be overwritten at any time. You need to take ownership of those pieces as well.

The easiest way is to allocate memory for these pieces as well:

returnvalue->x = (double*) malloc(sizeof(double)*len);
returnvalue->y = (double*) malloc(sizeof(double)*len);
returnvalue->id = (int*) malloc(sizeof(int)*len);

Then copy the values over:

memcpy(returnvalue->x, res_x.data(), sizeof(double)*len);
memcpy(returnvalue->y, res_y.data(), sizeof(double)*len);
memcpy(returnvalue->id, res_id.data(), sizeof(int)*len);
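
Putting the allocation and copy steps together, a minimal sketch could look like the following. The function name make_result is hypothetical, and the sketch assumes all three vectors have the same length, as in the question. It also adds null checks that the snippets above leave out.

```cpp
#include <cstdlib>
#include <cstring>
#include <vector>

// Same layout as the struct in the question.
struct resultStruct {
  double *x;
  double *y;
  int *id;
  int len;
};

// Hypothetical helper: copy three vectors into malloc'd buffers that outlive
// the vectors. Assumes xs, ys and ids all have the same length.
// Returns nullptr if any allocation fails.
resultStruct* make_result(const std::vector<double>& xs,
                          const std::vector<double>& ys,
                          const std::vector<int>& ids) {
  const std::size_t n = xs.size();
  // malloc(0) may legally return NULL, so always request at least one element.
  const std::size_t count = n > 0 ? n : 1;

  resultStruct* r =
      static_cast<resultStruct*>(std::malloc(sizeof(resultStruct)));
  if (r == nullptr) return nullptr;

  r->x = static_cast<double*>(std::malloc(sizeof(double) * count));
  r->y = static_cast<double*>(std::malloc(sizeof(double) * count));
  r->id = static_cast<int*>(std::malloc(sizeof(int) * count));
  if (r->x == nullptr || r->y == nullptr || r->id == nullptr) {
    // free(NULL) is a no-op, so this cleans up whatever did get allocated.
    std::free(r->x);
    std::free(r->y);
    std::free(r->id);
    std::free(r);
    return nullptr;
  }

  if (n > 0) {
    std::memcpy(r->x, xs.data(), sizeof(double) * n);
    std::memcpy(r->y, ys.data(), sizeof(double) * n);
    std::memcpy(r->id, ids.data(), sizeof(int) * n);
  }
  r->len = static_cast<int>(n);
  return r;
}
```

The returned buffers no longer depend on the vectors, so they stay valid after the vectors go out of scope.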

Alright, now let’s switch over to the Julia side. We want to receive an object of type resultStruct *, which will be easiest if we have a resultStruct on the Julia side. I see your ccall() signature says Ptr{Tuple{Ptr{Float64},Ptr{Float64},Ptr{Int},Int}}, which I don’t think will work. (It might, but I’ve just never seen that before). I suggest mirroring the C struct in Julia-land with gross Ptr objects and whatnot, then having a Julia structure that can be constructed off of one of those C struct analogues:

# Exact same structure as on the C++ side; keep this in-sync with your C++ code.
struct CResultStruct
    x::Ptr{Cdouble}
    y::Ptr{Cdouble}
    id::Ptr{Cint}
    len::Cint
end

# Nice, Julia-side structure with Vectors and whatnot
struct JuliaResultStruct
    x::Vector{Cdouble}
    y::Vector{Cdouble}
    id::Vector{Cint}
end

# Function to take a pointer to a C `resultStruct` and turn it into a `JuliaResultStruct`
function JuliaResultStruct(c_ptr::Ptr{CResultStruct})
    # This gives us a CResultStruct
    res = unsafe_load(c_ptr)
    return JuliaResultStruct(
        # Wrap around the C structure's `x` memory, turning it into a `Vector`
        unsafe_wrap(Vector{Cdouble}, res.x, (res.len,)),
        unsafe_wrap(Vector{Cdouble}, res.y, (res.len,)),
        unsafe_wrap(Vector{Cint}, res.id, (res.len,)),
    )
end

Then, your ccall() expects back a Ptr{CResultStruct}, and you just call JuliaResultStruct(result) to get back stuff that Julia can deal with.

Once you’re done with that chunk of data, you need to free it, so once you are certain that all Julia objects referring to the arrays are gone, you would pass the Ptr{CResultStruct} to another C++ function that calls free() on res->x, res->y, res->id and finally res itself.
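
A sketch of such a cleanup function, assuming every pointer in the struct (and the struct itself) came from malloc() as in the snippets above. The name free_result is made up; pick whatever fits your library.

```cpp
#include <cstdlib>

// Same layout as the struct in the question.
struct resultStruct {
  double *x;
  double *y;
  int *id;
  int len;
};

// Hypothetical cleanup function, exported with C linkage so it can be called
// through ccall. Assumes all four allocations were made with malloc().
extern "C" void free_result(resultStruct* res) {
  if (res == nullptr) return;  // nothing to do
  std::free(res->x);
  std::free(res->y);
  std::free(res->id);
  std::free(res);  // the struct itself goes last
}
```

On the Julia side this would be called with the same Ptr{CResultStruct} that the original ccall returned, and only after nothing refers to the wrapped arrays anymore.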

Thanks a lot for that detailed answer. I was not aware that the struct syntax didn’t allocate memory correctly, I haven’t worked much with non-memory-managed languages.

One question that pops up immediately before I had the opportunity to try your suggestions:

Freeing the arrays in C++ after I’m done with them in Julia seems non-optimal, ideally I want to be done with C++ after receiving the arrays. I have seen that unsafe_wrap has an own keyword, which says it will call free for me at GC time. Can I not just use that to turn my allocated arrays into Julia arrays that can be garbage collected later?

Ok I only changed the return type from a pointer of the result struct to a value, so I save one thing to free.

Then I allocated and memcopied the arrays in the C++ code as you suggested. I also use unsafe_wrap with the own=true flag for the three arrays because so far I believe that this saves me a separate free call on the C++ side.

I do get the correct data now, I’ll report back should this solution still appear unstable later. Next thing will be to use this with BinaryBuilder…

Thanks again!

> Ok I only changed the return type from a pointer of the result struct to a value, so I save one thing to free.

Yes, that’s perfectly fine, just be sure you don’t return references (e.g. anything that looks like resultStruct &) from your local functions, as they will be referring to memory that is no longer valid once the function ends.
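
As a rough illustration of the difference (function names here are hypothetical): returning the struct by value copies its four fields out to the caller, while returning a reference to a local would hand back something that dies with the function.

```cpp
// Same layout as the struct in the question.
struct resultStruct {
  double *x;
  double *y;
  int *id;
  int len;
};

// Fine: the struct is copied out to the caller, so nothing refers to the
// local variable `r` after the function returns. Only the pointers are
// copied, so the arrays they point to must still outlive the call.
extern "C" resultStruct pack_result(double* x, double* y, int* id, int len) {
  resultStruct r;
  r.x = x;
  r.y = y;
  r.id = id;
  r.len = len;
  return r;
}

// Not fine (left commented out on purpose): `r` is destroyed when the
// function returns, so the caller would receive a dangling reference.
//
// resultStruct& pack_result_ref(double* x, double* y, int* id, int len) {
//   resultStruct r;
//   r.x = x; r.y = y; r.id = id; r.len = len;
//   return r;
// }
```

With the by-value version, the ccall return type on the Julia side would be the mirrored struct itself rather than a Ptr to it.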

> I also use unsafe_wrap with the own=true flag for the three arrays because so far I believe that this saves me a separate free call on the C++ side.

Yes, this is also fine as long as the allocator used on the C++ side matches the deallocator used on the Julia side. (This should be the case in your situation, but it may not be for others reading this who are using a library that allocates things with, e.g., jemalloc or some other allocator.)
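
For readers in that second situation, one common way around it is to have the library export its own allocate/free pair, so that both ends always go through the same allocator. A minimal sketch, with made-up function names and plain malloc()/free() standing in for whatever allocator the library really uses:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical exported allocator: every buffer handed to Julia comes from
// here. Swap the body for the library's real allocator if it is not malloc().
extern "C" double* alloc_doubles(std::size_t n) {
  // Request at least one element, since malloc(0) may return NULL.
  return static_cast<double*>(std::malloc(sizeof(double) * (n > 0 ? n : 1)));
}

// Hypothetical matching deallocator: Julia passes the pointer back here
// instead of relying on own=true. Must mirror alloc_doubles.
extern "C" void free_doubles(double* p) {
  std::free(p);
}
```

With this pattern the Julia side would wrap the arrays with own = false and call free_doubles through ccall once it is done with them.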