Segmentation fault when embedding Julia code that uses Array

Hello,

I have an issue that I can’t figure out regarding returning memory allocated by Julia during embedding.

I tried to simplify the C code, but it’s still a bit verbose:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <julia.h>

// Create an array
jl_array_t *jl_make_3d_array(void *existingArray, int *dimsArray)
{
  printf("%s \n", __func__);
  jl_value_t *array_type = jl_apply_array_type((jl_value_t *)jl_float64_type, 3);
  char strDimsBuf[12];
  snprintf(strDimsBuf, 12, "(%i, %i, %i)", dimsArray[0], dimsArray[1], dimsArray[2]);

  printf("%s \n", strDimsBuf);
  jl_value_t *dims = jl_eval_string(strDimsBuf);
  jl_array_t *xArray = jl_ptr_to_array(array_type, existingArray, dims, 0);
  return xArray;
}

// Call dummy function to test module import
static void external_module_dummy()
{
  printf("%s -- BEGIN \n", __FUNCTION__);
  {
    // Call easy function
    jl_module_t *custom_module = (jl_module_t *)jl_eval_string("custom_module");
    jl_function_t *dummy = jl_get_function(custom_module, "dummy");
    if (dummy != NULL)
    {
      printf("dummy is not null\n");
    }
    else
    {
      printf("dummy is null\n");
      return;
    }
    jl_call0(dummy);
  }
  printf("\n");
  printf("%s -- END \n\n", __FUNCTION__);
  return;
}

// Create a 3D Array and square it using custom_module
static void external_module_squareMeBaby_3D()
{
    printf("%s -- BEGIN \n", __func__);
    jl_module_t *custom_module = (jl_module_t *)jl_eval_string("custom_module");
    jl_function_t *func = jl_get_function(custom_module, "squareMeBaby!");

    if (func != NULL)
    {
      printf("squareMeBaby is not Null\n");
    }
    else
    {
      printf("squareMeBaby is null\n");
    }
    printf("%s -> make_array \n", __func__);
    double existingArray0[3][4][5]; 
    int length = 3*4*5;
    int dimsArray[3];
    dimsArray[0] = 3;
    dimsArray[1] = 4;
    dimsArray[2] = 5;
    jl_array_t* xArray = jl_make_3d_array(existingArray0, dimsArray);

    double *xData = (double *)jl_array_data(xArray);
    for (int i = 0; i < length; i++)
      xData[i] = i ;

    jl_value_t *ret = jl_call1(func, (jl_value_t *)xArray);
    // JL_GC_POP();
    printf("%s -> call done \n", __func__);
    printf("ret is %p \n\n", ret); // If memory is allocated in squareMeBaby this is a Null pointer

    if (!ret)
    {
      assert(false);
      return;
    }
    {
      // Is this necessary ?
      JL_GC_PUSH1(&ret);
      printf("len(ret)=%li \n", jl_array_len(ret));
      printf("rank %i = jl_array_rank(x) \n", jl_array_rank((jl_value_t *)ret));
      int d1 = jl_array_dim(ret, 0);
      int d2 = jl_array_dim(ret, 1);

      double *xResult = jl_array_data(ret);
      printf("xResult = [");
      for (int i = 0; i < d1; i++)
        for (int j = 0; j < d2; j++)
          printf("%lf ", xResult[i * d2 + j]);
      printf("]\n");
      // Is this necessary ?
      JL_GC_POP();
    }
    printf("%s -- END \n", __func__);
    return;
} 

int main(int argc, char *argv[])
{
  jl_init();
  jl_eval_string("include(\"test.jl\")");
  jl_eval_string("using .custom_module");
  external_module_dummy();
  external_module_squareMeBaby_3D();
  jl_atexit_hook(0);
}
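
A side note on jl_make_3d_array in the listing above, offered as a sketch and not as a fix for the segfault: the 12-byte strDimsBuf is large enough for "(3, 4, 5)", but snprintf would silently truncate a string such as "(100, 200, 300)". One way to size the buffer, assuming C99 snprintf semantics, is to ask snprintf for the required length first. format_dims3 is a hypothetical helper name and is not part of the Julia C API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats three dimensions as "(d0, d1, d2)" in a
   heap buffer whose size is computed by snprintf itself.
   Returns NULL on failure; the caller frees the result. */
char *format_dims3(const int dims[3])
{
    /* With a NULL buffer and size 0, snprintf only reports the length
       the full string would need (excluding the terminating NUL). */
    int needed = snprintf(NULL, 0, "(%d, %d, %d)", dims[0], dims[1], dims[2]);
    if (needed < 0)
        return NULL;

    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL)
        return NULL;

    int written = snprintf(buf, (size_t)needed + 1, "(%d, %d, %d)",
                           dims[0], dims[1], dims[2]);
    assert(written == needed); /* same format and arguments, same length */
    return buf;
}
```

The returned string could then be handed to jl_eval_string in place of strDimsBuf and freed afterwards.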

custom_module is defined in a simple Julia file called test.jl in the same folder:

module custom_module
  using LinearAlgebra

  function dummy()
    println("Julia says... Hello, world ! Function dummy() from module custom_module has been executed !")
  end

  function squareMeBaby!(A)
    ## Square array and return the result
    #  A[:]=[i*i for i in A] # => This obviously works because it does not allocate memory
    B = A * A
    return B
  end

  export dummy
  export squareMeBaby!
end

Now, in squareMeBaby!, if I modify the data in place with the syntax A[:] = [i*i for i in A], it works without issue.
But if I try to return data out of place with the syntax B = A * A, the returned pointer is NULL (and using it results in a segfault). I do call JL_GC_PUSH1(&ret) on it, but that doesn’t change anything since the pointer itself is NULL.

I’m at a loss as to what I’m doing wrong here.

EDIT: I also tried this version:

  function squareMeBaby(A)
    ## Square array and return the result
    A[:]=[i*i for i in A]
    println(typeof(A))
  end

and I get a segfault. I think println or typeof moved something in memory, because the rank and length I read from the array look like garbage (rank = 372, length = 140700242326323).

I don’t understand what’s going on…

In Julia if I do:

julia> using LinearAlgebra

julia> rand(3, 4, 5) * rand(3, 4, 5)
ERROR: MethodError: no method matching *(::Array{Float64,3}, ::Array{Float64,3})
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at operators.jl:538
  *(::Number, ::AbstractArray) at arraymath.jl:52
  *(::AbstractArray, ::Number) at arraymath.jl:55
  ...
Stacktrace:
 [1] top-level scope at REPL[9]:1

So my guess is that you are not getting a result because A * A is undefined for 3D arrays (the call throws the MethodError shown above). You could try something smaller, like:

function squareMeBaby!(A)
    return rand(3,4,5)
end

Make sure you can get at those results. I suspect what you really want to do is:

function squareMeBaby!(A)
    return map(x->x*x, A)
end

Hi, thank you for your answer. You are right that the * operator was misused (it was meant to be the broadcast operator .*), but sadly that wasn’t the end of my troubles.

To expand a bit, I have a series of tests for embedding Julia, with a local API that wraps the commonly needed functionality in a “nicer” way. The functions are all called on a “Julia” 1D Array, then on a “Julia” 3D Array. From the C perspective, the arrays are always flat pointers to contiguous memory.
The 1D Array always works.

I tried two things: using map(x -> x*x, A) and the .* operator.

So here’s the thing:

  function squareMeBaby(A)
    ## Square array and return the result
    return map(x -> x*x, A)
  end

  function mutateMeByTen!(A)
    ## Multiply array in place by ten
    lmul!(10, A)
  end

=> This works.

  function squareMeBaby(A)
    ## Square array and return the result
    println(typeof(A))
    return map(x -> x*x, A)
  end

  function mutateMeByTen!(A)
    ## Multiply array in place by ten
    lmul!(10, A)
  end

=> This causes a GC segfault during mutateMeByTen!. If I invert the order of the calls to mutateMeByTen! and squareMeBaby, it’s fine: the current test calls square then mutate, and if I call mutate then square, it works.

  function squareMeBaby(A)
    ## Square array and return the result
    return A .* A
  end

  function mutateMeByTen!(A)
    ## Multiply array in place by ten
    lmul!(10, A)
  end

=> This causes a GC segfault during mutateMeByTen!. If I invert the order of the calls to mutateMeByTen! and squareMeBaby, it works.

  function squareMeBaby(A)
    ## Square array and return the result
    println(typeof(A))
    return A .* A
  end

  function mutateMeByTen!(A)
    ## Multiply array in place by ten
    lmul!(10, A)
  end

==> GC segfault during squareMeBaby

The complete error message seems to point at the GC:

in expression starting at none:0
gc_mark_loop at /buildworker/worker/package_linux64/build/src/gc.c:2161
_jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:2902
jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:3108
maybe_collect at /buildworker/worker/package_linux64/build/src/gc.c:827 [inlined]
jl_gc_pool_alloc at /buildworker/worker/package_linux64/build/src/gc.c:1142
jl_gc_alloc_ at /buildworker/worker/package_linux64/build/src/julia_internal.h:277 [inlined]
jl_gc_alloc at /buildworker/worker/package_linux64/build/src/gc.c:3150
_new_array_ at /buildworker/worker/package_linux64/build/src/array.c:94 [inlined]
_new_array at /buildworker/worker/package_linux64/build/src/array.c:162 [inlined]
jl_alloc_array_1d at /buildworker/worker/package_linux64/build/src/array.c:433
Array at ./boot.jl:406 [inlined]
Array at ./boot.jl:425 [inlined]
getindex at ./array.jl:413 [inlined]
adce_pass! at ./compiler/ssair/passes.jl:872
run_passes at ./compiler/ssair/driver.jl:146
optimize at ./compiler/optimize.jl:174
typeinf at ./compiler/typeinfer.jl:33
abstract_call_method_with_const_args at ./compiler/abstractinterpretation.jl:266
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:134
abstract_call_known at ./compiler/abstractinterpretation.jl:904
abstract_call at ./compiler/abstractinterpretation.jl:926
abstract_call at ./compiler/abstractinterpretation.jl:911
abstract_eval at ./compiler/abstractinterpretation.jl:1005
typeinf_local at ./compiler/abstractinterpretation.jl:1270
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1326
typeinf at ./compiler/typeinfer.jl:12
abstract_call_method_with_const_args at ./compiler/abstractinterpretation.jl:266
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:134
abstract_call_known at ./compiler/abstractinterpretation.jl:904
abstract_call at ./compiler/abstractinterpretation.jl:926
abstract_call at ./compiler/abstractinterpretation.jl:911
abstract_eval at ./compiler/abstractinterpretation.jl:1005
typeinf_local at ./compiler/abstractinterpretation.jl:1270
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1326
typeinf at ./compiler/typeinfer.jl:12
abstract_call_method_with_const_args at ./compiler/abstractinterpretation.jl:266
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:134
abstract_call_known at ./compiler/abstractinterpretation.jl:904
abstract_call at ./compiler/abstractinterpretation.jl:926
abstract_call at ./compiler/abstractinterpretation.jl:911
abstract_eval at ./compiler/abstractinterpretation.jl:1005
typeinf_local at ./compiler/abstractinterpretation.jl:1270
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1326
typeinf at ./compiler/typeinfer.jl:12
abstract_call_method_with_const_args at ./compiler/abstractinterpretation.jl:266
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:134
abstract_call_known at ./compiler/abstractinterpretation.jl:904
abstract_call at ./compiler/abstractinterpretation.jl:926
abstract_call at ./compiler/abstractinterpretation.jl:911
abstract_eval at ./compiler/abstractinterpretation.jl:1005
typeinf_local at ./compiler/abstractinterpretation.jl:1255
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1326
typeinf at ./compiler/typeinfer.jl:12
typeinf_edge at ./compiler/typeinfer.jl:484
abstract_call_method at ./compiler/abstractinterpretation.jl:419
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:111
abstract_call_known at ./compiler/abstractinterpretation.jl:904
abstract_call at ./compiler/abstractinterpretation.jl:926
abstract_call at ./compiler/abstractinterpretation.jl:911
abstract_eval at ./compiler/abstractinterpretation.jl:1005
typeinf_local at ./compiler/abstractinterpretation.jl:1270
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1326
typeinf at ./compiler/typeinfer.jl:12
typeinf_ext at ./compiler/typeinfer.jl:570
typeinf_ext at ./compiler/typeinfer.jl:601
jfptr_typeinf_ext_23344.clone_1 at /home/rcaillaud/Workspace/julia-1.5.3/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
jl_type_infer at /buildworker/worker/package_linux64/build/src/gf.c:296
jl_generate_fptr at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:290
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1964
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1919 [inlined]
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2224 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
jl_call1 at /buildworker/worker/package_linux64/build/src/jlapi.c:226

So my understanding is that I have handled memory incorrectly at some point, but I can’t figure out where or why.
Since the segfault happens inside the C implementation, the gdb stack trace isn’t helpful. The Valgrind output seems to confirm that a buffer is invalid somewhere:

signal (11): Segmentation fault
in expression starting at none:0
==26614== Warning: invalid file descriptor -1 in syscall close()
==26614== Warning: invalid file descriptor -1 in syscall close()
==26614== Syscall param write(buf) points to uninitialised byte(s)
==26614==    at 0x63DA839: syscall (in /lib64/libc-2.26.so)
==26614==    by 0x68C3483: mincore_validate (in /home/rcaillaud/Workspace/julia-1.5.3/lib/julia/libunwind.so.8.0.1)
==26614==    by 0x68C35D2: validate_mem (in /home/rcaillaud/Workspace/julia-1.5.3/lib/julia/libunwind.so.8.0.1)
==26614==    by 0x68C370D: access_mem (in /home/rcaillaud/Workspace/julia-1.5.3/lib/julia/libunwind.so.8.0.1)
==26614==    by 0x68C4115: dwarf_get (in /home/rcaillaud/Workspace/julia-1.5.3/lib/julia/libunwind.so.8.0.1)
==26614==    by 0x68C44E7: _ULx86_64_access_reg (in /home/rcaillaud/Workspace/julia-1.5.3/lib/julia/libunwind.so.8.0.1)
==26614==    by 0x68C2D80: _ULx86_64_get_reg (in /home/rcaillaud/Workspace/julia-1.5.3/lib/julia/libunwind.so.8.0.1)
==26614==    by 0x68CB11E: apply_reg_state (in /home/rcaillaud/Workspace/julia-1.5.3/lib/julia/libunwind.so.8.0.1)
==26614==    by 0x68CB97A: _ULx86_64_dwarf_step (in /home/rcaillaud/Workspace/julia-1.5.3/lib/julia/libunwind.so.8.0.1)
==26614==    by 0x68C4C04: _ULx86_64_step (in /home/rcaillaud/Workspace/julia-1.5.3/lib/julia/libunwind.so.8.0.1)
==26614==    by 0x547A886: jl_unw_step (stackwalk.c:538)
==26614==    by 0x547A886: jl_unw_stepn (stackwalk.c:99)
==26614==    by 0x547AAD3: rec_backtrace_ctx (stackwalk.c:188)
==26614==  Address 0xb6ac000 is on thread 1's stack

But I don’t know how to proceed from there or where to investigate.

I’m not an expert at this, but my guess is that any time you hang onto a pointer returned from a jl_* call, you should root it with one of the JL_GC_PUSH* macros before the next jl_* call. Otherwise that jl_* call may cause garbage collection to run, and that object will be freed.

To me, that means the values returned by jl_apply_array_type, jl_eval_string, jl_ptr_to_array, jl_get_function, jl_make_3d_array, etc.

What you could try is calling jl_gc_enable(0), running your tests, then calling jl_gc_enable(1). If that works, it would point to an object being garbage collected on you.

Oh, this is a good idea, I’ll try!
Do jl_get_function and jl_apply_array_type allocate memory? I assumed only heap memory allocations would be tracked by the GC.

Normally jl_make_3d_array shouldn’t need the GC, since it creates a jl_array_t with ownbuffer = 0 (the memory is allocated on the C side).

EDIT: Calling jl_gc_enable(0) before the tests and jl_gc_enable(1) after them stops the segfault.

The truthful answer is, I don’t know.

For jl_get_function maybe it just gets a pointer to the existing function. HOWEVER if you were to call jl_eval_string() and redefine that function, then I think that old function could be GCed on you. Granted if you never redefine the function then I guess you are safe.

For jl_apply_array_type my guess is it creates an instance of a “Type” object or maybe a “DataType” object. However once you pass it to jl_ptr_to_array you can let it be collected unless you want to reuse it.

I use jl_get_function(jl_module_t* module, const char* name) and then jl_call, not eval, for functions. The only time I use jl_eval_string is to create a tuple for the Array dimensions (but the tuple is discarded after array creation).

That was my understanding as well. I don’t re-use apply_array_type but re-call it every time I create an array.

I just realized that immediately after calling jl_apply_array_type you call jl_ptr_to_array, so it probably doesn’t need to be “rooted” because you are passing it straight to the next jl_* function. If you did other jl_* things before passing it, then it probably would, but my guess is that immediately turning around and using it would not need the “rooting”.

After creating the Julia array from the C buffer, I perform a series of tests on it, calling jl_array_rank, jl_array_len and jl_array_dim, and only then do I pass it as a parameter to jl_call.

The logic was that if the array was invalid, I’d have caught it sooner?

Thanks for the help anyway.

It’s too bad I can’t debug this. I feel like Julia could have a lot of potential as a scripting language, but the C API is confusing.

Luckily you don’t have to. A much easier approach is to make use of @cfunction to create function pointers to specific methods of your Julia functions that you can call directly from C. That is, do as little as possible on the C side and let the Julia code do all the interfacing. You can find some examples of this in GunnarFarneback/DynamicallyLoadedEmbedding.jl on GitHub (“Embed Julia with dynamical loading of libjulia at runtime”), although there’s other stuff in there as well that may or may not be useful to you.

Thanks for the link, it looks very promising. If possible, I’d still like to understand what’s going on, if only out of curiosity, but going through @cfunction may just be the missing glue I needed.

I assume type mapping uses the same convention as ccall?

Just a question: I see you used jl_get_global(jl_main_module, jl_symbol(name)); to obtain the function pointer. Would that work with a function inside a module, or is “global” misleading me?

As far as I know GC protection is not only needed when calling functions that might allocate but anytime you call into the Julia runtime, so you can’t do a whole lot safely without it.

I assume type mapping uses the same convention as ccall?

Yes.

Just a question: I see you used jl_get_global(jl_main_module, jl_symbol(name)); to obtain the function pointer. Would that work with a function inside a module, or is “global” misleading me?

global refers to the top level of a module. If you haven’t made the @cfunction available in the Main module you need to look up another module first. It’s also possible to retrieve the function pointer from the return value of jl_eval_string, which may be more convenient.

I tried to use get_cfunction_pointer, but I get a segmentation fault when calling the function pointer.

I tried both methods, calling get_cfunction_pointer and using jl_eval_string(@cfunction, ...), but both result in segmentation faults (libjulia.so is linked through rpath).

test.jl

function addMeBaby(x::Int, y::Int)::Int
  return x+y
end
const julia_addMeBabyInt = @cfunction(addMeBaby, Cint, (Cint, Cint,))

test.c

void callAddMeBabyInt() {
  int (*addMe)(int, int);
  addMe = get_cfunction_pointer("julia_addMeBabyInt");
  int res = addMe(3, 4);
  printf("%i \n", res);
}
void main() {
  jl_init();
  jl_eval_string("include(\"test.jl\")");
  jl_eval_string("using .custom_module");
  callAddMeBabyInt();
  jl_atexit_hook(0);
}

With the message:

fatal: error thrown and no exception handler available.
MethodError(f=Main.addMeBaby, args=(3, 4), world=0x0000000000006cab)

Unless you’re running a 32 bit Julia, Int and Cint are probably different types. Try removing the type annotations on the function, or change them to Cint.

You can debug parts of this from within Julia:

julia> function addMeBaby(x::Int, y::Int)::Int
         return x+y
       end
addMeBaby (generic function with 1 method)

julia> const julia_addMeBabyInt = @cfunction(addMeBaby, Cint, (Cint, Cint,))
Ptr{Nothing} @0x00007f8f8018cf10

julia> ccall(julia_addMeBabyInt, Cint, (Cint, Cint), 3, 4)
ERROR: MethodError: no method matching addMeBaby(::Int32, ::Int32)
Stacktrace:
 [1] top-level scope at ./REPL[3]:1

julia> function addMeBaby(x, y)
         return x+y
       end
addMeBaby (generic function with 2 methods)

julia> ccall(julia_addMeBabyInt, Cint, (Cint, Cint), 3, 4)
7
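
The same mismatch can be pictured from the C side. The sketch below is self-contained plain C and does not call into Julia: add_cint and add_int64 are hypothetical stand-ins for the pointers that @cfunction would produce. It assumes that Cint corresponds to C int and that Int means Int64 on a 64-bit Julia build, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C-side pointer type matching @cfunction(f, Cint, (Cint, Cint)). */
typedef int (*add_cint_fn)(int, int);

/* C-side pointer type to use if the Julia signature were declared with
   Int64 arguments and return type instead. */
typedef int64_t (*add_int64_fn)(int64_t, int64_t);

/* Stand-ins for the Julia-generated functions, so the sketch runs alone. */
static int add_cint(int x, int y) { return x + y; }
static int64_t add_int64(int64_t x, int64_t y) { return x + y; }

/* Call through a pointer with the 32-bit signature. */
int call_cint(int x, int y)
{
    add_cint_fn f = add_cint;
    assert(f != NULL);
    return f(x, y);
}

/* Call through a pointer with the 64-bit signature. */
int64_t call_int64(int64_t x, int64_t y)
{
    add_int64_fn f = add_int64;
    assert(f != NULL);
    return f(x, y);
}
```

The point is only that the C prototype, the @cfunction signature and the Julia method have to agree on the integer width.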

Indeed, nice catch !

When working with arrays, I assume I have to either:

  • create a Julia array from C (so a jl_value_t*) and declare Any in the @cfunction call prototype, or
  • pass the pointer to the data and re-create the Array in Julia (but then I have to deal with row-major vs column-major?).

Same with generics: I assume I have to declare one for each variant?

You have to deal with row-major vs column-major either way, and it’s generally easier in the Julia code.
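
For reference, the index arithmetic behind that remark can be sketched in plain C. The helper names below are hypothetical. The sketch assumes a C array declared as double a[d0][d1][d2] (row-major, last index fastest) and a Julia Array with dimensions (d0, d1, d2) (column-major, first index fastest), using zero-based indices on both sides.

```c
#include <assert.h>
#include <stddef.h>

/* Flat offset of element (i, j, k) in column-major (Julia) layout. */
size_t col_major_index3(size_t i, size_t j, size_t k, size_t d0, size_t d1)
{
    return i + d0 * (j + d1 * k);
}

/* Flat offset of element (i, j, k) in row-major (C) layout. */
size_t row_major_index3(size_t i, size_t j, size_t k, size_t d1, size_t d2)
{
    return (i * d1 + j) * d2 + k;
}

/* Copy a row-major buffer into a separate buffer in column-major order,
   so that Julia's A[i+1, j+1, k+1] matches C's a[i][j][k]. */
void row_to_col_major3(const double *src, double *dst,
                       size_t d0, size_t d1, size_t d2)
{
    assert(src != dst); /* not an in-place transpose */
    for (size_t i = 0; i < d0; i++)
        for (size_t j = 0; j < d1; j++)
            for (size_t k = 0; k < d2; k++)
                dst[col_major_index3(i, j, k, d0, d1)] =
                    src[row_major_index3(i, j, k, d1, d2)];
}
```

In Julia the equivalent reordering is usually a permutedims call, which is one reason it tends to be easier on that side.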

Yes, the @cfunction pointers have to be created for each method you want to call. If you want to interface very generically, the C API may be a better (but not so easy) route.

So I made progress on the issue! While @GunnarFarneback’s solution is acceptable, not having access to generic functions was an issue for me.

It turns out that there was an earlier test where I was calling:

void wrapped_jl_gc_push(void * arg) {
  jl_gc_push1(arg);
}
void wrapped_jl_gc_pop() {
  jl_gc_pop();
}

void testTuple() {
  jl_value_t* ret = jl_eval_string("(1, 2, 3,)"); // Whatever you need in your tuple
  wrapped_jl_gc_push(&ret);
  // ... some stuff happens
  wrapped_jl_gc_pop();
}

After adding inline to those functions, I no longer observe GC corruption.
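
A possible explanation, as far as I can tell from julia.h: the JL_GC_PUSH* macros build their frame in a local array of the enclosing scope and link it into a per-task list, so a frame created inside a wrapper function stops being valid once the wrapper returns. The toy model below is plain C and only imitates that mechanism. toy_frame, TOY_GC_PUSH1 and the other toy_* names are made up for illustration and are not the real Julia definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a GC frame list; NOT the real julia.h layout. */
typedef struct toy_frame {
    struct toy_frame *prev; /* next-older frame */
    void **root;            /* address of the rooted variable */
} toy_frame;

static toy_frame *toy_stack = NULL; /* head of the frame list */

/* Macro form: the frame is a local of the calling scope, so it stays
   valid until that scope ends. */
#define TOY_GC_PUSH1(var) \
    toy_frame toy_frame_local = { toy_stack, (void **)(var) }; \
    toy_stack = &toy_frame_local

/* Popping only unlinks the head, so it can safely be a function. */
static void toy_gc_pop(void)
{
    assert(toy_stack != NULL);
    toy_stack = toy_stack->prev;
}

/* What a collector would do: walk the list and count non-NULL roots. */
static int toy_count_roots(void)
{
    int n = 0;
    for (toy_frame *f = toy_stack; f != NULL; f = f->prev)
        if (*f->root != NULL)
            n++;
    return n;
}

/* Number of roots visible while the frame is still in scope. */
int toy_rooted_while_live(void)
{
    int value = 42;
    void *ret = &value;
    TOY_GC_PUSH1(&ret);
    int seen = toy_count_roots();
    toy_gc_pop();
    return seen;
}

/* Number of roots visible after the frame has been popped. */
int toy_rooted_after_pop(void)
{
    int value = 42;
    void *ret = &value;
    TOY_GC_PUSH1(&ret);
    toy_gc_pop();
    return toy_count_roots();
}

/* By contrast, a wrapper such as
       void wrapped_push(void **var) { TOY_GC_PUSH1(var); }
   would link a frame that lives in wrapped_push's own stack frame.
   After it returns, toy_stack would point at dead stack memory. */
```

If that reading is right, inline only helps when the compiler actually inlines the wrapper, so using the macros directly in the function that owns the variable looks like the more robust option.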