Problems scaling jl_alloc_array_2d C API

I am seeing strange behavior with the jl_alloc_array_2d C API when the matrix is larger than 300x300 float64. Below 200x200 I have no trouble creating a matrix and passing it to a function. Above 200x200 the call hangs intermittently, and above 300x300 it always hangs on my laptop. There is no segfault; I have to kill the process to recover.

I am curious to know why this could be happening and how to go about debugging it. My ultimate goal is to build a Go interface to Julia with basic functionality to pass scalars and matrices to Julia functions and retrieve the data back. I want this to be able to handle much larger matrix sizes such as 10kx10k. I am using Go’s cgo to call Julia’s C API and things are working great except for the issue I mentioned.

Any pointers on how to debug this? As a side note, I am unable to use GC push/pop, since these seem to be macros and are currently not accessible through Go’s cgo interface. I also tried the jl_ptr_to_array API, but I haven’t managed to pass the dimensions pointer correctly yet.

The plain C code below worked fine, so I’m not sure why cgo hangs with large array sizes:

#include <julia.h>
#include <stdio.h>

// compile with:
// gcc -fPIC -DJULIA_INIT_DIR="/usr/local/julia/lib" -I/usr/local/julia/include/julia -I. -L/usr/local/julia/lib/julia  -L/usr/local/julia/lib -Wl,-rpath,/usr/local/julia/lib -ljulia test.c

int main(int argc, char *argv[])
{
    /* required: setup the Julia context */
    jl_init();

    /* create a 2D array */
    jl_value_t* array_type = jl_apply_array_type((jl_value_t*)jl_float64_type, 2);

    // size and allocate
    size_t n = 50000;
    jl_array_t *x = jl_alloc_array_2d(array_type, n, n);

    double *xData = (double*)jl_array_data(x);

    for (size_t i = 0; i < n*n; i++)
        xData[i] = i;

    jl_atexit_hook(0);

    return 0;
}
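
For a sense of scale, the C program above asks for n*n = 2.5 billion doubles when n = 50000. A minimal Go sketch of that arithmetic follows; the helper name `matrixBytes` is hypothetical, and the figure covers only the element data, not any array header or allocator overhead.

```go
package main

import "fmt"

// matrixBytes is a hypothetical helper: the number of bytes needed to store
// the elements of a rows x cols float64 matrix (8 bytes per element).
func matrixBytes(rows, cols uint64) uint64 {
	const float64Size = 8
	return rows * cols * float64Size
}

func main() {
	// Sizes mentioned in this thread: the failing range, the stated goal,
	// and the value of n used in the C test program.
	for _, n := range []uint64{300, 10000, 50000} {
		b := matrixBytes(n, n)
		fmt.Printf("%d x %d float64: %d bytes (%.2f GiB)\n", n, n, b, float64(b)/(1<<30))
	}
}
```

So a 300x300 matrix is only 720,000 bytes, a 10k x 10k matrix is 800,000,000 bytes, and the 50k x 50k case is 20,000,000,000 bytes. The hangs reported here start at sizes far smaller than anything likely to exhaust memory.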

Did you try running your go program under a debugger like gdb and then interrupting it with Ctrl-C? Usually that will give you a stack trace of where the program is spending its time and might give some indication what is going on.

Hi @paulmelis, below is the stack trace from the dlv debugger. I pressed Ctrl-C when execution got stuck. Thanks for the suggestion; I would love to hear your thoughts on what might be happening here.

$ dlv debug main.go
Type 'help' for list of commands.
(dlv) c
received SIGINT, stopping process (will not forward signal)
> runtime.futex() /usr/local/go/src/runtime/sys_linux_amd64.s:580 (PC: 0x474c43)
Warning: debugging optimized function
   575:		MOVQ	ts+16(FP), R10
   576:		MOVQ	addr2+24(FP), R8
   577:		MOVL	val3+32(FP), R9
   578:		MOVL	$SYS_futex, AX
   579:		SYSCALL
=> 580:		MOVL	AX, ret+40(FP)
   581:		RET
   582:	
   583:	// int32 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
   584:	TEXT runtime·clone(SB),NOSPLIT,$0
   585:		MOVL	flags+0(FP), DI
(dlv) 

There’s very little context here without the stack trace being shown. Do you have that as well?

Yup, here is the stack trace.

(dlv) bt
0  0x0000000000474c43 in runtime.futex
   at /usr/local/go/src/runtime/sys_linux_amd64.s:580
1  0x00000000004395c6 in runtime.futexsleep
   at /usr/local/go/src/runtime/os_linux.go:44
2  0x000000000040f807 in runtime.notesleep
   at /usr/local/go/src/runtime/lock_futex.go:159
3  0x0000000000442959 in runtime.mPark
   at /usr/local/go/src/runtime/proc.go:1340
4  0x0000000000444072 in runtime.stopm
   at /usr/local/go/src/runtime/proc.go:2301
5  0x0000000000445655 in runtime.findrunnable
   at /usr/local/go/src/runtime/proc.go:2960
6  0x00000000004466f4 in runtime.schedule
   at /usr/local/go/src/runtime/proc.go:3169
7  0x0000000000446d58 in runtime.park_m
   at /usr/local/go/src/runtime/proc.go:3318
8  0x0000000000470fdb in runtime.mcall
   at /usr/local/go/src/runtime/asm_amd64.s:327

I guess showing the go side of what you’re doing is needed as well. Do you have a small self-contained example that shows the problem?

Appreciate your help @paulmelis, thanks!

Here is a minimal example that highlights the issue. All it does is create a random 2D matrix by allocating memory in the Julia environment and populating it from Go. As shown, the code works fine on my system up to a matrix dimension of 50k x 50k. Interestingly, if you uncomment the slice allocation in Go, it either hangs or crashes with a segfault. It’s strange behavior, and I’m not sure whether I am doing something wrong with unsafe pointers…


package main

import (
	"embedding/julia"
)

func main() {
	// initialize Julia env and defer cleanup
	julia.Init()
	defer julia.Exit()

	// matrix size
	n := 50000

	// uncomment to see code hang or lead to
	// a seg fault (reduce matrix size obviously when doing this)
	// _ = make([]float64, n*n)

	// allocate mem in Julia and populate with random
	// values in Go
	_ = julia.RandMat(n, n)
}

package julia

/*
// Start with the basic example from https://docs.julialang.org/en/release-0.6/manual/embedding/
//
// Obviously the paths below may need to be modified to match your julia install location and version number.
//
#cgo CFLAGS: -fPIC -DJULIA_INIT_DIR="/usr/local/julia/lib" -I/usr/local/julia/include/julia -I.
#cgo LDFLAGS: -L/usr/local/julia/lib/julia  -L/usr/local/julia/lib -Wl,-rpath,/usr/local/julia/lib -ljulia
#include <julia.h>
*/
import "C"

import (
	"math/rand"
	"unsafe"
)

func Init() {
	C.jl_init()
}

func Exit() {
	C.jl_atexit_hook(0)
}

// RandMat instantiates a float64 matrix in Julia env. and populates
// it in Go using random values
func RandMat(row, col int) *C.jl_value_t {
	arrayType := C.jl_apply_array_type(
		(*(C.jl_value_t))(
			unsafe.Pointer(
				C.jl_float64_type,
			),
		),
		C.ulong(uint64(2)),
	)

	array := C.jl_alloc_array_2d(
		arrayType,
		C.ulong(uint64(row)),
		C.ulong(uint64(col)),
	)

	data := array.data
	ptr := unsafe.Pointer(data)

	el := float64(0)
	for i := 0; i < row*col; i++ {
		p := (*float64)(unsafe.Pointer(uintptr(ptr) + uintptr(i)*unsafe.Sizeof(el)))
		*p = rand.Float64()
	}

	return (*(C.jl_value_t))(unsafe.Pointer(array))
}
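
One detail worth keeping in mind about the pointer arithmetic in RandMat: Julia stores arrays in column-major order, so filling the buffer linearly is fine for random data, but addressing a specific element (i, j) needs the offset i + j*rows. A small sketch of that index calculation, using a plain Go slice as a stand-in for the Julia-owned buffer (the helper name `colMajorIndex` is hypothetical):

```go
package main

import "fmt"

// colMajorIndex returns the linear offset of the zero-based element (i, j)
// in a column-major matrix with the given number of rows.
func colMajorIndex(i, j, rows int) int {
	return i + j*rows
}

func main() {
	rows, cols := 2, 3
	buf := make([]float64, rows*cols) // stand-in for the Julia-owned buffer
	for j := 0; j < cols; j++ {
		for i := 0; i < rows; i++ {
			// Encode the position in the value so the layout is visible.
			buf[colMajorIndex(i, j, rows)] = float64(10*i + j)
		}
	}
	fmt.Println(buf) // prints [0 10 1 11 2 12]
}
```

Elements of the same column end up adjacent in memory, which is the opposite of the row-major layout that nested Go or C arrays use.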

Hmm, is there a build script needed as well to get this to work? I’m hardly familiar with go.

Assuming you have a recent version of Go installed (v1.16.5) and Julia installed at /usr/local/julia (v1.6.1), please create the following folder structure, say in your home folder:

$ tree julia
julia
├── cmd
│   └── main.go
├── go.mod
└── julia.go

main.go contains the code from the first section in my previous comment, the one starting with package main

julia.go contains the code from the second section in my previous comment, the one starting with package julia

and go.mod is below:

$ cat go.mod 
module embedding/julia

go 1.16

At this point you can go to cmd folder and run:

$ go build main.go
$ ./main

Hope that helps. I think you will also need gcc build tools since cgo is involved.

Well, I got it to compile and run, but I’m not seeing the hangs or segfaults you’re seeing (this is with n=10000 and both _ = ... lines in main.go uncommented). Anything specific I should try?

That’s great to hear. Could you please try the following? It wraps memory allocated in Go instead of having Julia allocate it:

Add this to julia.go

func WrapMat(x []float64, row, col int) *C.jl_value_t {
	arrayType := C.jl_apply_array_type(
		(*(C.jl_value_t))(
			unsafe.Pointer(
				C.jl_float64_type,
			),
		),
		C.ulong(uint64(2)),
	)

	data := make([]C.double, row*col)
	for i := 0; i < row*col; i++ {
		data[i] = C.double(x[i])
	}
	dataPtr := unsafe.Pointer(&data[0])

	dims := make([]C.ulong, 2)
	dims[0] = C.ulong(uint64(row))
	dims[1] = C.ulong(uint64(col))
	dimPtr := (*(C.jl_value_t))(unsafe.Pointer(&dims[0]))

	array := C.jl_ptr_to_array(arrayType, dataPtr, dimPtr, C.int(0))

	return (*(C.jl_value_t))(unsafe.Pointer(array))
}

And call it in main.go

n := 500
x := make([]float64, n*n)
_ = julia.WrapMat(x, n, n)

This consistently fails for me when n > 300, with the same symptoms of either hangs or segfaults. For smaller matrices it works fine.

BTW, I am on Fedora Linux 34. Which OS did you try this on?

$ uname -r
5.12.10-300.fc34.x86_64

$ gcc --version
gcc (GCC) 11.1.1 20210531 (Red Hat 11.1.1-3)

Okay, I see the hang you had as well. Looking at the gdb stacktrace there’s a non-main thread that runs the Julia GC as a result of the jl_ptr_to_array call from Go:

(gdb) bt
#0  runtime.futex () at /usr/lib/go/src/runtime/sys_linux_amd64.s:580
#1  0x000000000042c897 in runtime.futexsleep (addr=0x4dfa38 <runtime.sched+280>, val=0, ns=60000000000) at /usr/lib/go/src/runtime/os_linux.go:50
#2  0x000000000040c419 in runtime.notetsleep_internal (n=0x4dfa38 <runtime.sched+280>, ns=60000000000, ~r2=<optimized out>) at /usr/lib/go/src/runtime/lock_futex.go:201
#3  0x000000000040c4f1 in runtime.notetsleep (n=0x4dfa38 <runtime.sched+280>, ns=60000000000, ~r2=<optimized out>) at /usr/lib/go/src/runtime/lock_futex.go:224
#4  0x000000000043e465 in runtime.sysmon () at /usr/lib/go/src/runtime/proc.go:5203
#5  0x0000000000435608 in runtime.mstart1 () at /usr/lib/go/src/runtime/proc.go:1306
#6  0x000000000043550e in runtime.mstart () at /usr/lib/go/src/runtime/proc.go:1272
#7  0x0000000000466851 in crosscall_amd64 () at gcc_amd64.S:35
#8  0x00007fffc8d49640 in ?? ()
#9  0x0000000000000000 in ?? ()

(gdb) info threads
  Id   Target Id                                Frame 
  1    Thread 0x7ffff7d56b80 (LWP 25159) "main" runtime.futex () at /usr/lib/go/src/runtime/sys_linux_amd64.s:580
* 2    Thread 0x7fffc8d49640 (LWP 25163) "main" runtime.futex () at /usr/lib/go/src/runtime/sys_linux_amd64.s:580
  3    Thread 0x7fffc8548640 (LWP 25164) "main" 0x00007ffff78d4d73 in jl_gc_collect () from /usr/lib/julia/libjulia-internal.so.1
  4    Thread 0x7fffc7d47640 (LWP 25165) "main" runtime.futex () at /usr/lib/go/src/runtime/sys_linux_amd64.s:580
  5    Thread 0x7fffc7506640 (LWP 25166) "main" runtime.futex () at /usr/lib/go/src/runtime/sys_linux_amd64.s:580
  6    Thread 0x7fffc6d05640 (LWP 25167) "main" runtime.futex () at /usr/lib/go/src/runtime/sys_linux_amd64.s:580
  7    Thread 0x7fffc621c640 (LWP 25168) "main" 0x00007ffff7d9cae2 in sigtimedwait () from /usr/lib/libc.so.6

(gdb) thread 3
[Switching to thread 3 (Thread 0x7fffc8548640 (LWP 25164))]
#0  0x00007ffff78d4d73 in jl_gc_collect () from /usr/lib/julia/libjulia-internal.so.1
(gdb) bt
#0  0x00007ffff78d4d73 in jl_gc_collect () from /usr/lib/julia/libjulia-internal.so.1
#1  0x00007ffff78d508c in jl_gc_pool_alloc () from /usr/lib/julia/libjulia-internal.so.1
#2  0x00007ffff78d5725 in jl_gc_alloc () from /usr/lib/julia/libjulia-internal.so.1
#3  0x00007ffff78a3d6a in jl_ptr_to_array () from /usr/lib/julia/libjulia-internal.so.1
#4  0x0000000000465e22 in _cgo_169e35b66d06_Cfunc_jl_ptr_to_array (v=0xc000042eb0) at /tmp/go-build/cgo-gcc-prolog:120
#5  0x000000000045e570 in runtime.asmcgocall () at /usr/lib/go/src/runtime/asm_amd64.s:667
#6  0x000000c000000180 in ?? ()
#7  0x000000c0004d6000 in ?? ()
#8  0x000000c0000200e0 in ?? ()
#9  0x00000000004dd801 in runtime.class_to_divmagic ()
#10 0x000000c000000d80 in ?? ()
#11 0x00000000000001c0 in ?? ()
#12 0x000000c000000180 in ?? ()
#13 0x000000c000000180 in ?? ()
#14 0x0000000000000000 in ?? ()

Although the Julia embedding docs don’t seem to mention it explicitly, there is a post here on the forum about thread safety when embedding. Judging by "Embedding Julia into multithreading apps" (post #10, by yuyichao), it seems most API methods are thread-safe, but not when they are called from threads that were not started by Julia. In your case Go will have started the various threads, including the one that calls the Julia API, so I suspect that is what is going on here. But perhaps @yuyichao has some more insights.

Wow, that’s a pretty cool finding, and thanks for the link to the great discussion on thread safety. My use case does not require spawning multiple threads, so I’m looking forward to any workarounds, if possible.

I ran with GOMAXPROCS=1, which should limit execution of the Go binary to 1 thread, but the code still hung with large arrays. Any thoughts on how to verify whether this is a thread-safety issue or something else?

Hi @yuyichao, do you have any recommendations on the best way to consume the C API? Thanks.

When you run it under GDB, it should by default tell you when any threads are started. Also, if you wrote all the calls to Julia functions yourself, you can simply print the thread ID on every call.
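
A minimal sketch of that thread ID logging on the Go side, assuming Linux (syscall.Gettid is not available elsewhere); `logThread` is a hypothetical helper and the labels stand in for the real Julia calls. It is relevant to the GOMAXPROCS=1 experiment too: GOMAXPROCS limits how many OS threads execute Go code at the same time, not how many threads the runtime creates, so a goroutine can still move between threads unless it is locked to one.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// logThread reports which OS thread is executing the labeled call and
// returns the thread ID. On Linux, a TID equal to the PID identifies the
// main thread of the process.
func logThread(label string) int {
	tid := syscall.Gettid()
	fmt.Fprintf(os.Stderr, "[pid %d tid %d] %s\n", os.Getpid(), tid, label)
	return tid
}

func main() {
	// Call the helper immediately before each Julia API call.
	logThread("before jl_init")
	logThread("before jl_alloc_array_2d")
}
```

If the logged thread IDs differ between the initialization call and a later allocation call, the calls are not all coming from one thread.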

If you are unable to use GC push/pop, crashes should be expected. I don’t know why it would hang, although if memory used for locks gets overwritten that could certainly happen. If you can’t use the C macros, you’ll need to write the equivalent manually.

@yuyichao thanks for your reply. I was hoping to manage memory externally and skip the GC; however, that approach also hangs or segfaults with large arrays. Let me look into it more with your suggestions and see if these issues go away.

jl_alloc_array_2d and the rest of the runtime all use the GC, so you are by no means skipping it.

I see, OK, thanks for the clarification.