LLVM crash when running Flux and CuArray examples in julia 0.7

I’m running on Ubuntu 16.04 with a Quadro K1200. I’m able to build and run all of the NVIDIA CUDA examples, but I can’t seem to get Flux/CuArrays working. I’m using julia 0.7 which is not well supported by Flux and CuArrays yet, but it seems to be the only way to get an appropriate LLVM without having to build julia from source.

I’m attempting to follow the example code in the Flux documentation by creating a fully-connected neural layer on the GPU and propagating some input through it. Since I have to download the development version of Flux and CuArrays, I have to do something slightly evil in order to get the right CuArrays into Flux.

julia> using Flux, CuArrays
julia> Core.eval(Flux, :(using CuArrays; global gpu_adaptor = CuArrays.cu)) # evil
cu (generic function with 1 method)
julia> x = cu(rand(10))
10-element CuArray{Float32,1}:
 0.36436874
 0.38448074
 0.5700786
 0.7798407
 0.5332852
 0.25739464
 0.80841315
 0.5425161
 0.7336245
 0.9889414
julia> d = Dense(10, 10)
Dense(10, 10)
julia> g = gpu(d)
Dense(10, 10)

So that all works fine up until I get to here:

julia> g(x)
Illegal inttoptr
      %ptrint = ptrtoint %jl_value_t addrspace(10)* %1 to i64

signal (6): Aborted
in expression starting at no file:0
raise at /build/glibc-Cl5G7W/glibc-2.23/signal/../sysdeps/unix/sysv/linux/raise.c:54
abort at /build/glibc-Cl5G7W/glibc-2.23/stdlib/abort.c:89
runOnFunction at /buildworker/worker/package_linux64/build/src/llvm-gc-invariant-verifier.cpp:178
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/rkat/julia-0.7.0-rc2/bin/../lib/julia/libLLVM-6.0.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/rkat/julia-0.7.0-rc2/bin/../lib/julia/libLLVM-6.0.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/rkat/julia-0.7.0-rc2/bin/../lib/julia/libLLVM-6.0.so (unknown line)
LLVMRunPassManager at /home/rkat/julia-0.7.0-rc2/bin/../lib/julia/libLLVM-6.0.so (unknown line)
macro expansion at /home/rkat/.julia/dev/LLVM/src/base.jl:18 [inlined]
LLVMRunPassManager at /home/rkat/.julia/dev/LLVM/lib/6.0/libLLVM_h.jl:2689 [inlined]
run! at /home/rkat/.julia/dev/LLVM/src/passmanager.jl:34 [inlined]
optimize! at /home/rkat/.julia/packages/CUDAnative/TUT1/src/compiler.jl:606
#compile_function#66 at ./logging.jl:320
unknown function (ip: 0x7ff57c15a1ae)
compile_function at /home/rkat/.julia/packages/CUDAnative/TUT1/src/compiler.jl:625 [inlined]
#cufunction#67 at /home/rkat/.julia/packages/CUDAnative/TUT1/src/compiler.jl:702
unknown function (ip: 0x7ff57c152b96)
#cufunction at ./none:0
unknown function (ip: 0x7ff57c151a9d)
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
macro expansion at /home/rkat/.julia/packages/CUDAnative/TUT1/src/execution.jl:222 [inlined]
_cuda at /home/rkat/.julia/packages/CUDAnative/TUT1/src/execution.jl:180
unknown function (ip: 0x7ff57c197774)
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
macro expansion at ./gcutils.jl:87 [inlined]
_gpu_call at /home/rkat/.julia/dev/CuArrays/src/gpuarray_interface.jl:64
gpu_call at /home/rkat/.julia/dev/GPUArrays/src/abstract_gpu_interface.jl:151
unknown function (ip: 0x7ff57c196a23)
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
gpu_call at /home/rkat/.julia/dev/GPUArrays/src/abstract_gpu_interface.jl:128 [inlined]
copyto! at /home/rkat/.julia/dev/GPUArrays/src/broadcast.jl:13 [inlined]
copyto! at ./broadcast.jl:768
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
copy at ./broadcast.jl:744 [inlined]
materialize at ./broadcast.jl:724 [inlined]
broadcast at ./broadcast.jl:702
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
∇broadcast at /home/rkat/.julia/dev/Flux/src/tracker/array.jl:364
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
copy at /home/rkat/.julia/dev/Flux/src/tracker/array.jl:386 [inlined]
materialize at ./broadcast.jl:724 [inlined]
macro expansion at /home/rkat/.julia/packages/NNlib/7C9r/src/cubroadcast.jl:36 [inlined]
Dense at /home/rkat/.julia/dev/Flux/src/layers/basic.jl:80
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:324
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:428
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:363 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:686
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:799
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7ff5420a0e6f)
unknown function (ip: 0xffffffffffffffff)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:808
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:831
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/builtins.c:633
eval at ./boot.jl:319
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:85
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:117 [inlined]
#28 at ./task.jl:262
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1538 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:268
unknown function (ip: 0xffffffffffffffff)
Allocations: 52478043 (Pool: 52470170; Big: 7873); GC: 147
Aborted (core dumped)

I am able to load this core file up in lldb, but it doesn’t do me much good:

$ lldb-6.0 /usr/local/bin/julia-0.7 -c core_rkat_21902_1000_6_julia-0.7
(lldb) target create "/usr/local/bin/julia-0.7" --core "core_rkat_21902_1000_6_julia-0.7"
Core file '/home/rkat/core_rkat_21902_1000_6_julia-0.7' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'julia-0.7', stop reason = signal SIGABRT
  * frame #0: 0x00007feb7f337269 libpthread.so.0`raise + 41
    frame #1: 0x00007feb7f9f21a7 libjulia.so.0.7`sigdie_handler at signals-unix.c:209
    frame #2: 0x00007feb7f9f21a0 libjulia.so.0.7`sigdie_handler(sig=6, info=<unavailable>, context=0x00007ffc2c93b2c0)
    frame #3: 0x00007feb7f337390 libpthread.so.0`___lldb_unnamed_symbol1$$libpthread.so.0 + 1
    frame #4: 0x00007feb7ef91428 libc.so.6`__GI_raise(sig=6) at raise.c:54
    frame #5: 0x00007feb7ef9302a libc.so.6`__GI_abort at abort.c:89
    frame #6: 0x00007feb7fa911fd libjulia.so.0.7`GCInvariantVerifier::runOnFunction(this=<unavailable>, F=<unavailable>) at llvm-gc-invariant-verifier.cpp:178
    frame #7: 0x00007feb7c9b13e1 libLLVM-6.0.so`llvm::FPPassManager::runOnFunction(llvm::Function&) + 737
    frame #8: 0x00007feb7c9b1421 libLLVM-6.0.so`llvm::FPPassManager::runOnModule(llvm::Module&) + 49
    frame #9: 0x00007feb7c9b0c34 libLLVM-6.0.so`llvm::legacy::PassManagerImpl::run(llvm::Module&) + 756
    frame #10: 0x00007feb7c932309 libLLVM-6.0.so`LLVMRunPassManager + 9

I don’t have any debug symbols. I’m also able to do

export JULIA_DEBUG=CUDAnative

before I run julia, in which case I get some output from the CUDA compiler:

julia> x = cu(rand(10))
┌ Debug: Initializing CUDA after call to cuMemAlloc
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/init.jl:31
10-element CuArray{Float32,1}:
 0.8173896
 0.011721498
 0.9850332
 0.08780074
 0.8223111
 0.40026447
 0.1748744
 0.90779287
 0.023328746
 0.16393353
julia> g = gpu(d)
Dense(10, 10)

julia> g(x)
┌ Debug: (Re)compiling function
│   ctx = CUDAnative.CompilerContext(CUDAnative.KernelWrapper{getfield(GPUArrays, Symbol("##17#18"))}(getfield(GPUArrays, Symbol("##17#18"))()), Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},typeof(float),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global},Tuple{Bool,Bool},Tuple{Int64,Int64}}}}}, v"5.0.0", true, nothing, nothing, nothing, nothing, nothing, getfield(GPUArrays, Symbol("##17#18"))())
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/compiler.jl:625
┌ Debug: Module entry point:
│   LLVM.name(entry) = "ptxcall_anonymous_1"
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/utils.jl:7
┌ Debug: Compiled CUDAnative.KernelWrapper{getfield(GPUArrays, Symbol("##17#18"))}(getfield(GPUArrays, Symbol("##17#18"))()) to PTX 5.0.0 for SM 5.0.0 using 25 registers.
│ Memory usage: 0 B local, 0 B shared, 0 B constant
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/compiler.jl:714
┌ Warning: `zeros(a::AbstractArray)` is deprecated, consider `zero(a)`, `fill(0, size(a))`, `fill!(copy(a), 0)`, or `fill!(similar(a), 0)`. Where necessary, use `fill!(similar(a), zero(eltype(a)))`.
│   caller = Type at array.jl:29 [inlined]
└ @ Core ~/.julia/dev/Flux/src/tracker/array.jl:29
┌ Debug: (Re)compiling function
│   ctx = CUDAnative.CompilerContext(CUDAnative.KernelWrapper{getfield(GPUArrays, Symbol("##17#18"))}(getfield(GPUArrays, Symbol("##17#18"))()), Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(float),Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}}, v"5.0.0", true, nothing, nothing, nothing, nothing, nothing, getfield(GPUArrays, Symbol("##17#18"))())
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/compiler.jl:625
┌ Debug: Module entry point:
│   LLVM.name(entry) = "ptxcall_anonymous_2"
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/utils.jl:7
┌ Debug: Compiled CUDAnative.KernelWrapper{getfield(GPUArrays, Symbol("##17#18"))}(getfield(GPUArrays, Symbol("##17#18"))()) to PTX 5.0.0 for SM 5.0.0 using 11 registers.
│ Memory usage: 0 B local, 0 B shared, 0 B constant
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/compiler.jl:714
┌ Warning: `zeros(a::AbstractArray)` is deprecated, consider `zero(a)`, `fill(0, size(a))`, `fill!(copy(a), 0)`, or `fill!(similar(a), 0)`. Where necessary, use `fill!(similar(a), zero(eltype(a)))`.
│   caller = Type at array.jl:29 [inlined]
└ @ Core ~/.julia/dev/Flux/src/tracker/array.jl:29
Dense(10, 10)
┌ Debug: (Re)compiling function
│   ctx = CUDAnative.CompilerContext(CUDAnative.KernelWrapper{getfield(GPUArrays, Symbol("##17#18"))}(getfield(GPUArrays, Symbol("##17#18"))()), Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},getfield(Flux.Tracker, Symbol("##339#340")){Tuple{Bool,Bool}},Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}}, v"5.0.0", true, nothing, nothing, nothing, nothing, nothing, getfield(GPUArrays, Symbol("##17#18"))())
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/compiler.jl:625
┌ Debug: Module entry point:
│   LLVM.name(entry) = "ptxcall_anonymous_3"
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/utils.jl:7
┌ Debug: Compiled CUDAnative.KernelWrapper{getfield(GPUArrays, Symbol("##17#18"))}(getfield(GPUArrays, Symbol("##17#18"))()) to PTX 5.0.0 for SM 5.0.0 using 12 registers.
│ Memory usage: 0 B local, 0 B shared, 0 B constant
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/compiler.jl:714
┌ Debug: (Re)compiling function
│   ctx = CUDAnative.CompilerContext(CUDAnative.KernelWrapper{getfield(GPUArrays, Symbol("##17#18"))}(getfield(GPUArrays, Symbol("##17#18"))()), Tuple{CuArrays.CuKernelState,CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},getfield(Base.Broadcast, Symbol("##1#2")){Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}}}}}},getfield(Base.Broadcast, Symbol("##7#8")){Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}}}},getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}},Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}}, v"5.0.0", true, nothing, nothing, nothing, nothing, nothing, getfield(GPUArrays, Symbol("##17#18"))())
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/compiler.jl:625
┌ Debug: Module entry point:
│   LLVM.name(entry) = "ptxcall_anonymous_4"
└ @ CUDAnative ~/.julia/packages/CUDAnative/TUT1/src/utils.jl:7
Illegal inttoptr
      %ptrint = ptrtoint %jl_value_t addrspace(10)* %1 to i64

signal (6): Aborted
in expression starting at no file:0
raise at /build/glibc-Cl5G7W/glibc-2.23/signal/../sysdeps/unix/sysv/linux/raise.c:54
abort at /build/glibc-Cl5G7W/glibc-2.23/stdlib/abort.c:89
runOnFunction at /buildworker/worker/package_linux64/build/src/llvm-gc-invariant-verifier.cpp:178
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/rkat/julia-0.7.0-rc2/bin/../lib/julia/libLLVM-6.0.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/rkat/julia-0.7.0-rc2/bin/../lib/julia/libLLVM-6.0.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/rkat/julia-0.7.0-rc2/bin/../lib/julia/libLLVM-6.0.so (unknown line)
LLVMRunPassManager at /home/rkat/julia-0.7.0-rc2/bin/../lib/julia/libLLVM-6.0.so (unknown line)
macro expansion at /home/rkat/.julia/dev/LLVM/src/base.jl:18 [inlined]
LLVMRunPassManager at /home/rkat/.julia/dev/LLVM/lib/6.0/libLLVM_h.jl:2689 [inlined]
run! at /home/rkat/.julia/dev/LLVM/src/passmanager.jl:34 [inlined]
optimize! at /home/rkat/.julia/packages/CUDAnative/TUT1/src/compiler.jl:606
#compile_function#66 at ./logging.jl:320
unknown function (ip: 0x7f34fc1550de)
compile_function at /home/rkat/.julia/packages/CUDAnative/TUT1/src/compiler.jl:625 [inlined]
#cufunction#67 at /home/rkat/.julia/packages/CUDAnative/TUT1/src/compiler.jl:702
unknown function (ip: 0x7f34fc14dac6)
#cufunction at ./none:0
unknown function (ip: 0x7f34fc14c9cd)
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
macro expansion at /home/rkat/.julia/packages/CUDAnative/TUT1/src/execution.jl:222 [inlined]
_cuda at /home/rkat/.julia/packages/CUDAnative/TUT1/src/execution.jl:180
unknown function (ip: 0x7f34fc1a4864)
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
macro expansion at ./gcutils.jl:87 [inlined]
_gpu_call at /home/rkat/.julia/dev/CuArrays/src/gpuarray_interface.jl:64
gpu_call at /home/rkat/.julia/dev/GPUArrays/src/abstract_gpu_interface.jl:151
unknown function (ip: 0x7f34fc1a3b13)
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
gpu_call at /home/rkat/.julia/dev/GPUArrays/src/abstract_gpu_interface.jl:128 [inlined]
copyto! at /home/rkat/.julia/dev/GPUArrays/src/broadcast.jl:13 [inlined]
copyto! at ./broadcast.jl:768
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
copy at ./broadcast.jl:744 [inlined]
materialize at ./broadcast.jl:724 [inlined]
broadcast at ./broadcast.jl:702
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
∇broadcast at /home/rkat/.julia/dev/Flux/src/tracker/array.jl:364
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
copy at /home/rkat/.julia/dev/Flux/src/tracker/array.jl:386 [inlined]
materialize at ./broadcast.jl:724 [inlined]
macro expansion at /home/rkat/.julia/packages/NNlib/7C9r/src/cubroadcast.jl:36 [inlined]
Dense at /home/rkat/.julia/dev/Flux/src/layers/basic.jl:80
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1812
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:324
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:428
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:363 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:686
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:799
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7f34c417869f)
unknown function (ip: 0xffffffffffffffff)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:808
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:831
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/builtins.c:633
eval at ./boot.jl:319
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:85
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:117 [inlined]
#28 at ./task.jl:262
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2165
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1538 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:268
unknown function (ip: 0xffffffffffffffff)
Allocations: 52329647 (Pool: 52321701; Big: 7946); GC: 127
Aborted (core dumped)

I’m a bit lost. Where should I start?

Please run make -C deps distclean-llvm and rebuild. Report here if it still happens after that.

I guess @rkat is using the official binaries?

“I’m using julia 0.7 which is not well supported by Flux and CuArrays yet, but it seems to be the only way to get an appropriate LLVM without having to build julia from source.”

Ah, I scanned for “build from source” but misinterpreted that sentence. @rkat, could you try the latest nightly? It contains some additional LLVM patches that fix issues like this. I’m not entirely sure the nightlies work with CUDAnative right now, due to the removal of deprecations, but I’ll fix that ASAP.

I’m not sure I understand. I didn’t build julia from source. I’m using julia 0.7 because it doesn’t require me to do that.

As @simon suspected; could you test rc3 then as I mentioned? It’s live as of now.

sure, I’ll try now. apologies for the latency, my timezone is UTC+10.

the nightly release linked from the julia website is 1.0, which breaks everything that was previously deprecated. is there a way to get the 0.7 rc3 binaries? I found this Travis CI build:

but there doesn’t seem to be a way to get the artefacts…

it’s really not urgent, maybe I should wait until the dust settles on the 0.7 release, which I’m assuming will be soon?

It’s here now: Julia v0.7.0-rc3 is here

I can still reproduce this on RC3.

~$ julia-0.7  
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-rc3.0 (2018-08-06 23:10 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> using CuArrays, Flux

julia> x = cu(rand(10)); d = Dense(10, 10)
┌ Warning: `a::AbstractArray - b::Number` is deprecated, use `a .- b` instead.
│   caller = glorot_uniform(::Int64, ::Vararg{Int64,N} where N) at utils.jl:4
└ @ Flux ~/.julia/dev/Flux/src/utils.jl:4
┌ Warning: `zeros(a::AbstractArray)` is deprecated, consider `zero(a)`, `fill(0, size(a))`, `fill!(copy(a), 0)`, or `fill!(similar(a), 0)`. Where necessary, use `fill!(similar(a), zero(eltype(a)))`.
│   caller = Type at array.jl:29 [inlined]
└ @ Core ~/.julia/dev/Flux/src/tracker/array.jl:29
┌ Warning: `zeros(a::AbstractArray)` is deprecated, consider `zero(a)`, `fill(0, size(a))`, `fill!(copy(a), 0)`, or `fill!(similar(a), 0)`. Where necessary, use `fill!(similar(a), zero(eltype(a)))`.
│   caller = Type at array.jl:29 [inlined]
└ @ Core ~/.julia/dev/Flux/src/tracker/array.jl:29
Dense(10, 10)

julia> Core.eval(Flux, :(using CuArrays; global gpu_adaptor = CuArrays.cu))
cu (generic function with 1 method)
julia> g = gpu(d)
┌ Warning: `zeros(a::AbstractArray)` is deprecated, consider `zero(a)`, `fill(0, size(a))`, `fill!(copy(a), 0)`, or `fill!(similar(a), 0)`. Where necessary, use `fill!(similar(a), zero(eltype(a)))`.
│   caller = Type at array.jl:29 [inlined]
└ @ Core ~/.julia/dev/Flux/src/tracker/array.jl:29
┌ Warning: `zeros(a::AbstractArray)` is deprecated, consider `zero(a)`, `fill(0, size(a))`, `fill!(copy(a), 0)`, or `fill!(similar(a), 0)`. Where necessary, use `fill!(similar(a), zero(eltype(a)))`.
│   caller = Type at array.jl:29 [inlined]
└ @ Core ~/.julia/dev/Flux/src/tracker/array.jl:29
Dense(10, 10)
julia> g(x)
... my remote shell doesn’t have enough history to show me all of this
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1829
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
∇broadcast at /home/rkat/.julia/dev/Flux/src/tracker/array.jl:364
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1829
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
copy at /home/rkat/.julia/dev/Flux/src/tracker/array.jl:386 [inlined]
materialize at ./broadcast.jl:724 [inlined]
macro expansion at /home/rkat/.julia/packages/NNlib/7C9r/src/cubroadcast.jl:36 [inlined]
Dense at /home/rkat/.julia/dev/Flux/src/layers/basic.jl:80
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1829
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:324
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:428 
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:363 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:686
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:799 
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7ff48621d72f)
unknown function (ip: 0xffffffffffffffff)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:808
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:831
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/builtins.c:633
eval at ./boot.jl:319
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:85
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:117 [inlined]
#28 at ./task.jl:262
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2182
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1538 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:268
unknown function (ip: 0xffffffffffffffff)
Allocations: 52958782 (Pool: 52950386; Big: 8396); GC: 131
Aborted (core dumped)

any general debugging tips for what to do when encountering this kind of issue? should I just bite the bullet and build julia from source myself?

also, thanks for helping :)

OK, I’ll have a closer look. Assuming this is the same illegal inttoptr assertion as before, @Keno are there still outstanding issues like this you know about?

Reported upstream: https://github.com/JuliaGPU/CUDAnative.jl/issues/223 and https://github.com/JuliaLang/julia/issues/28645

Having looked into this due to a related issue, it happens when we end up passing a tracked array pointer into a CUDA kernel, where some of our transformations break a GC invariant. At the root, though, this is caused by passing an invalid type to the GPU kernel, as can be seen when removing the abort() in the GC invariant verifier pass:

ERROR: GPU compilation failed, try inspecting generated code with any of the @device_code_... macros
CompilerError: could not compile #19(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},getfield(Base.Broadcast, Symbol("##1#2")){Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}}}}}},getfield(Base.Broadcast, Symbol("##7#8")){Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}}}},getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}},Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}); passing and using non-bitstype argument
- argument_type =Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},getfield(Base.Broadcast, Symbol("##1#2")){Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}}}}}},getfield(Base.Broadcast, Symbol("##7#8")){Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}}}},getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}},Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}
- argument = 4

Indeed, one of the arguments is a TrackedArray{..., CuArray}, which isn’t a bits type. cc @MikeInnes
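
For anyone hitting the same "passing and using non-bitstype argument" error, here is a rough CPU-only sketch of how one might check whether a would-be kernel argument is a bits type. The types DevicePtrLike and TrackedLike and the helper make_kernel_arg are hypothetical stand-ins made up for illustration; they are not the real CuDeviceArray or TrackedArray. The sketch relies only on Base's isbitstype and isbits, and assumes Julia 0.7 or later.

```julia
# Hypothetical stand-ins for illustration only; not the real Flux/CuArrays types.
struct DevicePtrLike        # plays the role of a device array: plain immutable data
    ptr::UInt64
    len::Int
end

struct TrackedLike          # plays the role of a tracked array wrapping heap-allocated arrays
    data::Vector{Float32}
    grad::Vector{Float32}
end

# A closure stores what it captures as a field, so the closure is a bits
# value only if everything it captures is one too.
make_kernel_arg(x) = i -> (x, i)

good = make_kernel_arg(DevicePtrLike(0x1000, 10))
bad  = make_kernel_arg(TrackedLike(zeros(Float32, 10), zeros(Float32, 10)))

println(isbitstype(DevicePtrLike))   # true
println(isbitstype(TrackedLike))     # false
println(isbits(good))                # true
println(isbits(bad))                 # false
```

The last case mirrors what the error above reports: the broadcast object handed to the kernel captures a tracked array, so the whole argument stops being a bits value even though the device arrays next to it are fine.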

Should be fixed.