Problem with running Julia 1.6.7 with CUDA 11.7

I can’t get CUDA.jl to work in Julia and can’t figure out what is going wrong. Allocating arrays seems to work fine, but any operation on them crashes with a long error report.

using CUDA
CUDA.functional()        # returns true
N = 2^20
x = CUDA.fill(1.0f0, N)  # a vector filled with 1.0 (Float32); works fine
y = CUDA.fill(2.0f0, N)  # a vector filled with 2.0 (Float32); works fine

y .+= x  # crashes the REPL in VS Code; prints a long error report in the plain Julia REPL

This is the Julia version I am using:
Julia Version 1.6.7
Commit 3b76b25b64 (2022-07-19 15:11 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core™ i5-8400 CPU @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =

This is the output of CUDA.versioninfo():
CUDA toolkit 11.7, artifact installation
NVIDIA driver 516.94.0, for CUDA 11.7
CUDA driver 11.7

Libraries:

  • CUBLAS: 11.10.1
  • CURAND: 10.2.10
  • CUFFT: 10.7.1
  • CUSOLVER: 11.3.5
  • CUSPARSE: 11.7.3
  • CUPTI: 17.0.0
  • NVML: 11.0.0+516.94
  • CUDNN: 8.30.2 (for CUDA 11.5.0)
  • CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:

  • Julia: 1.6.7
  • LLVM: 11.0.1
  • PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
  • Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
0: NVIDIA GeForce GTX 1070 (sm_61, 7.175 GiB / 8.000 GiB available)

And this is the error report after running the code above (I also get a similar error message for each test when running pkg> test CUDA).

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ebb21f – ZN4llvm25remapInstructionsInBlocksERKNS_15SmallVectorImplIPNS_10BasicBlockEEERNS_8ValueMapIPKNS_5ValueENS_14WeakTrackingVHENS_14ValueMapConfigIS9_NS_3sys10SmartMutexILb0EEEEEEE at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\LLVM.dll (unknown line)
in expression starting at REPL[10]:1
ZN4llvm25remapInstructionsInBlocksERKNS_15SmallVectorImplIPNS_10BasicBlockEEERNS_8ValueMapIPKNS_5ValueENS_14WeakTrackingVHENS_14ValueMapConfigIS9_NS_3sys10SmartMutexILb0EEEEEEE at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\LLVM.dll (unknown line)
ZN4llvm17CloneFunctionIntoEPNS_8FunctionEPKS0_RNS_8ValueMapIPKNS_5ValueENS_14WeakTrackingVHENS_14ValueMapConfigIS7_NS_3sys10SmartMutexILb0EEEEEEEbRNS_15SmallVectorImplIPNS_10ReturnInstEEEPKcPNS_14ClonedCodeInfoEPNS_20ValueMapTypeRemapperEPNS_17ValueMaterializerE at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\LLVM.dll (unknown line)
LLVMCloneFunctionInto at C:\Users\Paldßn.julia\artifacts\39f327e25ea056497ed1e8b0d595b85576936986\bin\libLLVMExtra-11.dll (unknown line)
LLVMCloneFunctionInto at C:\Users\Paldán.julia\packages\LLVM\WjSQG\lib\libLLVM_extra.jl:323
#clone_into!#77 at C:\Users\Paldán.julia\packages\LLVM\WjSQG\src\utils.jl:35
clone_into!##kw at C:\Users\Paldán.julia\packages\LLVM\WjSQG\src\utils.jl:16 [inlined]
macro expansion at C:\Users\Paldán.julia\packages\GPUCompiler\iaKrd\src\irgen.jl:504 [inlined]
macro expansion at C:\Users\Paldán.julia\packages\LLVM\WjSQG\src\base.jl:102 [inlined]
macro expansion at C:\Users\Paldán.julia\packages\GPUCompiler\iaKrd\src\irgen.jl:476 [inlined]
macro expansion at C:\Users\Paldán.julia\packages\TimerOutputs\jgSVI\src\TimerOutput.jl:252 [inlined]
lower_byval at C:\Users\Paldán.julia\packages\GPUCompiler\iaKrd\src\irgen.jl:404
unknown function (ip: 000000000137cd18)
finish_module! at C:\Users\Paldán.julia\packages\GPUCompiler\iaKrd\src\ptx.jl:187
unknown function (ip: 0000000045fe19d8)
macro expansion at C:\Users\Paldán.julia\packages\GPUCompiler\iaKrd\src\driver.jl:260 [inlined]
#emit_llvm#104 at C:\Users\Paldán.julia\packages\GPUCompiler\iaKrd\src\utils.jl:64
unknown function (ip: 0000000045fc358a)
emit_llvm##kw at C:\Users\Paldán.julia\packages\GPUCompiler\iaKrd\src\utils.jl:62 [inlined]
cufunction_compile at C:\Users\Paldán.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:353
#222 at C:\Users\Paldán.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:347 [inlined]
JuliaContext at C:\Users\Paldán.julia\packages\GPUCompiler\iaKrd\src\driver.jl:74
unknown function (ip: 0000000045f7fc53)
cufunction_compile at C:\Users\Paldán.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:346
cached_compilation at C:\Users\Paldán.julia\packages\GPUCompiler\iaKrd\src\cache.jl:90
#cufunction#219 at C:\Users\Paldán.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:299
cufunction at C:\Users\Paldán.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:293 [inlined]
macro expansion at C:\Users\Paldán.julia\packages\CUDA\DfvRa\src\compiler\execution.jl:102 [inlined]
#launch_heuristic#246 at C:\Users\Paldán.julia\packages\CUDA\DfvRa\src\gpuarrays.jl:17 [inlined]
launch_heuristic##kw at C:\Users\Paldán.julia\packages\CUDA\DfvRa\src\gpuarrays.jl:17
_copyto! at C:\Users\Paldán.julia\packages\GPUArrays\gok9K\src\host\broadcast.jl:73 [inlined]
materialize! at C:\Users\Paldán.julia\packages\GPUArrays\gok9K\src\host\broadcast.jl:51 [inlined]
materialize! at .\broadcast.jl:891
unknown function (ip: 0000000045f75d14)
jl_clear_implicit_imports at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
jl_clear_implicit_imports at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
jl_clear_implicit_imports at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
jl_interpret_toplevel_thunk at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
jl_toplevel_eval_flex at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
jl_toplevel_eval_flex at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
jl_clear_implicit_imports at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
jl_clear_implicit_imports at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
jl_interpret_toplevel_thunk at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
jl_toplevel_eval_flex at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
jl_toplevel_eval_in at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
unknown function (ip: 000000006b492667)
unknown function (ip: 000000006b4930d5)
unknown function (ip: 000000006b07c071)
unknown function (ip: 000000006b0b5072)
unknown function (ip: 000000006b0b569e)
unknown function (ip: 000000006aebb59a)
unknown function (ip: 000000006aebb641)
jl_f__call_latest at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
unknown function (ip: 000000006b36f449)
unknown function (ip: 000000006b37dec9)
unknown function (ip: 000000006aea7ee1)
unknown function (ip: 000000006aea807e)
jl_call2 at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
repl_entrypoint at C:\Users\Paldßn\myOwnPrograms\Julia-1.6.7\bin\libjulia-internal.dll (unknown line)
unknown function (ip: 0000000000401a63)
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 47087010 (Pool: 47072548; Big: 14462); GC: 51

I assume you’re using Julia builds from the home page?
Can you try upgrading Julia?

Yes, I have also tried the latest version, 1.8.1, but for some reason I can’t add the CUDA package with that version. I get errors for four dependencies (LLVM, GPUCompiler, GPUArrays and CUDA), and when I try to precompile I get the following error:

ERROR: The following 1 direct dependency failed to precompile:

CUDA [052768ef-5323-5732-b1bb-66c8b64840ba]

Failed to precompile CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] to C:\Users\Paldán.julia\compiled\v1.8\CUDA\jl_E37.tmp.
ERROR: LoadError: type Nothing has no field captures
Stacktrace:
[1] top-level scope
@ C:\Users\Paldán.julia\packages\LLVM\WjSQG\src\LLVM.jl:14
[2] top-level scope
@ stdin:1
in expression starting at C:\Users\Paldán.julia\packages\LLVM\WjSQG\src\LLVM.jl:1
in expression starting at stdin:1
ERROR: LoadError: Failed to precompile LLVM [929cbde3-209d-540e-8aea-75f648917ca0] to C:\Users\Paldán.julia\compiled\v1.8\LLVM\jl_1082.tmp.
Stacktrace:
[1] top-level scope
@ stdin:1
in expression starting at C:\Users\Paldán.julia\packages\GPUCompiler\07qaN\src\GPUCompiler.jl:1
in expression starting at stdin:1
ERROR: LoadError: Failed to precompile GPUCompiler [61eb1bfa-7361-4325-ad38-22787b887f55] to C:\Users\Paldán.julia\compiled\v1.8\GPUCompiler\jl_EEB.tmp.
Stacktrace:
[1] top-level scope
@ stdin:1
in expression starting at C:\Users\Paldán.julia\packages\CUDA\DfvRa\src\CUDA.jl:1
in expression starting at stdin:1

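You should provide more details, e.g., which versions of packages you’re using. The "type Nothing has no field captures" error in your backtrace comes from a regex match that failed, but nothing in the latest release of LLVM.jl uses regexes in the src/LLVM.jl file, so you are probably not on the latest release.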
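
For anyone puzzled by the "type Nothing has no field captures" message: it is what Julia reports when code reads .captures from the result of match() and the match failed. A minimal sketch, where the pattern and the input strings are made up for illustration and are not what LLVM.jl actually runs:

```julia
# Hypothetical pattern with one capture group (illustration only).
pattern = r"^libLLVM-(\d+)"

# A successful match returns a RegexMatch, which has a `captures` field.
ok = match(pattern, "libLLVM-11jl")
println(ok.captures[1])

# A failed match returns `nothing`, which has no fields at all.
bad = match(pattern, "something-else")
println(bad === nothing)

# Reading `bad.captures` therefore throws; catch it to show the message.
err = try
    bad.captures
catch e
    e
end
println(sprint(showerror, err))
```

So the error means some version-detection style regex did not match what it expected, which is typical of an older package release running on a newer Julia.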

Could you be more specific? I only have the CUDA package installed.

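LLVM.jl is a dependency of CUDA.jl, so it is installed even if you only added CUDA. Run st -m in the package manager (press ] in the REPL) to list the versions of all packages in the environment, including indirect dependencies.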
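
If the full st -m listing is too long to paste, the relevant versions can also be pulled out programmatically. This is only a sketch: gpu_stack_versions is a hypothetical helper name, and the default list of package names is taken from the errors earlier in this thread. It relies on Pkg.dependencies(), which covers the whole manifest of the active environment.

```julia
using Pkg

# Hypothetical helper: collect name => version for selected packages in the
# active environment, including indirect dependencies (what `st -m` lists).
function gpu_stack_versions(names = ["CUDA", "GPUCompiler", "GPUArrays", "LLVM"])
    found = Dict{String,Any}()
    for (_, info) in Pkg.dependencies()
        if info.name in names
            found[info.name] = info.version
        end
    end
    return found
end

# Print the versions in alphabetical order; prints nothing if none are installed.
for (name, version) in sort(collect(gpu_stack_versions()); by = first)
    println(name, " => ", version)
end
```

Run it in the same environment in which CUDA fails to precompile, and post the output together with the Julia version.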