PackageCompiler fails to create app for MadNLPGPU + ExaModels (CUDSS linear solver)

I cannot create an app with PackageCompiler, even for the simplest GPU optimization with ExaModels + MadNLPGPU running cuDSS. I have made a full MWE project, including the package to be built, to help debug this. Compilation fails on both the LTS and the current release when --compile=all is set. A different error message appears when compiling with --compile=all --strip-metadata --strip-ir and filter_stdlibs=true (it may be more meaningful than the one below; just swap the two commands in setup.jl). Without --compile=all and without the strip and filter options, compilation passes but the app fails at runtime: it spawns multiple instances of julia.exe until it consumes all free RAM.

I attach an MWE with TOML files for Julia 1.10.8 and 1.11.3. To run it, set the root directory as the working directory and run setup.jl. I think it would be a big deal if this were solved. We need some kind of test for this, IMO.
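The actual setup.jl lives only in the archive, so the following is a minimal sketch of what such a script might look like, not the file itself. The package directory name gpu_opt.jl is taken from the --project path in the error log below; the output directory "build", the helper names build_flags and build, and the choice of force = true are assumptions made for illustration.

```julia
using PackageCompiler

# Extra flags for the Julia process that compiles the system image.
# `strip = true` adds the stripping flags mentioned in the post.
# (Hypothetical helper; not taken from the MWE archive.)
function build_flags(; strip::Bool = false)
    extra = strip ? ["--strip-metadata", "--strip-ir"] : String[]
    return `--compile=all $extra`
end

# Build the app. Paths are resolved relative to this file.
# (Hypothetical helper; the "build" output directory is an assumption.)
function build(; strip::Bool = false)
    create_app(
        joinpath(@__DIR__, "gpu_opt.jl"),  # package directory, as seen in the log
        joinpath(@__DIR__, "build");       # assumed output directory
        sysimage_build_args = build_flags(; strip),
        filter_stdlibs = strip,            # the post pairs this with the strip flags
        force = true,                      # assumption: overwrite an existing build
    )
end
```

After including the file, build() would correspond to the --compile=all case and build(strip = true) to the stripped variant with filter_stdlibs=true.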

mwe.7z on mega.nz (Uploading compressed folders on Discourse is not allowed, so I uploaded it to a free host.)

I am tagging some people that could be interested: @maleadt , @sshin23, @frapac, @amontoison, @odow, @sdanisch. Please ignore if you feel inappropriately tagged. I tagged mostly leaders of relevant packages. (Edit(odow): I’ve removed the explicit tags.)

Error log using Julia 1.10.8 and compile=all.

⠸ [05m:46s] PackageCompiler: compiling nonincremental system image
immarg operand has non-immediate parameter
%value_phi = phi i8 [ %guard_res8, %guard_exit6 ], [ %guard_res15, %guard_exit13 ]
%28 = call <2 x i64> @llvm.x86.aesni.aeskeygenassist(<2 x i64> %unbox, i8 zeroext %value_phi) [ "jl_roots"({} addrspace(10)* %44) ], !dbg !125496
immarg operand has non-immediate parameter
%value_phi = phi i32 [ %guard_res5, %guard_exit4 ], [ %guard_res12, %guard_exit10 ]
call void @llvm.nvvm.cp.async.wait.group(i32 %value_phi) [ "jl_roots"({} addrs…%16) ], !dbg !187547
LLVM ERROR: Broken module found, compilation aborted!

[11476] signal (6.-6): Aborted
in expression starting at none:0
Allocations: 1520472196 (Pool: 1519649926; Big: 822270); GC: 645
✖ [05m:48s] PackageCompiler: compiling nonincremental system image
ERROR: LoadError: failed process: Process(/home/karlo/.julia/juliaup/julia-1.10.8+0.x64.linux.gnu/bin/julia --color=yes --startup-file=no --pkgimages=no '--cpu-target=generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)' --compile=all --sysimage=/tmp/jl_eAnXJI/sys.so --project=/home/karlo/Desktop/mwe/gpu_opt.jl --output-o=/tmp/jl_EbIb0NGBid-o.a /tmp/jl_EN2JafOnKl, ProcessSignaled(6)) [0]

Stacktrace:
[1] pipeline_error
@ ./process.jl:565 [inlined]
[2] run(::Cmd; wait::Bool)
@ Base ./process.jl:480
[3] run
@ ./process.jl:477 [inlined]
[4] #20
@ ~/.julia/packages/PackageCompiler/UbaS4/ext/TerminalSpinners.jl:157 [inlined]
[5] spin(f::PackageCompiler.var"#20#22"{Cmd}, s::PackageCompiler.TerminalSpinners.Spinner{Base.TTY})
@ PackageCompiler.TerminalSpinners ~/.julia/packages/PackageCompiler/UbaS4/ext/TerminalSpinners.jl:164
[6] macro expansion
@ ~/.julia/packages/PackageCompiler/UbaS4/ext/TerminalSpinners.jl:157 [inlined]
[7] create_sysimg_object_file(object_file::String, packages::Vector{String}, packages_sysimg::Set{Base.PkgId}; project::String, base_sysimage::String, precompile_execution_file::Vector{String}, precompile_statements_file::Vector{String}, cpu_target::String, script::Nothing, sysimage_build_args::Cmd, extra_precompiles::String, incremental::Bool, import_into_main::Bool)
@ PackageCompiler ~/.julia/packages/PackageCompiler/UbaS4/src/PackageCompiler.jl:130
[8] create_sysimg_object_file
@ ~/.julia/packages/PackageCompiler/UbaS4/src/PackageCompiler.jl:319 [inlined]
[9] create_sysimage(packages::Vector{String}; sysimage_path::String, project::String, precompile_execution_file::String, precompile_statements_file::Vector{String}, incremental::Bool, filter_stdlibs::Bool, cpu_target::String, script::Nothing, sysimage_build_args::Cmd, include_transitive_dependencies::Bool, base_sysimage::Nothing, julia_init_c_file::Nothing, julia_init_h_file::Nothing, version::Nothing, soname::Nothing, compat_level::String, extra_precompiles::String, import_into_main::Bool)
@ PackageCompiler ~/.julia/packages/PackageCompiler/UbaS4/src/PackageCompiler.jl:639
[10] create_sysimage
@ ~/.julia/packages/PackageCompiler/UbaS4/src/PackageCompiler.jl:527 [inlined]
[11] create_app(package_dir::String, app_dir::String; executables::Nothing, precompile_execution_file::String, precompile_statements_file::Vector{String}, incremental::Bool, filter_stdlibs::Bool, force::Bool, c_driver_program::String, cpu_target::String, include_lazy_artifacts::Bool, sysimage_build_args::Cmd, include_transitive_dependencies::Bool, include_preferences::Bool, script::Nothing)
@ PackageCompiler ~/.julia/packages/PackageCompiler/UbaS4/src/PackageCompiler.jl:886
[12] top-level scope
@ ~/Desktop/mwe/setup.jl:9

versioninfo()


Julia Version 1.10.8
Commit 4c16ff44be8 (2025-01-22 10:06 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 5950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)

julia> MadNLPGPU.CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.4
NVIDIA driver 550.120.0

CUDA libraries:

  • CUBLAS: 12.6.4
  • CURAND: 10.3.7
  • CUFFT: 11.3.0
  • CUSOLVER: 11.7.1
  • CUSPARSE: 12.5.4
  • CUPTI: 2024.3.2 (API 24.0.0)
  • NVML: 12.0.0+550.120

Julia packages:

  • CUDA: 5.6.1
  • CUDA_Driver_jll: 0.10.4+0
  • CUDA_Runtime_jll: 0.15.5+0

Toolchain:

  • Julia: 1.10.8
  • LLVM: 15.0.7

1 device:
0: NVIDIA GeForce RTX 2070 (sm_75, 7.388 GiB / 8.000 GiB available)

I’d say that this is definitely not simple! It doesn’t surprise me that PackageCompiler struggles.

I don’t think you’ll get much help on Discourse. Perhaps open an issue at JuliaLang/PackageCompiler.jl on GitHub.

I have opened issue #1030 on PackageCompiler.

Are there known, or roughly known, limitations to PackageCompiler’s ability to compile a package? Are they equivalent to the limitations of sysimages?

I have tried to narrow down the MWE. Only the latest PackageCompiler and CUDA are installed, on Julia 1.11.3.

@maleadt do you have suggestions? Should I open an issue on CUDA.jl?

Generating a sysimage of only CUDA fails too.

create_sysimage(["CUDA"]; sysimage_path="image.so", sysimage_build_args=`--compile=all`)

Downloaded artifact: mingw-w64
Precompiling Pkg…
23 dependencies successfully precompiled in 28 seconds
Precompiling CUDA_Driver_jll…
3 dependencies successfully precompiled in 1 seconds. 23 already precompiled.
Precompiling project…
78 dependencies successfully precompiled in 60 seconds. 26 already precompiled.
⠋ [10m:09s] PackageCompiler: compiling incremental system image
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.membar.sys

[38512] signal 22: SIGABRT
in expression starting at none:0
crt_sig_handler at C:/workdir/src\signals-win.c:95
raise at C:\WINDOWS\System32\msvcrt.dll (unknown line)
abort at C:\WINDOWS\System32\msvcrt.dll (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm16SelectionDAGISel15CannotYetSelectEPNS_6SDNodeE at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm16SelectionDAGISel16SelectCodeCommonEPNS_6SDNodeEPKhj at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel6SelectEPN4llvm6SDNodeE at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm16SelectionDAGISel22DoInstructionSelectionEv at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm16SelectionDAGISel16SelectBasicBlockENS_14ilist_iteratorINS_12ilist_detail12node_optionsINS_11InstructionELb0ELb0EvEELb0ELb1EEES6_Rb at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel20runOnMachineFunctionERN4llvm15MachineFunctionE at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\libLLVM-16jl.dll (unknown line)
add_output_impl at C:/workdir/src\aotcompile.cpp:1171
add_output<jl_dump_native_impl(void*, char const*, char const*, char const*, char const*, ios_t*, ios_t*, jl_emission_params_t*)::<lambda(llvm::Module&)> > at C:/workdir/src\aotcompile.cpp:1407
operator()<jl_dump_native_impl(void*, char const*, char const*, char const*, char const*, ios_t*, ios_t*, jl_emission_params_t*)::<lambda(llvm::Module&)> > at C:/workdir/src\aotcompile.cpp:1647 [inlined]
jl_dump_native_impl at C:/workdir/src\aotcompile.cpp:1793
ijl_write_compiler_output at C:/workdir/src\precompile.c:177
ijl_atexit_hook at C:/workdir/src\init.c:285
jl_repl_entrypoint at C:/workdir/src\jlapi.c:1060
mainCRTStartup at C:/workdir/cli\loader_exe.c:58
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 1616528984 (Pool: 1616519201; Big: 9783); GC: 281
✖ [10m:10s] PackageCompiler: compiling incremental system image
ERROR: failed process: Process('C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\julia.exe' --color=yes --startup-file=no --pkgimages=no --cpu-target=native --compile=all '--sysimage=C:\Users\karlo\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\lib\julia\sys.dll' '--project=C:\Users\karlo\.julia\environments\v1.11' '--output-o=C:\Users\karlo\AppData\Local\Temp\jl_KrgDWpWrgR-o.a' 'C:\Users\karlo\AppData\Local\Temp\jl_yhIfmFWHaX', ProcessExited(3)) [3]

Stacktrace:
[1] pipeline_error
@ .\process.jl:598 [inlined]
[2] run(::Cmd; wait::Bool)
@ Base .\process.jl:513
[3] run
@ .\process.jl:510 [inlined]
[4] #20
@ C:\Users\karlo\.julia\packages\PackageCompiler\UbaS4\ext\TerminalSpinners.jl:157 [inlined]
[5] spin(f::PackageCompiler.var"#20#22"{Cmd}, s::PackageCompiler.TerminalSpinners.Spinner{Base.TTY})
@ PackageCompiler.TerminalSpinners C:\Users\karlo\.julia\packages\PackageCompiler\UbaS4\ext\TerminalSpinners.jl:164
[6] macro expansion
@ C:\Users\karlo\.julia\packages\PackageCompiler\UbaS4\ext\TerminalSpinners.jl:157 [inlined]
[7] create_sysimg_object_file(object_file::String, packages::Vector{…}, packages_sysimg::Set{…}; project::String, base_sysimage::String, precompile_execution_file::Vector{…}, precompile_statements_file::Vector{…}, cpu_target::String, script::Nothing, sysimage_build_args::Cmd, extra_precompiles::String, incremental::Bool, import_into_main::Bool)
@ PackageCompiler C:\Users\karlo\.julia\packages\PackageCompiler\UbaS4\src\PackageCompiler.jl:130
[8] create_sysimg_object_file
@ C:\Users\karlo\.julia\packages\PackageCompiler\UbaS4\src\PackageCompiler.jl:319 [inlined]
[9] create_sysimage(packages::Vector{…}; sysimage_path::String, project::String, precompile_execution_file::Vector{…}, precompile_statements_file::Vector{…}, incremental::Bool, filter_stdlibs::Bool, cpu_target::String, script::Nothing, sysimage_build_args::Cmd, include_transitive_dependencies::Bool, base_sysimage::Nothing, julia_init_c_file::Nothing, julia_init_h_file::Nothing, version::Nothing, soname::Nothing, compat_level::String, extra_precompiles::String, import_into_main::Bool)
@ PackageCompiler C:\Users\karlo\.julia\packages\PackageCompiler\UbaS4\src\PackageCompiler.jl:639
[10] top-level scope
@ REPL[4]:1
Some type information was truncated. Use show(err) to see complete types.

julia> versioninfo()
Julia Version 1.11.3
Commit d63adeda50 (2025-01-21 19:42 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 32 × AMD Ryzen 9 5950X 16-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.4
NVIDIA driver 552.44.0

CUDA libraries:

  • CUBLAS: 12.6.4
  • CURAND: 10.3.7
  • CUFFT: 11.3.0
  • CUSOLVER: 11.7.1
  • CUSPARSE: 12.5.4
  • CUPTI: 2024.3.2 (API 24.0.0)
  • NVML: 12.0.0+552.44

Julia packages:

  • CUDA: 5.6.1
  • CUDA_Driver_jll: 0.10.4+0
  • CUDA_Runtime_jll: 0.15.5+0

Toolchain:

  • Julia: 1.11.3
  • LLVM: 16.0.6

1 device:
0: NVIDIA GeForce RTX 2070 (sm_75, 6.667 GiB / 8.000 GiB available)