Failed while initializing AMDGPU.jl with LLVM 17 and ROCm 6.1 on Fedora 40

The Radeon device has already been bind-mounted into /dev/dri in the container.

Julia is installed via juliaup, and AMDGPU.jl was installed successfully with Pkg.jl:

add AMDGPU

Here is the output:

[root@42a30749f58d ~]# julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.4 (2024-06-04)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using AMDGPU
: CommandLine Error: Option 'disassemble' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

[192647] signal (6.-6): Aborted
in expression starting at REPL[1]:1
__pthread_kill_implementation at /lib64/libc.so.6 (unknown line)
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /usr/lib64/llvm17/lib/libLLVM-17.so (unknown line)
_ZN4llvm18report_fatal_errorEPKcb at /usr/lib64/llvm17/lib/libLLVM-17.so (unknown line)
unknown function (ip: 0x7f5849949acb)
_ZN4llvm2cl6Option11addArgumentEv at /usr/lib64/llvm17/lib/libLLVM-17.so (unknown line)
unknown function (ip: 0x7f58541f79f0)
unknown function (ip: 0x7f58541f7d72)
call_init at /usr/src/debug/glibc-2.39-22.fc40.x86_64/elf/dl-init.c:74 [inlined]
call_init at /usr/src/debug/glibc-2.39-22.fc40.x86_64/elf/dl-init.c:26
_dl_init at /usr/src/debug/glibc-2.39-22.fc40.x86_64/elf/dl-init.c:121
_dl_catch_exception at /usr/src/debug/glibc-2.39-22.fc40.x86_64/elf/dl-catch.c:211
dl_open_worker at /usr/src/debug/glibc-2.39-22.fc40.x86_64/elf/dl-open.c:829
_dl_catch_exception at /usr/src/debug/glibc-2.39-22.fc40.x86_64/elf/dl-catch.c:237
_dl_open at /usr/src/debug/glibc-2.39-22.fc40.x86_64/elf/dl-open.c:905
dlopen_doit at /lib64/libc.so.6 (unknown line)
_dl_catch_exception at /usr/src/debug/glibc-2.39-22.fc40.x86_64/elf/dl-catch.c:237
_dl_catch_error at /usr/src/debug/glibc-2.39-22.fc40.x86_64/elf/dl-catch.c:256
_dlerror_run at /lib64/libc.so.6 (unknown line)
dlopen at /lib64/libc.so.6 (unknown line)
ijl_load_dynamic_library at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/dlload.c:365
#dlopen#3 at ./libdl.jl:117
dlopen at ./libdl.jl:116 [inlined]
dlopen at ./libdl.jl:116 [inlined]
dlpath at ./libdl.jl:240
find_rocm_library at /root/.julia/packages/AMDGPU/Xy2Wp/src/discovery/utils.jl:108
#get_library#4 at /root/.julia/packages/AMDGPU/Xy2Wp/src/discovery/discovery.jl:57 [inlined]
get_library at /root/.julia/packages/AMDGPU/Xy2Wp/src/discovery/discovery.jl:48 [inlined]
__init__ at /root/.julia/packages/AMDGPU/Xy2Wp/src/discovery/discovery.jl:136
jfptr___init___5366 at /root/.julia/compiled/v1.10/AMDGPU/arpZD_jyh7w.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_module_run_initializer at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:76
run_module_init at ./loading.jl:1134
register_restored_modules at ./loading.jl:1122
_include_from_serialized at ./loading.jl:1067
_require_search_from_serialized at ./loading.jl:1581
_require at ./loading.jl:1938
__require_prelocked at ./loading.jl:1812
jfptr___require_prelocked_80768.1 at /root/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_in_world at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/builtins.c:831
#invoke_in_world#3 at ./essentials.jl:926 [inlined]
invoke_in_world at ./essentials.jl:923 [inlined]
_require_prelocked at ./loading.jl:1803
macro expansion at ./loading.jl:1790 [inlined]
macro expansion at ./lock.jl:267 [inlined]
__require at ./loading.jl:1753
jfptr___require_80733.1 at /root/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_in_world at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/builtins.c:831
#invoke_in_world#3 at ./essentials.jl:926 [inlined]
invoke_in_world at ./essentials.jl:923 [inlined]
require at ./loading.jl:1746
jfptr_require_80730.1 at /root/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
call_require at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:481 [inlined]
eval_import_path at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:518
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:752
jl_toplevel_eval_flex at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
#run_repl#59 at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_91737.1 at /root/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
#1013 at ./client.jl:432
jfptr_YY.1013_82703.1 at /root/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82729.1 at /root/.julia/juliaup/julia-1.10.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
__libc_start_call_main at /lib64/libc.so.6 (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 1373793 (Pool: 1372930; Big: 863); GC: 2
Aborted (core dumped)

The output of rocminfo is shown below:

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.4
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 7950X 16-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 7950X 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5881                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    65571436(0x3e88a6c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65571436(0x3e88a6c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    65571436(0x3e88a6c) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-08c10c77bd6b455b               
  Marketing Name:          AMD Radeon RX 7900 XTX             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      6144(0x1800) KB                    
    L3:                      98304(0x18000) KB                  
  Chip ID:                 29772(0x744c)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2526                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            96                                 
  SIMDs per CU:            2                                  
  Shader Engines:          6                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 550                                
  SDMA engine uCode::      19                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32

It seems to be a problem caused by LLVM, but I have no idea how to fix it.

Help!

How come you’re loading Julia 1.10 with LLVM 17, which Julia v1.10 doesn’t support? Judging by the path in the stack trace, it presumably comes from the system. Are you setting something like LD_LIBRARY_PATH or LD_PRELOAD?

I think the problem is not really related to Julia or Julia’s LLVM.

I’m also on Fedora 40, and the issue seems to be related to the various calls to dlopen and dlclose when AMDGPU.jl looks for the ROCm HIP libraries installed in the system. In Julia, these two functions are called in particular by Libdl.find_library and Libdl.dlpath.

This can be verified with the following C code:

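// test_libamdhip.c
// Reproducer: open and close the HIP runtime library twice in a row.
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    int flags = RTLD_LAZY;
    const char *libpath = "/usr/lib64/libamdhip64.so";  // path to HIP library installed from F40 repos

    for (int i = 0; i < 2; i++) {
        void *lib = dlopen(libpath, flags);
        if (lib == NULL) {
            // Report a missing or unloadable library instead of passing NULL to dlclose.
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }
        dlclose(lib);
    }
    return 0;
}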

Running this code fails with the same obscure error as above:

gcc -o test_libamdhip test_libamdhip.c && ./test_libamdhip

: CommandLine Error: Option 'disassemble' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

I’m not sure what this error means. Perhaps it’s a packaging issue, in which case it should be reported to Fedora.

In any case, I found a workaround that lets AMDGPU.jl load: skip the calls to Libdl.find_library and Libdl.dlpath when the ROCm libraries are found in /usr/lib64. This requires modifying the find_rocm_library function by adding the following at the beginning:

function find_rocm_library(lib::String, rocm_path::String, ext::String = dlext)
    # Try standard library path in Fedora installation
    path = joinpath("/usr", "lib64", lib * ".$ext")
    isfile(path) && return path
    [...]
end

Error: Option ‘disassemble’ registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

This is likely caused by incorrect linking in the ROCm packages for Fedora, for example linking something statically when it should be linked dynamically.
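One way to gather some evidence on the linking question is to look at which LLVM-related files are actually mapped into the process once the HIP library has been loaded. The following is only a rough sketch in C, in the spirit of the reproducer earlier in the thread. The helper name count_mapped_files is made up for illustration, and it relies on the Linux-specific /proc/self/maps file. A copy of LLVM that was linked statically into another library would not show up as a separate file, so this can only show which shared LLVM libraries are loaded and from where.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not part of ROCm or AMDGPU.jl): print each file
// mapped into the current process whose path contains `needle`, and
// return how many entries were printed. Returns -1 if /proc/self/maps
// cannot be opened. Linux-specific.
int count_mapped_files(const char *needle)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096];
    char last[4096] = "";  // most recently printed path
    int count = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        // The pathname, when present, is the last field and starts with '/'.
        char *path = strchr(line, '/');
        if (path == NULL)
            continue;  // anonymous mapping, [heap], [stack], ...
        path[strcspn(path, "\n")] = '\0';

        // A file usually has several consecutive mappings; print it once.
        if (strstr(path, needle) != NULL && strcmp(path, last) != 0) {
            printf("mapped: %s\n", path);
            strcpy(last, path);
            count++;
        }
    }
    fclose(maps);
    return count;
}
```

For example, calling count_mapped_files("LLVM") right after the first dlopen in the reproducer should list /usr/lib64/llvm17/lib/libLLVM-17.so if that is the copy being pulled in, as the stack trace suggests.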


Making some connections: here’s an issue on the AMDGPU.jl GitHub repository for this problem: