Some AMDGPU tests failing on Radeon VII under Windows

I recently installed ROCm/HIP on Windows. My desktop has a Radeon VII, which isn’t officially supported, though I’ve heard that it works for some people. It mostly works, but I consistently get a small number of failed AMDGPU.jl tests, plus a GUI prompt from AMD’s software about a driver timeout. I’d be grateful if anyone could point me in the right direction. Thanks!

Version info, Pkg details, and basic function check
julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa29 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 32 × AMD Ryzen 9 3950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)

(@v1.9) pkg> activate .
  Activating project at `C:\Users\x\Dev\amdgpu`

(amdgpu) pkg> st
Status `C:\Users\x\Dev\amdgpu\Project.toml`
  [21141c5a] AMDGPU v0.8.2

julia> using AMDGPU
┌ Warning: MIOpen is unavailable, functionality will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\goZLq\src\AMDGPU.jl:213

julia> AMDGPU.device()
HIPDevice(name="AMD Radeon VII", id=1, gcn_arch=gfx906:sramecc-:xnack-)

julia> AMDGPU.functional()
true

julia> AMDGPU.functional(:MIOpen)
false

Running Pkg.test on AMDGPU
julia> Pkg.test("AMDGPU")
...
     Testing Running tests...
┌ Warning: MIOpen is unavailable, functionality will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\goZLq\src\AMDGPU.jl:213
[ Info: Running following tests: ["core", "hip", "ext", "gpuarrays", "kernelabstractions"].
[ Info: Running tests with 2 workers.
[ Info: Testing using device HIPDevice(name="AMD Radeon VII", id=1, gcn_arch=gfx906:sramecc-:xnack-).
Julia Version 1.9.4
Commit 8e5136fa29 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 32 × AMD Ryzen 9 3950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 16 on 32 virtual cores
Environment:
  JULIA_IMAGE_THREADS = 1
  JULIA_LOAD_PATH = @;C:\Users\x\AppData\Local\Temp\jl_8A1KX1
  JULIA_NUM_THREADS = 16
ROCm provided by: system
[+] ld.lld
    @ C:\Program Files\AMD\ROCm\5.5\bin\ld.lld.exe
[+] ROCm-Device-Libs
    @ C:\Users\x\.julia\artifacts\5ad5ecb46e3c334821f54c1feecc6c152b7b6a45\amdgcn/bitcode
[+] HIP Runtime v5.5.0
    @ C:\WINDOWS\SYSTEM32\amdhip64.DLL
[+] rocBLAS v2.47.0
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocblas.dll
[+] rocSOLVER v3.21.0
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocsolver.dll
[+] rocALUTION
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocalution.dll
[+] rocSPARSE
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocsparse.dll
[+] rocRAND v2.10.5
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocrand.dll
[+] rocFFT v1.0.21
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocfft.dll
[-] MIOpen

HIP Devices [1]
    1. HIPDevice(name="AMD Radeon VII", id=1, gcn_arch=gfx906:sramecc-:xnack-)
[ Info: Scanning for test items in project `AMDGPU` at paths: C:\Users\x\.julia\packages\AMDGPU\goZLq
[ Info: Finished scanning for test items in 0.84 seconds. Scheduling 31 tests on pid 4788 with 2 worker processes and 1 threads per worker.
[ Info: Starting test workers
  Worker 16484:  [ Info: Starting test worker 1 on pid = 16484, with 1 threads
  Worker 20628:  [ Info: Starting test worker 2 on pid = 20628, with 1 threads
[ Info: Starting running test items
  Worker 20628:  15:29:33 | maxrss  0.5% | mem 21.4% | START ( 2/31) test item "gpuarrays - math/power" at test\gpuarrays_tests.jl:51
  Worker 16484:  15:29:33 | maxrss  0.5% | mem 21.4% | START ( 1/31) test item "core" at test\core_tests.jl:1
  Worker 20628:  15:30:26 | maxrss  1.8% | mem 23.0% | DONE  ( 2/31) test item "gpuarrays - math/power" 49.4 secs (93.4% compile, 0.3% recompile, 3.0% GC), 70.79 M allocs (4.736 GB)
  Worker 20628:  15:30:27 | maxrss  1.8% | mem 22.9% | START ( 3/31) test item "gpuarrays - random" at test\gpuarrays_tests.jl:54
  Worker 20628:  15:30:40 | maxrss  1.9% | mem 23.3% | DONE  ( 3/31) test item "gpuarrays - random" 13.5 secs (75.1% compile, 1.7% GC), 23.12 M allocs (1.277 GB)
  Worker 20628:  15:30:40 | maxrss  1.9% | mem 23.2% | START ( 4/31) test item "gpuarrays - reductions/== isequal" at test\gpuarrays_tests.jl:57
  Worker 16484:  15:31:00 | maxrss  2.6% | mem 23.9% | DONE  ( 1/31) test item "core" 82.8 secs (68.7% compile, 2.5% recompile, 2.4% GC), 139.48 M allocs (7.984 GB)
┌ Warning: Test item "core" at test\core_tests.jl:1 contains test sets without tests:
│ "Exception holder"
│ "unsafe_free"
└ @ ReTestItems C:\Users\x\.julia\packages\ReTestItems\euk3Q\src\log_capture.jl:293
  Worker 16484:  15:31:00 | maxrss  2.6% | mem 23.8% | START ( 5/31) test item "core: device" at test\device_tests.jl:1
  Worker 20628:  15:31:18 | maxrss  2.2% | mem 24.4% | DONE  ( 4/31) test item "gpuarrays - reductions/== isequal" 37.3 secs (69.1% compile, 2.4% GC), 65.50 M allocs (3.756 GB)
  Worker 20628:  15:31:18 | maxrss  2.2% | mem 24.4% | START ( 6/31) test item "gpuarrays - reductions/any all count" at test\gpuarrays_tests.jl:60
  Worker 20628:  15:31:24 | maxrss  2.2% | mem 24.2% | DONE  ( 6/31) test item "gpuarrays - reductions/any all count" 6.2 secs (77.5% compile, 3.3% GC), 10.17 M allocs (601.775 MB)
  Worker 20628:  15:31:24 | maxrss  2.2% | mem 24.1% | START ( 7/31) test item "gpuarrays - reductions/mapreduce" at test\gpuarrays_tests.jl:63
  Worker 16484:  15:31:41 | maxrss  3.3% | mem 24.6% | DONE  ( 5/31) test item "core: device" 40.3 secs (46.0% compile, 1.6% GC), 76.16 M allocs (3.358 GB)

Captured Logs for test item "core: device" at test\device_tests.jl:1 on worker 16484
┌ Warning: Only 1 GPU detected; skipping multi-GPU tests
└ @ Main.var"##core: device#2651" C:\Users\x\.julia\packages\AMDGPU\goZLq\test\device\launch.jl:86
Error in testset "across launches" on worker 16484:
Test Failed at C:\Users\x\.julia\packages\AMDGPU\goZLq\test\device\random.jl:42
  Expression: Array(a) == Array(b)
   Evaluated: Int32[1188096096, 268782523, -864893721, -1747302721, 1290285559, -1983434281, 90721520, 1466291455, -1457593152, -25908865  …  -122289872, 496392684, -2053886717, -389617061, -1829329209, 1725311088, -235708307, -767842514, 1032946595, -1683738882] == Int32[1188096096, 268782523, -864893721, -1747302721, 1290285559, -1983434281, 90721520, 1466291455, -1457593152, -25908865  …  -1163248307, 535429121, 136629444, 621628142, 1491040532, 1507651498, 1533262142, -343588899, -320220045, 142327400]

Error in testset "across launches" on worker 16484:
Test Failed at C:\Users\x\.julia\packages\AMDGPU\goZLq\test\device\random.jl:42
  Expression: Array(a) == Array(b)
   Evaluated: UInt64[0x10cdbe65e0e13fcc, 0xa41199401de63784, 0xd19aac63755002bf, 0x755ffe34a6153183, 0xcd7c12d5b4d7af67, 0x3bd1c21c6052dd85, 0x61a8258c4464c350, 0xfa5d053a1efa1309, 0xc6496999909209ff, 0x1ce41a2539310fb4  …  0x48af35a7f8b60130, 0x01b7e51a1d9659ec, 0x8d21617385942d03, 0x5e159565e8c6ea5b, 0x0b368aab92f6a6c7, 0x3d2934f266d62870, 0x0090fee0f1f3606d, 0x7046a1b2d23ba72e, 0x4e18b7bd3d9183a3, 0xc62414389ba42efe] == UInt64[0x0df760da46d0e860, 0x2bea83fa10054bbb, 0xcc219546cc72c4e7, 0x37fbb94897da46bf, 0x13db8ffe4ce831f7, 0x64f97e2089c731d7, 0xb1a5435905684cf0, 0x78de52f15765d4ff, 0xa43aa3c0a91ee4c0, 0x4c3c4065fe74a97f  …  0x75090a4cbaaa3d4d, 0xc3dc27171fea0001, 0x44b624450824ccc4, 0xf33c3f9d250d4aee, 0x88ed217858df7914, 0x29ad647459dcefaa, 0x1ec5413b5b63b93e, 0x80d13ec0eb853fdd, 0x25fd339cece9d473, 0xb59c2bea087bbe68]

Error in testset "across launches" on worker 16484:
Test Failed at C:\Users\x\.julia\packages\AMDGPU\goZLq\test\device\random.jl:42
  Expression: Array(a) == Array(b)
   Evaluated: Float16[0.09375, 0.9326, 0.2256, 0.6865, 0.4912, 0.46, 0.2344, 0.249, 0.1875, 0.374  …  0.2969, 0.4805, 0.253, 0.589, 0.6943, 0.1094, 0.10645, 0.795, 0.909, 0.748] == Float16[0.09375, 0.9326, 0.2256, 0.6865, 0.4912, 0.46, 0.2344, 0.249, 0.1875, 0.374  …  0.3252, 0.000977, 0.1914, 0.7324, 0.2695, 0.916, 0.3105, 0.966, 0.1123, 0.6016]

Error in testset "across launches" on worker 16484:
Test Failed at C:\Users\x\.julia\packages\AMDGPU\goZLq\test\device\random.jl:42
  Expression: Array(a) == Array(b)
   Evaluated: Float32[0.6320915, 0.04137361, 0.896634, 0.705284, 0.8140248, 0.5562085, 0.81484795, 0.7955626, 0.2413559, 0.9114226  …  0.42191124, 0.1746192, 0.15762365, 0.55402696, 0.92696464, 0.67310905, 0.9013802, 0.46603942, 0.13682973, 0.2826841] == Float32[0.6320915, 0.04137361, 0.896634, 0.705284, 0.8140248, 0.5562085, 0.81484795, 0.7955626, 0.2413559, 0.9114226  …  0.32999575, 0.8281251, 0.28749895, 0.10384917, 0.7458825, 0.72606397, 0.77909064, 0.041011453, 0.82679594, 0.96674824]

Error in testset "across launches" on worker 16484:
Test Failed at C:\Users\x\.julia\packages\AMDGPU\goZLq\test\device\random.jl:42
  Expression: Array(a) == Array(b)
   Evaluated: [0.4611456648113048, 0.6572209001358107, 0.09894447195648026, 0.7327352458709553, 0.7226546291523144, 0.5932927495915601, 0.3289423190333842, 0.8952496923423607, 0.6649786573673708, 0.7657222690273497  …  0.9505996432563499, 0.49343310887854397, 0.08629181079123849, 0.34897414139290706, 0.4088550320397404, 0.5754264847805892, 0.0622262431334335, 0.41447717783234905, 0.5448582081810194, 0.2549367980709003] == [0.4611456648113048, 0.6572209001358107, 0.09894447195648026, 0.7327352458709553, 0.7226546291523144, 0.5932927495915601, 0.3289423190333842, 0.8952496923423607, 0.6649786573673708, 0.7657222690273497  …  0.5650145808793525, 0.759543537773425, 0.38385489636566295, 0.765530724271851, 0.8206714126204746, 0.8370250234865844, 0.32842574786225454, 0.07782070160100507, 0.8251008276395424, 0.7607212382680704]


┌ Warning: Test item "core: device" at test\device_tests.jl:1 contains test sets without tests:
│ "Kernel argument alignment"
└ @ ReTestItems C:\Users\x\.julia\packages\ReTestItems\euk3Q\src\log_capture.jl:293
  Worker 16484:  15:31:41 | maxrss  3.3% | mem 24.5% | START ( 8/31) test item "ext" at test\external_tests.jl:1
  Worker 16484:  15:31:46 | maxrss  3.3% | mem 24.6% | DONE  ( 8/31) test item "ext" 4.9 secs (61.6% compile, 3.1% GC), 11.55 M allocs (649.657 MB)
┌ Warning: Test item "ext" at test\external_tests.jl:1 contains test sets without tests:
│ "UNARY"
└ @ ReTestItems C:\Users\x\.julia\packages\ReTestItems\euk3Q\src\log_capture.jl:293
  Worker 16484:  15:31:46 | maxrss  3.3% | mem 24.6% | START ( 9/31) test item "gpuarrays - base" at test\gpuarrays_tests.jl:12
  Worker 16484:  15:32:10 | maxrss  3.6% | mem 25.0% | DONE  ( 9/31) test item "gpuarrays - base" 22.9 secs (86.5% compile, 0.1% recompile, 4.7% GC), 47.14 M allocs (2.812 GB)
┌ Warning: Test item "gpuarrays - base" at test\gpuarrays_tests.jl:12 contains test sets without tests:
│ "cartesian iteration"
└ @ ReTestItems C:\Users\x\.julia\packages\ReTestItems\euk3Q\src\log_capture.jl:293
  Worker 16484:  15:32:11 | maxrss  3.6% | mem 24.9% | START (10/31) test item "gpuarrays - broadcasting" at test\gpuarrays_tests.jl:15
  Worker 20628:  15:33:29 | maxrss  2.9% | mem 26.1% | DONE  ( 7/31) test item "gpuarrays - reductions/mapreduce" 124.7 secs (84.8% compile, 5.7% GC), 448.43 M allocs (20.907 GB)
  Worker 20628:  15:33:29 | maxrss  2.9% | mem 26.0% | START (11/31) test item "gpuarrays - reductions/mapreducedim!" at test\gpuarrays_tests.jl:66
  Worker 20628:  15:34:21 | maxrss  3.5% | mem 27.2% | DONE  (11/31) test item "gpuarrays - reductions/mapreducedim!" 52.2 secs (64.4% compile, 2.7% GC), 75.03 M allocs (3.884 GB)
  Worker 20628:  15:34:22 | maxrss  3.5% | mem 27.1% | START (12/31) test item "gpuarrays - reductions/mapreducedim!_large" at test\gpuarrays_tests.jl:69
  Worker 20628:  15:34:27 | maxrss  3.7% | mem 27.4% | DONE  (12/31) test item "gpuarrays - reductions/mapreducedim!_large" 5.4 secs (33.7% compile, 7.0% GC), 6.06 M allocs (1.971 GB)
┌ Warning: Test item "gpuarrays - reductions/mapreducedim!_large" at test\gpuarrays_tests.jl:69 contains test sets without tests:
│ "Float16"
│ "ComplexF16"
└ @ ReTestItems C:\Users\x\.julia\packages\ReTestItems\euk3Q\src\log_capture.jl:293
  Worker 20628:  15:34:27 | maxrss  3.7% | mem 27.2% | START (13/31) test item "gpuarrays - reductions/minimum maximum extrema" at test\gpuarrays_tests.jl:72
  Worker 16484:  15:34:32 | maxrss  4.9% | mem 27.3% | DONE  (10/31) test item "gpuarrays - broadcasting" 141.1 secs (72.8% compile, <0.1% recompile, 4.0% GC), 225.62 M allocs (12.652 GB)
┌ Warning: Test item "gpuarrays - broadcasting" at test\gpuarrays_tests.jl:15 contains test sets without tests:
│ "stackoverflow in copy(::Broadcast)"
└ @ ReTestItems C:\Users\x\.julia\packages\ReTestItems\euk3Q\src\log_capture.jl:293
  Worker 16484:  15:34:32 | maxrss  4.9% | mem 27.3% | START (14/31) test item "gpuarrays - constructors" at test\gpuarrays_tests.jl:18
  Worker 16484:  15:34:50 | maxrss  5.0% | mem 27.6% | DONE  (14/31) test item "gpuarrays - constructors" 17.9 secs (82.7% compile, 0.6% recompile, 3.8% GC), 26.08 M allocs (1.474 GB)
  Worker 16484:  15:34:50 | maxrss  5.0% | mem 27.5% | START (15/31) test item "gpuarrays - indexing find" at test\gpuarrays_tests.jl:21
  Worker 16484:  15:35:07 | maxrss  5.3% | mem 28.0% | DONE  (15/31) test item "gpuarrays - indexing find" 16.4 secs (75.1% compile, 5.5% GC), 29.35 M allocs (1.628 GB)
  Worker 16484:  15:35:07 | maxrss  5.3% | mem 27.9% | START (16/31) test item "gpuarrays - indexing multidimensional" at test\gpuarrays_tests.jl:24
  Worker 16484:  15:35:07 | maxrss  5.3% | mem 27.9% | DONE  (16/31) test item "gpuarrays - indexing multidimensional" <0.1 secs, 1.91 K allocs (135.048 KB)
┌ Warning: Test item "gpuarrays - indexing multidimensional" at test\gpuarrays_tests.jl:24 contains test sets without tests:
│ "gpuarrays - indexing multidimensional"
└ @ ReTestItems C:\Users\x\.julia\packages\ReTestItems\euk3Q\src\log_capture.jl:293
  Worker 16484:  15:35:07 | maxrss  5.3% | mem 27.9% | START (17/31) test item "gpuarrays - indexing scalar" at test\gpuarrays_tests.jl:30
  Worker 16484:  15:35:14 | maxrss  5.3% | mem 28.1% | DONE  (17/31) test item "gpuarrays - indexing scalar" 7.1 secs (77.5% compile, 3.5% GC), 9.88 M allocs (530.402 MB)
  Worker 16484:  15:35:15 | maxrss  5.3% | mem 28.1% | START (18/31) test item "gpuarrays - interface" at test\gpuarrays_tests.jl:33
  Worker 16484:  15:35:16 | maxrss  5.3% | mem 28.1% | DONE  (18/31) test item "gpuarrays - interface" 1.5 secs (71.3% compile, 2.8% GC), 2.58 M allocs (125.489 MB)
  Worker 16484:  15:35:17 | maxrss  5.3% | mem 28.1% | START (19/31) test item "gpuarrays - linalg" at test\gpuarrays_tests.jl:36
  Worker 16484:  15:36:15 | maxrss  6.0% | mem 29.6% | DONE  (19/31) test item "gpuarrays - linalg" 57.9 secs (75.9% compile, 4.4% GC), 85.23 M allocs (4.943 GB)
┌ Warning: Test item "gpuarrays - linalg" at test\gpuarrays_tests.jl:36 contains test sets without tests:
│ "Hermitian"
└ @ ReTestItems C:\Users\x\.julia\packages\ReTestItems\euk3Q\src\log_capture.jl:293
  Worker 16484:  15:36:15 | maxrss  6.0% | mem 29.5% | START (20/31) test item "gpuarrays - linalg/mul!/matrix-matrix" at test\gpuarrays_tests.jl:39
  Worker 20628:  15:36:33 | maxrss  5.0% | mem 29.9% | DONE  (13/31) test item "gpuarrays - reductions/minimum maximum extrema" 125.6 secs (62.5% compile, 3.9% GC), 187.69 M allocs (10.177 GB)
┌ Warning: Test item "gpuarrays - reductions/minimum maximum extrema" at test\gpuarrays_tests.jl:72 contains test sets without tests:
│ "ComplexF16"
│ "ComplexF32"
│ "ComplexF64"
│ "Complex{Int16}"
│ "Complex{Int32}"
│ "Complex{Int64}"
└ @ ReTestItems C:\Users\x\.julia\packages\ReTestItems\euk3Q\src\log_capture.jl:293
  Worker 20628:  15:36:33 | maxrss  5.0% | mem 29.8% | START (21/31) test item "gpuarrays - reductions/reduce" at test\gpuarrays_tests.jl:75
  Worker 20628:  15:36:40 | maxrss  5.0% | mem 29.9% | DONE  (21/31) test item "gpuarrays - reductions/reduce" 7.0 secs (95.7% compile, 7.3% GC), 18.77 M allocs (1.160 GB)
  Worker 20628:  15:36:41 | maxrss  5.0% | mem 29.8% | START (22/31) test item "gpuarrays - reductions/reducedim!" at test\gpuarrays_tests.jl:78
  Worker 20628:  15:36:41 | maxrss  5.0% | mem 29.9% | DONE  (22/31) test item "gpuarrays - reductions/reducedim!" 0.7 secs (73.5% compile, 2.3% GC), 995.57 K allocs (52.384 MB)
  Worker 20628:  15:36:42 | maxrss  5.0% | mem 29.8% | START (23/31) test item "gpuarrays - reductions/sum prod" at test\gpuarrays_tests.jl:81
  Worker 16484:  15:37:32 | maxrss  6.8% | mem 31.3% | DONE  (20/31) test item "gpuarrays - linalg/mul!/matrix-matrix" 76.6 secs (74.6% compile, 4.3% GC), 110.21 M allocs (6.278 GB)
  Worker 16484:  15:37:32 | maxrss  6.8% | mem 31.2% | START (24/31) test item "gpuarrays - linalg/mul!/vector-matrix" at test\gpuarrays_tests.jl:42
  Worker 20628:  15:38:09 | maxrss  6.0% | mem 31.9% | DONE  (23/31) test item "gpuarrays - reductions/sum prod" 87.0 secs (59.6% compile, 3.0% GC), 110.41 M allocs (5.478 GB)
  Worker 20628:  15:38:09 | maxrss  6.0% | mem 31.9% | START (25/31) test item "gpuarrays - statistics" at test\gpuarrays_tests.jl:84
  Worker 16484:  15:38:11 | maxrss  7.1% | mem 31.9% | DONE  (24/31) test item "gpuarrays - linalg/mul!/vector-matrix" 39.3 secs (75.9% compile, 4.2% GC), 51.82 M allocs (2.912 GB)
  Worker 16484:  15:38:12 | maxrss  7.1% | mem 31.9% | START (26/31) test item "gpuarrays - linalg/norm" at test\gpuarrays_tests.jl:45
  Worker 16484:  15:40:32 | maxrss  8.7% | mem 34.3% | DONE  (26/31) test item "gpuarrays - linalg/norm" 140.2 secs (50.7% compile, 3.5% GC), 166.47 M allocs (8.450 GB)
  Worker 16484:  15:40:32 | maxrss  8.7% | mem 34.3% | START (27/31) test item "gpuarrays - math/intrinsics" at test\gpuarrays_tests.jl:48
  Worker 16484:  15:40:35 | maxrss  8.7% | mem 34.3% | DONE  (27/31) test item "gpuarrays - math/intrinsics" 2.0 secs (51.6% compile, 2.8% GC), 2.16 M allocs (105.301 MB)
  Worker 16484:  15:40:35 | maxrss  8.7% | mem 34.3% | START (28/31) test item "gpuarrays - uniformscaling" at test\gpuarrays_tests.jl:87
  Worker 16484:  15:40:43 | maxrss  8.7% | mem 34.3% | DONE  (28/31) test item "gpuarrays - uniformscaling" 8.4 secs (52.2% compile, 2.3% GC), 5.41 M allocs (307.734 MB)
  Worker 16484:  15:40:44 | maxrss  8.7% | mem 34.3% | START (29/31) test item "hip - core" at test\hip_core_tests.jl:1
  Worker 16484:  15:41:00 | maxrss  8.8% | mem 34.4% | DONE  (29/31) test item "hip - core" 16.7 secs (39.8% compile, 0.3% recompile, 1.5% GC), 8.02 M allocs (499.942 MB)
  Worker 16484:  15:41:01 | maxrss  8.8% | mem 34.4% | START (30/31) test item "hip - extra" at test\hip_extra_tests.jl:1
  Worker 16484:  15:44:59 | maxrss 10.0% | mem 35.6% | DONE  (30/31) test item "hip - extra" 237.7 secs (45.6% compile, <0.1% recompile, 2.7% GC), 149.44 M allocs (9.223 GB)
  Worker 16484:  15:44:59 | maxrss 10.0% | mem 35.6% | START (31/31) test item "kernelabstractions" at test\ka_tests.jl:1
  Worker 16484:  15:45:25 | maxrss 10.2% | mem 35.8% | DONE  (31/31) test item "kernelabstractions" 26.0 secs (60.8% compile, 0.7% recompile, 4.5% GC), 26.79 M allocs (1.633 GB)
┌ Warning: Test item "kernelabstractions" at test\ka_tests.jl:1 contains test sets without tests:
│ "CPU synchronization"
│ "Zero iteration space AMDGPU.ROCKernels.ROCBackend"
│ "Unroll"
└ @ ReTestItems C:\Users\x\.julia\packages\ReTestItems\euk3Q\src\log_capture.jl:293
  Worker 20628:  15:55:40 | maxrss  6.7% | mem 25.2% | DONE  (25/31) test item "gpuarrays - statistics" 1051.3 secs (3.2% compile, 0.3% GC), 129.15 M allocs (6.210 GB)

No Captured Logs for test item "gpuarrays - statistics" at test\gpuarrays_tests.jl:84 on worker 20628
Error in testset "cor" on worker 20628:
Test Failed at C:\Users\x\.julia\packages\GPUArrays\dAUOE\test\testsuite\statistics.jl:55
  Expression: compare((A->begin
            cor(A; dims = 2)
        end), AT, rand(ET, s, 2), nans = true)

Continued due to character limit:

Pkg.test Results Summary
Test Summary:                                        |  Pass  Fail  Broken  Total      Time
AMDGPU                                               | 11895     6      20  11921  26m13.1s
  test                                               | 11895     6      20  11921
    test\core_tests.jl                               |   604             2    606
    test\device_tests.jl                             |   454     5      10    469
      core: device                                   |   454     5      10    469     40.3s
        Launch Options                               |     6                    6      0.6s
        Kernel argument alignment                    |                       None      0.5s
        Function/Argument Conversion                 |     1                    1      0.3s
        Launch Configuration                         |     3                    3      0.1s
        ROCDeviceArray                               |    12                   12      0.2s
        Vector Addition Kernel                       |     1                    1      0.3s
        Memory: Static                               |     4                    4      0.5s
        Memcpy/Memset                                |     3                    3      0.7s
        Kernel Indexing                              |     2                    2      0.4s
        Wavefront Operations                         |    66             9     75     11.8s
        Wavefront Information                        |     2                    2      0.2s
        Workgroup synchronization                    |   176                  176      0.8s
        Execution Control Intrinsics                 |     5                    5      0.3s
        Exceptions                                   |     1                    1      0.3s
        rand(Int32), seed nothing                    |     8                    8      0.8s
        rand(Int32), seed 1234                       |     7     1              8      0.7s
          across launches                            |           1              1      0.3s
          across calls                               |     1                    1      0.2s
          across threads                             |     1                    1      0.2s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
        rand(UInt32), seed nothing                   |     8                    8      0.7s
        rand(UInt32), seed 1234                      |     8                    8      0.6s
        rand(Int64), seed nothing                    |     8                    8      0.7s
        rand(Int64), seed 1234                       |     8                    8      0.6s
        rand(UInt64), seed nothing                   |     8                    8      0.8s
        rand(UInt64), seed 1234                      |     7     1              8      0.7s
          across launches                            |           1              1      0.2s
          across calls                               |     1                    1      0.2s
          across threads                             |     1                    1      0.2s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
        rand(Int128), seed nothing                   |     8                    8      1.0s
        rand(Int128), seed 1234                      |     8                    8      0.7s
        rand(UInt128), seed nothing                  |     8                    8      1.0s
        rand(UInt128), seed 1234                     |     8                    8      0.7s
        rand(Float16), seed nothing                  |     8                    8      0.9s
        rand(Float16), seed 1234                     |     7     1              8      0.6s
          across launches                            |           1              1      0.2s
          across calls                               |     1                    1      0.2s
          across threads                             |     1                    1      0.2s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
        rand(Float32), seed nothing                  |     8                    8      0.8s
        rand(Float32), seed 1234                     |     7     1              8      0.6s
          across launches                            |           1              1      0.2s
          across calls                               |     1                    1      0.2s
          across threads                             |     1                    1      0.2s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
        rand(Float64), seed nothing                  |     8                    8      0.6s
        rand(Float64), seed 1234                     |     7     1              8      0.6s
          across launches                            |           1              1      0.2s
          across calls                               |     1                    1      0.2s
          across threads                             |     1                    1      0.2s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
          across threads                             |     1                    1      0.0s
        basic randn(Float16), seed nothing           |     1                    1      0.8s
        basic randn(Float16), seed 1234              |     1                    1      0.4s
        basic randn(Float32), seed nothing           |     1                    1      0.4s
        basic randn(Float32), seed 1234              |     1                    1      0.4s
        basic randn(Float64), seed nothing           |     1                    1      0.4s
        basic randn(Float64), seed 1234              |     1                    1      0.4s
        basic randexp(Float16), seed nothing         |     1                    1      0.5s
        basic randexp(Float16), seed 1234            |     1                    1      0.4s
        basic randexp(Float32), seed nothing         |     1                    1      0.4s
        basic randexp(Float32), seed 1234            |     1                    1      0.4s
        basic randexp(Float64), seed nothing         |     1                    1      0.4s
        basic randexp(Float64), seed 1234            |     1                    1      0.4s
        Math Intrinsics                              |    21                   21      4.9s
    test\external_tests.jl                           |    18                   18
    test\gpuarrays_tests.jl                          |  6977     1           6978
      gpuarrays - base                               |    95                   95     24.4s
      gpuarrays - broadcasting                       |   400                  400   2m21.1s
      gpuarrays - constructors                       |   942                  942     17.9s
      gpuarrays - indexing find                      |    45                   45     16.4s
      gpuarrays - indexing multidimensional          |                       None      0.0s
      gpuarrays - indexing scalar                    |   477                  477      7.1s
      gpuarrays - interface                          |     7                    7      1.5s
      gpuarrays - linalg                             |   275                  275     57.9s
      gpuarrays - linalg/mul!/matrix-matrix          |   432                  432   1m16.6s
      gpuarrays - linalg/mul!/vector-matrix          |   168                  168     39.3s
      gpuarrays - linalg/norm                        |   696                  696   2m20.2s
      gpuarrays - math/intrinsics                    |    12                   12      2.0s
      gpuarrays - math/power                         |    72                   72     53.0s
      gpuarrays - random                             |    62                   62     13.5s
      gpuarrays - reductions/== isequal              |   312                  312     37.3s
      gpuarrays - reductions/any all count           |   101                  101      6.2s
      gpuarrays - reductions/mapreduce               |   396                  396   2m04.7s
      gpuarrays - reductions/mapreducedim!           |   312                  312     52.2s
      gpuarrays - reductions/mapreducedim!_large     |    50                   50      5.4s
      gpuarrays - reductions/minimum maximum extrema |   666                  666   2m05.6s
      gpuarrays - reductions/reduce                  |   264                  264      7.0s
      gpuarrays - reductions/reducedim!              |   192                  192      0.7s
      gpuarrays - reductions/sum prod                |   862                  862   1m27.0s
      gpuarrays - statistics                         |    83     1             84  17m31.3s
        std                                          |     4                    4      3.0s
        var                                          |     8                    8      2.2s
        mean                                         |     6                    6      2.9s
        std                                          |     4                    4      2.8s
        var                                          |     8                    8      2.1s
        mean                                         |     6                    6      2.6s
        std                                          |     4                    4      2.8s
        var                                          |     8                    8      2.2s
        mean                                         |     6                    6      2.7s
        cov                                          |     4                    4      1.4s
        cor                                          |     4                    4      2.2s
        cov                                          |     4                    4      1.9s
        cor                                          |     4                    4      2.0s
        cov                                          |     4                    4      1.5s
        cor                                          |     4                    4      2.0s
        cov                                          |     3                    3      2.5s
        cor                                          |     2     1              3  16m48.1s
      gpuarrays - uniformscaling                     |    56                   56      8.4s
    test\hip_core_tests.jl                           |   460                  460
    test\hip_extra_tests.jl                          |  1233                 1233
    test\ka_tests.jl                                 |  2149             8   2157
ERROR: LoadError: Some tests did not pass: 11895 passed, 6 failed, 0 errored, 20 broken.
in expression starting at C:\Users\x\.julia\packages\AMDGPU\goZLq\test\runtests.jl:105
ERROR: Package AMDGPU errored during testing

It isn’t going to work until you get MIOpen working:

For me it turned out the package manager downgraded me to an older version of AMDGPU, but yours looks like it is v0.8.2.

Try running:

julia> AMDGPU.versioninfo()
ROCm provided by: system
[+] HSA Runtime v1.1.0
    @ /opt/rocm-5.7.1/lib/libhsa-runtime64.so
[+] ld.lld
    @ /opt/rocm/llvm/bin/ld.lld
[+] ROCm-Device-Libs
    @ /home/user1/.julia/artifacts/5ad5ecb46e3c334821f54c1feecc6c152b7b6a45/amdgcn/bitcode
[+] HIP Runtime v5.7.31921
    @ /opt/rocm-5.7.1/lib/libamdhip64.so
[+] rocBLAS v3.1.0
    @ /opt/rocm-5.7.1/lib/librocblas.so
[+] rocSOLVER v3.23.0
    @ /opt/rocm-5.7.1/lib/librocsolver.so
[+] rocALUTION
    @ /opt/rocm-5.7.1/lib/librocalution.so
[+] rocSPARSE
    @ /opt/rocm-5.7.1/lib/librocsparse.so.0
[+] rocRAND v2.10.5
    @ /opt/rocm-5.7.1/lib/librocrand.so
[+] rocFFT v1.0.21
    @ /opt/rocm-5.7.1/lib/librocfft.so
[+] MIOpen v2.20.0
    @ /opt/rocm-5.7.1/lib/libMIOpen.so

HIP Devices [2]
    1. HIPDevice(name="AMD Radeon VII", id=1, gcn_arch=gfx906:sramecc+:xnack-)
    2. HIPDevice(name="Radeon RX 580 Series", id=2, gcn_arch=gfx803)

Can you switch to Julia 1.10 and try again?
The failing tests are related to device-side RNG, which is known to fail on 1.9; I haven’t looked into fixing it there.
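If it helps, here is a minimal sketch of a version guard for that situation. `rng_failures_expected` is a hypothetical helper name, not something AMDGPU.jl provides; it only encodes the statement above that the device-side RNG tests are known to fail before Julia 1.10.

```julia
# Hypothetical helper (not part of AMDGPU.jl): are device-side RNG test
# failures expected on this Julia version?
# `v"1.10-"` sorts below every 1.10 pre-release, so release candidates
# of 1.10 count as "1.10" here.
rng_failures_expected(v::VersionNumber = VERSION) = v < v"1.10-"

if rng_failures_expected()
    @warn "Julia $VERSION: device-side RNG tests are known to fail; consider trying Julia 1.10."
else
    @info "Julia $VERSION: device-side RNG test failures are not expected."
end
```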


Just updated AMDGPU.jl from v0.8.2 to v0.8.3. I’m currently using ROCm/HIP 5.5; as far as I can tell that seems to be the most recent (only?) version with a Windows installer available.

AMDGPU update and versioninfo
(@v1.9) pkg> activate .
  Activating project at `C:\Users\x\Dev\amdgpu`

(amdgpu) pkg> st
Status `C:\Users\x\Dev\amdgpu\Project.toml`
  [21141c5a] AMDGPU v0.8.2

(amdgpu) pkg> up
    Updating registry at `C:\Users\x\.julia\registries\General.toml`
   Installed AMDGPU ─ v0.8.3
    Updating `C:\Users\x\Dev\amdgpu\Project.toml`
  [21141c5a] ↑ AMDGPU v0.8.2 ⇒ v0.8.3
    Updating `C:\Users\x\Dev\amdgpu\Manifest.toml`
  [21141c5a] ↑ AMDGPU v0.8.2 ⇒ v0.8.3
  [fa961155] ↑ CEnum v0.4.2 ⇒ v0.5.0
Precompiling project...
  6 dependencies successfully precompiled in 41 seconds. 68 already precompiled.

julia> using AMDGPU
┌ Warning: MIOpen is unavailable, functionality will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\FdIJi\src\AMDGPU.jl:213

julia> AMDGPU.versioninfo()
ROCm provided by: system
[+] ld.lld
    @ C:\Program Files\AMD\ROCm\5.5\bin\ld.lld.exe
[+] ROCm-Device-Libs
    @ C:\Users\x\.julia\artifacts\5ad5ecb46e3c334821f54c1feecc6c152b7b6a45\amdgcn/bitcode
[+] HIP Runtime v5.5.0
    @ C:\WINDOWS\SYSTEM32\amdhip64.DLL
[+] rocBLAS v2.47.0
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocblas.dll
[+] rocSOLVER v3.21.0
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocsolver.dll
[+] rocALUTION
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocalution.dll
[+] rocSPARSE
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocsparse.dll
[+] rocRAND v2.10.5
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocrand.dll
[+] rocFFT v1.0.21
    @ C:\Program Files\AMD\ROCm\5.5\bin\rocfft.dll
[-] MIOpen

HIP Devices [1]
    1. HIPDevice(name="AMD Radeon VII", id=1, gcn_arch=gfx906:sramecc-:xnack-)

Looks like I don’t have MIOpen installed? I installed ROCm/HIP/etc using the official bundled installer here, but I’m not finding a separate Windows installer for MIOpen. Does it need to be built from source?

I noticed here in the AMDGPU docs that it might be possible to run purely with package artifacts vs installing them separately, so I gave that a shot but it definitely doesn’t work for me.

Failed attempt with use_artifacts!(true)
julia> AMDGPU.ROCmDiscovery.use_artifacts!(true)
┌ Info: Switched `use_artifacts` to `true`.
└ Restart Julia session for the changes to take effect.

# RESTARTED JULIA

julia> using AMDGPU
[ Info: Precompiling AMDGPU [21141c5a-9bdb-4563-92ae-f87d6854732e]
┌ Warning: HIP library is unavailable, HIP integration will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\FdIJi\src\AMDGPU.jl:197
┌ Warning: rocBLAS is unavailable, functionality will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\FdIJi\src\AMDGPU.jl:213
┌ Warning: rocSPARSE is unavailable, functionality will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\FdIJi\src\AMDGPU.jl:213
┌ Warning: rocSOLVER is unavailable, functionality will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\FdIJi\src\AMDGPU.jl:213
┌ Warning: rocALUTION is unavailable, functionality will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\FdIJi\src\AMDGPU.jl:213
┌ Warning: rocRAND is unavailable, functionality will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\FdIJi\src\AMDGPU.jl:213
┌ Warning: rocFFT is unavailable, functionality will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\FdIJi\src\AMDGPU.jl:213
┌ Warning: MIOpen is unavailable, functionality will be disabled.
└ @ AMDGPU C:\Users\x\.julia\packages\AMDGPU\FdIJi\src\AMDGPU.jl:213

julia> AMDGPU.versioninfo()
ROCm provided by: JLLs
[+] ld.lld
    @ C:\Users\x\.julia\juliaup\julia-1.9.4+0.x64.w64.mingw32\libexec\julia\lld.exe
[+] ROCm-Device-Libs
    @ C:\Users\x\.julia\artifacts\5ad5ecb46e3c334821f54c1feecc6c152b7b6a45\amdgcn/bitcode
[-] HIP Runtime
[-] rocBLAS
[-] rocSOLVER
[-] rocALUTION
[-] rocSPARSE
[-] rocRAND
[-] rocFFT
[-] MIOpen

I’ll give it a shot on 1.10; I just updated from rc to 1.10.0, so let me set up a new environment.

MIOpen is not available on Windows. Not sure if/when it’s coming.
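Until then, code that relies on MIOpen (DNN primitives such as convolutions) can be guarded with the `AMDGPU.functional(:MIOpen)` check shown earlier in the thread. A minimal sketch follows; `dnn_target` is a hypothetical helper name used only for illustration.

```julia
# Hypothetical helper (not part of AMDGPU.jl): decide where MIOpen-backed
# work should run. `miopen_ok` is meant to come from
# `AMDGPU.functional(:MIOpen)`.
dnn_target(miopen_ok::Bool) = miopen_ok ? :gpu : :cpu

# With AMDGPU.jl loaded, the call would look like this:
#   using AMDGPU
#   dnn_target(AMDGPU.functional(:MIOpen))
# In the Windows session above that check returned `false`, which
# corresponds to:
dnn_target(false)
```

Everything that does not depend on MIOpen is unaffected by this guard.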

Update: I made a new clean environment (only AMDGPU.jl v0.8.3 added in Pkg) under Julia v1.10.0 and re-ran the tests. I went from 6 failing out of 11921 under v1.9.4 to 26 failing and 22 erroring out of 12447 under v1.10.0, and I still get a driver timeout prompt from AMD’s software.

Test Summary
Test Summary:                |  Pass  Fail  Error  Broken  Total      Time
AMDGPU                       | 12243    26     22     156  12447  14m26.2s
  test                       | 12243    26     22     156  12447
    test\core_tests.jl       |   604                    2    606
    test\device_tests.jl     |   459                   10    469
    test\external_tests.jl   |    18                          18
    test\gpuarrays_tests.jl  |  6978                        6978
    test\hip_core_tests.jl   |   460                         460
    test\hip_extra_tests.jl  |  1575    26     22     136   1759
      hip - extra            |  1575    26     22     136   1759   5m11.8s
        rocSOLVER            |   406                         406     31.5s
        rocSPARSE            |  1099                  136   1235   1m38.2s
        FFT                  |    70    26     22            118   3m01.7s
          T = ComplexF32     |    32            3             35     16.5s
            1D               |                  1              1      2.1s
            1D inplace       |                  1              1      0.6s
            2D               |     3                           3      2.3s
            2D inplace       |     2                           2      1.0s
            Batch 1D         |     6                           6      0.8s
            3D               |     3                           3      2.3s
            3D inplace       |     2                           2      1.1s
            Batch 2D (in 3D) |     7                           7      2.6s
            Batch 2D (in 4D) |     9                           9      3.2s
            FFT Wrappers     |                  1              1      0.5s
          T = ComplexF64     |    32            3             35     11.0s
            1D               |                  1              1      0.6s
            1D inplace       |                  1              1      0.5s
            2D               |     3                           3      1.2s
            2D inplace       |     2                           2      1.0s
            Batch 1D         |     6                           6      0.6s
            3D               |     3                           3      1.2s
            3D inplace       |     2                           2      1.0s
            Batch 2D (in 3D) |     7                           7      2.0s
            Batch 2D (in 4D) |     9                           9      2.6s
            FFT Wrappers     |                  1              1      0.1s
          T = Float32        |     3    12      6             21   2m22.2s
            1D               |                  1              1      0.7s
            2D               |                  1              1      0.3s
            Batch 1D         |           4      1              5   2m16.0s
            3D               |           1      1              2      1.0s
            Batch 2D (in 3D) |           1      1              2      0.7s
            Batch 2D (in 4D) |     3     6                     9      3.4s
            FFT Wrappers     |                  1              1      0.1s
          T = Float64        |     3    12      6             21      7.6s
            1D               |                  1              1      0.7s
            2D               |                  1              1      0.3s
            Batch 1D         |           4      1              5      1.7s
            3D               |           1      1              2      0.8s
            Batch 2D (in 3D) |           1      1              2      0.6s
            Batch 2D (in 4D) |     3     6                     9      3.3s
            FFT Wrappers     |                  1              1      0.1s
          FFT with view      |           2                     2      0.3s
          Promoted types     |                  4              4      3.6s
            T = Float32      |                  1              1      1.3s
            T = Float64      |                  1              1      1.0s
            Complex{Int}     |                  1              1      0.4s
            Int              |                  1              1      0.9s
    test\ka_tests.jl         |  2149                    8   2157
ERROR: LoadError: Some tests did not pass: 12243 passed, 26 failed, 22 errored, 156 broken.
in expression starting at C:\Users\mikei\.julia\packages\AMDGPU\FdIJi\test\runtests.jl:105

rocFFT integration is also broken on Windows.
I accidentally removed the Windows check when running the tests, so you can ignore those failures for the moment.
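For illustration only, a platform check of that kind could look roughly like the sketch below. This is not the actual check from AMDGPU.jl's test suite; `fft_tests_enabled` is a hypothetical name and the test body is a placeholder.

```julia
using Test

# Hypothetical gate (not AMDGPU.jl's actual check): skip rocFFT tests on Windows.
fft_tests_enabled(is_windows::Bool = Sys.iswindows()) = !is_windows

@testset "FFT" begin
    if fft_tests_enabled()
        # Placeholder for the real rocFFT tests, which need a working GPU setup.
        @test true
    else
        # Recorded as Broken in the summary instead of Fail or Error.
        @test_skip false
    end
end
```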


Ok, so it sounds like the tests that have failed were to-be-expected for this configuration?

I’m generally willing to be flexible about which features are available with this card; I’ve mostly been nervous about the potential for silently wrong results.

Ok, so it sounds like the tests that have failed were to-be-expected for this configuration?

Yes. We probably just need to update the rocFFT integration to use the latest API.

And generally, unless there are some undiscovered bugs, it should not give silently wrong results (provided you are using things correctly). I’ve been using AMDGPU.jl for lots of things without these kinds of issues.
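If you want extra reassurance on an unsupported card, one option is to spot-check GPU results against the CPU. Below is a minimal sketch; `matches_cpu` is a hypothetical helper, and `f` is assumed to return an array. The runnable part is a CPU-only sanity check of the helper, with the GPU usage shown in comments because it assumes a working AMDGPU.jl setup.

```julia
using LinearAlgebra  # norm-based `isapprox` for arrays

# Hypothetical helper: apply `f` to a CPU array and to a device copy of it,
# then compare the two results on the host.
# `to_device` converts a CPU array to a device array (e.g. `ROCArray`).
function matches_cpu(f, to_device, x; rtol = sqrt(eps(eltype(x))))
    expected = f(x)
    actual = Array(f(to_device(x)))  # copy the device result back to the host
    return isapprox(actual, expected; rtol = rtol)
end

x = rand(Float32, 1024)

# CPU-only sanity check of the helper itself (no GPU involved):
matches_cpu(a -> a .* 2 .+ 1, identity, x)

# On a machine with a working AMDGPU.jl setup (assumption), one would run:
#   using AMDGPU
#   matches_cpu(a -> a .* 2 .+ 1, ROCArray, x)
#   matches_cpu(a -> sum(a; dims = 1), ROCArray, rand(Float32, 64, 64))
```

A `false` result for an operation you rely on would be worth reporting as an issue.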
