Help with AutoDiff in Metal.jl

I am trying to use Metal.jl for a scientific computation and am having difficulty with automatic differentiation (ForwardDiff). I am using Julia 1.9.

This is the code:

using Metal, ForwardDiff, LinearAlgebra  # LinearAlgebra provides dot

dF = x -> ForwardDiff.gradient(x -> dot(x, x), x)
N = 7
x = rand(Float32, N)
dF(MtlArray(x))

The code works for N=7, but not for N=8 and higher.

I get the following warning and error:

┌ Warning: Compilation of MetalLib to native code failed.
│ If you think this is a bug, please file an issue and attach /var/folders/l_/mmy119_j47b_k4mtcn0ngfj00000gn/T/jl_43KX6d0bSl.metallib.
└ @ Metal ~/.julia/packages/Metal/TtPHW/src/compiler/compilation.jl:77
ERROR: NSError: Threadgroup memory size (36864) exceeds the maximum threadgroup memory allowed (32768) (AGXMetal13_3, code 3)

What am I doing wrong here?
Any help in this regard is much appreciated.
Thanks!

That’s strange; your code works here. Feel free to open an issue on Metal.jl, with more details (Manifest, including the metallib file, full backtrace, etc).

Does it work when you set N = 8?

Oh hah, I blindly copied your code. No, it doesn’t, let me have a quick look.

Pull Request #176 on JuliaGPU/Metal.jl ("Detect mapreduce threadgroup limits instead of guessing." by maleadt) should fix this.
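The numbers in the error message are consistent with a simple back-of-the-envelope calculation, sketched below. It assumes (this is not confirmed in the thread) that the reduction reserves one element of threadgroup memory for each of 1024 threads, and that ForwardDiff uses a chunk size equal to N for inputs this small, so each element is a Dual holding one Float32 value plus N Float32 partials. The helper names `dual_bytes` and `shared_bytes` are hypothetical and exist only for this illustration.

```julia
# Hypothetical helpers for illustration; not part of Metal.jl or ForwardDiff.

# Size of one Dual with `n` Float32 partials: the value plus n partials.
dual_bytes(n) = (n + 1) * sizeof(Float32)

# Threadgroup memory needed if each thread stores one such Dual.
# The default of 1024 threads is an assumption inferred from the error message.
shared_bytes(n; threads = 1024) = threads * dual_bytes(n)

limit = 32768  # maximum threadgroup memory reported in the error

for n in (7, 8)
    println("N = ", n, ": ", shared_bytes(n), " bytes, fits = ", shared_bytes(n) <= limit)
end
```

Under these assumptions N = 7 lands exactly on the 32768-byte limit, while N = 8 needs 36864 bytes, which matches the size reported in the NSError.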

So I should just upgrade Metal.jl on my end?
Thanks

@maleadt @raktim
In a similar spirit, with

using Metal, Flux

xpu(arr) = MtlArray(Float32.(arr))
N = 7
x = rand(Float32, N) |> xpu

this works fine:

dF = x->gradient(()->sum(broadcast(exp,x)), Flux.params(x))
dF(x)

This returns Grads(...), but using cos instead of exp does not work:

dF = x->gradient(()->sum(broadcast(cos,x)), Flux.params(x))
dF(x)

This throws:

InvalidIRError: compiling kernel #broadcast_kernel#28(Metal.mtlKernelContext, MtlDeviceVector{ForwardDiff.Dual{Nothing, Float32, 1}, 1}, Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Zygote.var"#1392#1393"{typeof(cos)}, Tuple{Base.Broadcast.Extruded{MtlDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)

That’s Issue #69 on JuliaGPU/Metal.jl ("Support for exceptions"). You need to avoid exceptions in kernel code, for now.
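One plausible reason why cos hits this while exp does not (an assumption, not something stated in the thread) is that Julia's cos has an exception path for non-finite input, whereas exp simply returns a value. The CPU-only sketch below shows that difference in plain Julia; it does not touch Metal.jl and does not by itself prove which call in the kernel triggers gpu_malloc.

```julia
# Plain CPU Julia, no GPU packages required.
x = Inf32

# exp has a defined result for infinite input, so nothing is thrown here.
exp_result = exp(x)

# cos is only defined for finite input and throws a DomainError otherwise.
cos_threw = try
    cos(x)
    false
catch err
    err isa DomainError
end

println("exp(Inf32) = ", exp_result)
println("cos(Inf32) threw a DomainError: ", cos_threw)
```

A function with such a throwing branch carries exception-handling code into the compiled kernel even when the inputs are always finite.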
