Performance of `exp(A)` for 9x9 anti-Hermitian matrix: Julia vs. PyTorch vs. MATLAB (CPU & GPU)

As for Julia, I think we are really missing a lot of batched solvers. We have everything needed to write them (i.e. CUBLAS and CUSOLVER), but I don't think they exist right now. They are just tricky to get right, so if you know the algorithm well enough, don't hesitate to try implementing it. For your case (non-Hermitian), that would be a batched Padé approximation with scaling and squaring.
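To make the idea concrete, here is a rough CPU sketch of batched Padé + scaling and squaring in NumPy. It is not the CUBLAS/CUSOLVER version, only an illustration of the structure such a kernel would have. The name `batched_expm`, the fixed Padé order `q=6`, and the choice of a single shared scaling power for the whole batch are my own simplifying assumptions; a serious implementation would pick the order and scaling adaptively per matrix.

```python
import numpy as np

def batched_expm(A, q=6):
    """Approximate exp(A[i]) for a stack A of shape (batch, n, n).

    Hypothetical helper: fixed-order diagonal Pade approximant plus
    scaling and squaring, with one shared scaling power for the batch.
    """
    A = np.asarray(A, dtype=np.complex128)
    n = A.shape[-1]

    # Scaling: choose s so the largest matrix 1-norm in the batch is <= 0.5
    # after dividing by 2**s (shared s keeps every step a plain batched op).
    norm = np.abs(A).sum(axis=-2).max()
    s = max(0, int(np.ceil(np.log2(norm / 0.5)))) if norm > 0 else 0
    X = A / 2**s

    # Diagonal (q, q) Pade approximant: exp(X) ~= D^{-1} N, where
    # N = sum_k c_k X^k and D = sum_k (-1)^k c_k X^k.
    I = np.broadcast_to(np.eye(n, dtype=A.dtype), A.shape)
    N, D, P = I.copy(), I.copy(), I.copy()
    c = 1.0
    for k in range(1, q + 1):
        c *= (q - k + 1) / (k * (2 * q - k + 1))
        P = P @ X                      # batched matrix power X^k
        N = N + c * P
        D = D + (-1) ** k * c * P

    E = np.linalg.solve(D, N)          # batched linear solve

    # Squaring: undo the scaling, exp(A) = exp(A / 2**s) ** (2**s).
    for _ in range(s):
        E = E @ E
    return E

# Usage: a batch of 9x9 anti-Hermitian matrices; exp(A) should be unitary.
rng = np.random.default_rng(0)
M = rng.standard_normal((1000, 9, 9)) + 1j * rng.standard_normal((1000, 9, 9))
A = M - M.conj().transpose(0, 2, 1)
E = batched_expm(A)
print(np.abs(E @ E.conj().transpose(0, 2, 1) - np.eye(9)).max())
```

Every step is either a batched matrix multiply or a batched linear solve, which is why CUBLAS (batched GEMM) and CUSOLVER/CUBLAS (batched LU and solve) should in principle be enough to build the GPU version.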