Computing a stack of inverse matrices: slower than NumPy when calling Julia from Python

When using pyjulia to compute a stack of inverse matrices, I find it is slower than using numpy.linalg.inv directly. But benchmarks show Julia being faster than Python, so this confuses me.
Here is my Julia script file, named “jlinv.jl”:

using StaticArrays, Einsum
function jlinv(m)
    sizem_original = size(m);
    mre = zeros((sizem_original[2], sizem_original[3], sizem_original[1]));
    sizem = size(mre);
    
    @einsum mre[j, k, i] := m[i, j, k];
    sm = reinterpret(SMatrix{sizem[1], sizem[2], Float64, sizem[1]*sizem[2]}, reshape(mre, length(mre)));
    sim = @eval inv.($sm);
    sim = reshape(reinterpret(Float64, sim), (sizem[1], sizem[2], sizem[3]));
    
    simout = zeros(sizem_original);
    @einsum simout[k, i, j] = sim[i, j, k]
    return simout
end

And here’s my Python function:


from julia.api import Julia
from julia import Main
import numpy as np
def getjlinv(asquere):
    jl = Julia(compiled_modules=False)
    jl.eval('include("jlinv.jl")')
    Main.jl_matrix = asquere
    asquereI = jl.eval("jlinv(jl_matrix)")
    return asquereI

And here are the time costs. The variable “asquere” is a 500000×4×4 NumPy array.
[screenshot of timing results]
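(For reference, the pure-NumPy side of this comparison is a single broadcasted call; here is a reduced-size sketch, with the array size shrunk from 500000 for illustration:)

```python
import numpy as np

# Small stand-in for the 500000x4x4 array "asquere" (size reduced for illustration).
rng = np.random.default_rng(0)
n = 1000
stack = np.eye(4) + 0.1 * rng.standard_normal((n, 4, 4))  # well-conditioned 4x4 matrices

# np.linalg.inv broadcasts over the leading axes, inverting all n matrices in one call.
inv_stack = np.linalg.inv(stack)

# Sanity check: every A @ inv(A) is the identity.
assert np.allclose(stack @ inv_stack, np.eye(4))
```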

Some of that might be compilation, what happens if you time getjlinv again immediately afterwards?

Also I think I read somewhere that the LA library shipped with Julia (OpenBLAS?) is not the fastest, but you can swap it out for another.

First, it looks like you are including the time to initialize Julia and compile your function. (In real-life usage, you wouldn’t do this on every function call!)

(Also the @eval in your jlinv makes no sense, why are you invoking parse/eval there instead of simply calling inv.(sm)? Using @eval also forces the evaluation to happen in global scope, which slows things down.)

Also, constructing the SMatrix wrapper on the fly like that is not type-stable, so the compiler cannot type-specialize inv.(sm) properly even if you omit the @eval. In native Julia code, you would pass an array of SMatrix directly, in which case the compiler would know the matrix size (being encoded in the type).

Not to mention that you are including the time to make multiple copies/transposes of your array in Julia.

  1. When you call Julia from Python, by default it makes a copy of the row-major numpy original into a Julia Array. (You can avoid this by telling pyjulia to pass a PyArray wrapper.)
  2. You then transpose into mre
  3. You then make another copy into sim instead of inverting in-place
  4. You then transpose into simout

Basically, you are throwing away a lot of the advantages you would have in native Julia code.

(If you want to call Julia from Python, rather than wrapping one elementary operation, e.g. a vectorized inv, at a time, you will get much better gains if you translate a whole sequence of operations to Julia and call that all at once. That being said, it is possible to eliminate all of the extra copies/transposes above if you know what you are doing with translating NumPy data structures to Julia, but this requires a bit more low-level understanding.)
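(To illustrate the copy/transpose point on the NumPy side: axis reordering is free in NumPy, and you only need to pay for one layout copy up front. A pure-NumPy sketch, with a small stand-in array and made-up variable names:)

```python
import numpy as np

a = np.zeros((1000, 4, 4))  # stand-in for asquere: C-ordered (row-major), shape (n, 4, 4)

# Reordering the axes to (4, 4, n) -- what the @einsum does -- can be a free view in NumPy:
a_view = a.transpose(1, 2, 0)
assert np.shares_memory(a, a_view)      # no data has moved yet

# Materializing Julia's column-major (Fortran) layout is then a single up-front copy:
a_fortran = np.asfortranarray(a_view)
assert a_fortran.flags.f_contiguous
assert not np.shares_memory(a, a_fortran)  # this is the one copy you pay for
```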

PS. You don’t need a semicolon (;) at the end of every line in Julia, though it is harmless.


I had already run getjlinv more than once before I posted the cost-time screenshot, so compilation is probably not the reason. Maybe I should use NumPy directly instead of pyjulia. :expressionless:

Thank you for your patient reply. I am a new Julia user, so some of my code may not be written properly. After dropping the @eval and the multiple copies/transposes from my Julia script, I find it unexpectedly costs MORE time than the first version. That’s so weird.
Here’s my new Julia script file, named “jlinv_v2.jl”:

using StaticArrays
function jlinv_v2(m)
    sizem = size(m)
    sm = reinterpret(SMatrix{sizem[1], sizem[2], Float64, sizem[1]*sizem[2]}, reshape(m, length(m)))
    sim = inv.(sm)
    return sim
end

And here’s my new python function:

from julia.api import Julia
from julia import Main

jl = Julia(compiled_modules=False)
jl.eval('include("jlinv.jl")')
jl.eval('include("jlinv_v2.jl")')

def getjlinv(asquere):
    Main.jl_matrix = asquere
    asquereI = jl.eval("jlinv(jl_matrix)")
    return asquereI

def getjlinv_v2(asquere_jl):
    Main.jl_matrix = asquere_jl
    asquereI_jl = jl.eval("jlinv_v2(jl_matrix)")
    return asquereI_jl

And here are the time costs:

[screenshot of timing results]

Quick question: what size is the matrix m here? It’s pretty big, right?

I passed the 500000×4×4 array asquere as m to the function jlinv in my Julia script jlinv.jl, in order to compute the inverses of 500000 4×4 matrices.

Then you should make it 4x4x500000 instead. Julia Arrays are column major, so that’s really important.
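For example, on the NumPy side that layout can even be free: transposing a C-ordered array with `.T` yields a column-major view without copying. (A small sketch with a reduced array size; note each slice comes out transposed, which is harmless for inversion since inv(A.T) == inv(A).T.)

```python
import numpy as np

a = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)  # C-ordered stack, shape (n, 4, 4)

# .T reverses the axes: shape becomes (4, 4, n), and because the original is
# row-major (C-contiguous), the result is column-major (F-contiguous) -- a free view.
at = a.T
assert at.shape == (4, 4, 2)
assert at.flags.f_contiguous
assert np.shares_memory(a, at)

# Each 4x4 slice at[:, :, i] is the transpose of a[i]:
assert np.array_equal(at[:, :, 0], a[0].T)
```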

Yeah, I noticed that, so I used @einsum to transpose the array.