Performance bug when creating a sparse matrix via Python

Hi all,

I am trying to solve Ax=B for a sparse A in Julia. I create the A and B matrices in Python and import them into Julia via the PyCall package. The problem is that this import takes ages and consumes all my RAM. I believe the issue lies mostly with PyCall, so I have created the following issue there:

JuliaPy/PyCall.jl#735

However, since I’m not getting any answers from PyCall I’m turning to Julia itself. I hope this is no problem.

Any help would be greatly appreciated!

Kind regards,
Tom

Ok so I dumped the matrix to a .mtx file and then read it using the mmread() function of the MatrixMarket package. In its last line, mmread() converts the entries it has read into a SparseMatrixCSC object. When I run this via Python, it again takes up enormous amounts of RAM. HOWEVER, when running this natively in Julia via

using MatrixMarket
A = MatrixMarket.mmread("somepathhere/M5.mtx")

there are no performance issues at all! I thus suspect that there is a problem with how my Julia is configured, although I have no idea where to start looking…
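For anyone reproducing this, here is a minimal sketch of what such a .mtx file contains, written in plain Python. It assumes the common "coordinate real general" flavour of the Matrix Market format; the helper name to_matrix_market and the 3x3 example matrix are made up for illustration and are not what was used to produce M5.mtx.

```python
def to_matrix_market(n_rows, n_cols, entries):
    """Return Matrix Market coordinate text for (row, col, value) triples.

    `entries` uses 0-based indices (Python convention). The file format
    is 1-based, so each index is shifted by one on output.
    """
    lines = [
        "%%MatrixMarket matrix coordinate real general",  # format banner
        f"{n_rows} {n_cols} {len(entries)}",              # rows, cols, stored entries
    ]
    for i, j, v in entries:
        lines.append(f"{i + 1} {j + 1} {v!r}")
    return "\n".join(lines) + "\n"


# Hypothetical 3x3 matrix with three stored entries.
mtx_text = to_matrix_market(3, 3, [(0, 0, 4.0), (2, 0, 1.0), (1, 2, -2.5)])
```

Writing mtx_text to a file should give something mmread() can parse, since only the nonzero entries and their 1-based positions are stored.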

Just a guess, but does j.eval() try to print the result? Is it possible that the code is printing out a giant sparse matrix and blowing up?

What if you put a trailing semicolon in the string that you pass to j.eval()? Or what if you do something like: j.eval("A = transformToJulia(...); A[1, 1]")?

As I explained in the issue you posted (https://github.com/JuliaPy/PyCall.jl/issues/735), automatic conversion of both arguments and return values is performed by PyCall when you call Julia from Python or vice versa. You have to turn this off to avoid accidentally trying to convert your returned Julia sparse array into a dense Python array.

This function will do what you want:

using PyCall, SparseArrays

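function scipyCSC_to_julia(A)
    m, n = A.shape
    # A."name" fetches the attribute as a raw PyObject (no automatic conversion);
    # PyArray then wraps the underlying NumPy array without copying it.
    # scipy indices are 0-based, Julia's are 1-based, hence the i+1.
    colPtr = Int[i+1 for i in PyArray(A."indptr")]
    rowVal = Int[i+1 for i in PyArray(A."indices")]
    nzVal = Vector{Float64}(PyArray(A."data"))
    B = SparseMatrixCSC{Float64,Int}(m, n, colPtr, rowVal, nzVal)
    # Wrap B as an opaque Python object so it is not converted on return.
    return PyCall.pyjlwrap_new(B)
end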

Just call this directly from Python with a scipy CSC array A (no need to import your Python variables as global Julia variables first!).

Alternatively, you can just return B from scipyCSC_to_julia, to make it return a native Julia CSC array within Julia with no Python wrapper. Then, in Python, call scipyCSC_to_julia = j.pyfunctionret(j.Main.scipyCSC_to_julia, j.Any, j.PyObject) to create a Python-callable wrapper that performs no argument conversion.

The upshot is that you have complete control of when and how arguments are converted, but you have to exert it in cases where the default conversions are not what you want.
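To make the i+1 shift in scipyCSC_to_julia concrete, here is a small plain-Python sketch of the CSC layout. It does not use scipy; the helpers dense_to_csc and to_one_based and the example matrix are hypothetical and only mimic the 0-based indptr/indices/data arrays that a scipy CSC matrix exposes.

```python
def dense_to_csc(rows):
    """Build 0-based CSC arrays (indptr, indices, data) from a dense list of rows."""
    n_rows, n_cols = len(rows), len(rows[0])
    indptr, indices, data = [0], [], []
    for j in range(n_cols):          # CSC stores entries column by column
        for i in range(n_rows):
            if rows[i][j] != 0:
                indices.append(i)    # row index of this stored entry
                data.append(float(rows[i][j]))
        indptr.append(len(indices))  # where the next column starts
    return indptr, indices, data


def to_one_based(indptr, indices):
    """Shift 0-based column pointers and row indices to Julia's 1-based convention."""
    return [p + 1 for p in indptr], [r + 1 for r in indices]


dense = [[4, 0, 0],
         [0, 0, -2],
         [1, 0, 0]]
indptr, indices, data = dense_to_csc(dense)     # [0, 2, 2, 3], [0, 2, 1], [4.0, 1.0, -2.0]
colptr, rowval = to_one_based(indptr, indices)  # [1, 3, 3, 4], [1, 3, 2]
```

The shifted colptr and rowval, together with data, are the three vectors that the SparseMatrixCSC constructor in the Julia function receives, so only O(nnz) values ever cross the language boundary.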
