PyCall.jl with PermutedDimsArray, StridedSubArray, and ReinterpretArray

I have been working on Napari.jl, a Julia package for Napari. Napari is a multidimensional image viewer written in Python and developed at the Chan Zuckerberg Initiative’s Biohub, with many contributors from the field of biological microscopy.

One challenge has been decoding Julia’s image framework, designed by @tim.holy, into primitive arrays that PyCall can turn into NumPy arrays without copying the underlying data. I have figured out a few enhancements to PyCall that expand the set of arrays that can be passed to NumPy without copying. In particular, I believe that many PermutedDimsArrays, StridedSubArrays, and ReinterpretArrays could become NumPy arrays without copying. The latter two are generated by view and reinterpret, respectively.

I’m posting to see if there are other Julia array structures that could be transferred to NumPy more efficiently.

For example, if you load mandrill from TestImages.jl and try to convert it directly into a PyObject, you get a Python list rather than a NumPy array.

julia> using TestImages

julia> mandrill = TestImages.testimage("mandrill");

julia> typeof(mandrill)
Array{ColorTypes.RGB{FixedPointNumbers.Normed{UInt8,8}},2}

julia> using PyCall
[ Info: Precompiling PyCall [438e738f-606a-5dbb-bf0a-cddfbfd45ab0]

julia> py_mandrill = PyObject(mandrill);

julia> pytypeof(py_mandrill)
PyObject <class 'list'>

Currently, to get a NumPy array at all, you have to do a few manipulations and then copy the result into a plain Array that NumPy and Napari can easily understand. Without that final copy, the conversion falls back to a list again:

julia> using Images

julia> mandrill_cv = channelview(mandrill);

julia> typeof(mandrill_cv)
Base.ReinterpretArray{Normed{UInt8,8},3,RGB{Normed{UInt8,8}},Array{RGB{Normed{UInt8,8}},3}}

julia> mandrill_cv_uint8 = reinterpret(UInt8, mandrill_cv);

julia> typeof(mandrill_cv_uint8)
Base.ReinterpretArray{UInt8,3,RGB{Normed{UInt8,8}},Array{RGB{Normed{UInt8,8}},3}}

julia> size(mandrill_cv_uint8)
(3, 512, 512)

julia> mandrill_cv_uint8_permuted = PermutedDimsArray(mandrill_cv_uint8, [2,3,1]);

julia> size( mandrill_cv_uint8_permuted )
(512, 512, 3)

julia> mandrill_cv_uint8_permuted_copied = copy(mandrill_cv_uint8_permuted);

julia> typeof(mandrill_cv_uint8_permuted_copied)
Array{UInt8,3}

julia> using PyCall

julia> py_mandrill_cv_uint8_permuted_copied = PyObject(mandrill_cv_uint8_permuted_copied);

julia> pytypeof(py_mandrill_cv_uint8_permuted_copied)
PyObject <class 'numpy.ndarray'>

julia> py_mandrill_cv_uint8_permuted = PyObject(mandrill_cv_uint8_permuted);

julia> pytypeof(py_mandrill_cv_uint8_permuted)
PyObject <class 'list'>

My objective is to get the same result without any copying. After some modifications to PyCall.jl, this is now possible:

julia> using TestImages, Images, PyCall

julia> mandrill = TestImages.testimage("mandrill");

julia> mandrill_cv_uint8_permuted = PermutedDimsArray( reinterpret(UInt8, channelview(mandrill) ), (2,3,1) );

julia> py_mandrill_cv_uint8_permuted = PyObject( mandrill_cv_uint8_permuted );

julia> pytypeof( py_mandrill_cv_uint8_permuted )
PyObject <class 'numpy.ndarray'>

julia> print(mandrill[1])
RGB{N0f8}(0.643,0.588,0.278)

julia> mandrill[1] = 0
0

julia> py_mandrill_cv_uint8_permuted.__getitem__( (0,0,0) )
0

julia> py_mandrill_cv_uint8_permuted.__setitem__( (0,0,0) , 255)

julia> print(mandrill[1])
RGB{N0f8}(1.0,0.0,0.0)

Since no copying is involved, changes made in Julia are visible in Python, and changes made in Python are visible in Julia.

The general strategy is to transfer the original underlying data to NumPy and then replay, on the NumPy side, the transformations that were applied in Julia.
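To make that concrete, here is a minimal Base-only sketch (no PyCall involved) of the metadata each wrapper already exposes: the parent array, the shape, and the strides. The array `A` is just a stand-in for the channel data, and `byte_strides` is a hypothetical helper I made up for illustration, not something from PyCall. Julia reports strides in elements while NumPy expects bytes, hence the scaling.

```julia
# Stand-in for raw channel data: 3 channels, 4x5 pixels.
A = zeros(UInt8, 3, 4, 5)

P = PermutedDimsArray(A, (2, 3, 1))   # channels moved last, no data copied
V = view(A, :, 1:2:3, :)              # rows 1 and 3 only, no data copied

# Hypothetical helper: element strides scaled to byte strides for NumPy.
byte_strides(X) = strides(X) .* sizeof(eltype(X))

println(size(P), " ", strides(P))     # shape and element strides of the permuted wrapper
println(size(V), " ", strides(V))     # shape and element strides of the strided view
println(parent(P) === A, " ", parent(V) === A)   # both wrap the same parent

# With a wider element type, the byte strides differ from the element strides.
B = zeros(Float32, 3, 4, 5)
println(byte_strides(PermutedDimsArray(B, (2, 3, 1))))
```

In other words, once the parent buffer is shared, a permutation or a strided view only changes the shape and strides that need to be reported to NumPy.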

Are there other Julia array types that would benefit from enhanced no-copy transfers using PyCall?


Maybe ReshapedArray? On Julia < 1.6, in the color-channel world you generally need both reshape and reinterpret, but in Julia 1.6 we have the option to do them as a single entity. That said, there still might be places where reshape is useful.

One issue, though, is that ReshapedArray often shows up only for non-strided arrays, which I think NumPy can’t handle.
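For anyone following along, here is a small Base-only sketch of the two routes mentioned above. It uses `NTuple{3,UInt8}` as a stand-in for an RGB pixel (that substitution is my assumption, chosen so the example needs no image packages), and the `reinterpret(reshape, ...)` form requires Julia 1.6 or later.

```julia
# NTuple{3,UInt8} stands in for an RGB pixel: three bytes, no padding.
img = fill((0x01, 0x02, 0x03), 4, 5)

# Older route: add a leading singleton dimension, then reinterpret it into channels.
old_style = reinterpret(UInt8, reshape(img, 1, size(img)...))

# Julia 1.6+: a single wrapper does both steps.
new_style = reinterpret(reshape, UInt8, img)

println(size(old_style), " ", size(new_style))   # both are channel-first 3D arrays

# Both are views of img: a write through one shows up in the others.
new_style[1, 1, 1] = 0xff
println(img[1, 1], " ", old_style[1, 1, 1])
```

Either way the result shares memory with the original image, which is the property a no-copy conversion would need to preserve.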


Handling the Base.StridedReshapedArray case should be straightforward. For non-strided arrays, it will ultimately depend on what the parent array type is. I’m trying to think if there are any special cases that could be handled better than just copying the data.

If the parent is a UnitRange, such as via reshape(1:100_000_000, 100, 1_000_000), we could emulate the behavior using a dask.array.arange. That might be out of scope for PyCall.jl though.

Per @stevengj's comments on my PR, it looks like we should be able to widen the no-copy conversion of Julia arrays to NumPy arrays from StridedArray to any AbstractArray that has strides defined.
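As a rough illustration of the difference between the two criteria, here is a Base-only sketch. `strides_or_nothing` is a hypothetical helper written for this post and is not how PyCall implements the check; it simply treats any error from `strides` as "not defined".

```julia
# Hypothetical helper (not PyCall API): strides(A) if defined, otherwise nothing.
function strides_or_nothing(A::AbstractArray)
    try
        return strides(A)
    catch
        return nothing
    end
end

A = zeros(UInt8, 3, 4, 5)
P = PermutedDimsArray(A, (2, 3, 1))   # not a StridedArray, yet it has strides
G = view(A, [1, 3], :, :)             # irregular index along dim 1: no fixed stride

println(P isa StridedArray)           # the type-based test rejects P
println(strides_or_nothing(P))        # the strides-based test accepts it
println(strides_or_nothing(G))        # a gather-style view still needs a copy
```

So checking for strides rather than for membership in StridedArray picks up wrappers like PermutedDimsArray, while views built from arbitrary index vectors would still fall back to copying.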