I have been working on Napari.jl, a Julia package for Napari. Napari is a multidimensional image viewer written in Python, developed at the Chan-Zuckerberg Initiative’s Biohub with many contributors from the field of biological microscopy.
One challenge has been decoding Julia’s image framework, designed by @tim.holy, into primitive arrays that PyCall can turn into NumPy arrays without copying the underlying data. I have figured out a few enhancements to PyCall that would expand the set of arrays that can be passed to NumPy without copying. In particular, I believe that many PermutedDimsArrays, StridedSubArrays, and ReinterpretArrays could become NumPy arrays without copying. The latter two are generated by the methods view and reinterpret, respectively.
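As a rough illustration of why these wrappers are candidates, the sketch below uses only Base Julia on a small made-up array (not the image data). It shows that a view and a PermutedDimsArray describe the parent's memory through a different set of strides, and that a reinterpret shares the parent's bytes. The array A and its size are assumptions chosen for the example.

```julia
# A small 3x2x4 array standing in for channel data (illustrative values only).
A = reshape(collect(UInt8, 1:24), 3, 2, 4)

# Three lazy wrappers around the same memory; none of them copies.
V = view(A, :, 1:2, 2:2:4)            # strided SubArray
P = PermutedDimsArray(A, (2, 3, 1))   # lazily permuted dimensions
R = reinterpret(Int8, A)              # same bytes, different element type

# Strides are measured in elements of the parent array.
println(strides(A))   # (1, 3, 6)
println(strides(V))   # (1, 3, 12)
println(strides(P))   # (3, 6, 1)

# Writing through a wrapper changes the parent, since memory is shared.
R[1] = -1
println(A[1] == 0xff)   # true
```

A shape, a set of strides, and a pointer into the parent are the same ingredients NumPy uses to describe an ndarray, which is why a no-copy transfer looks feasible for these types.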
I’m posting to see if there are other Julia array structures that could be transferred to NumPy more efficiently.
For example, if you load mandrill from TestImages.jl and then try to convert it directly into a PyObject, you will get a Python list rather than a NumPy array.
julia> using TestImages
julia> mandrill = TestImages.testimage("mandrill");
julia> typeof(mandrill)
Array{ColorTypes.RGB{FixedPointNumbers.Normed{UInt8,8}},2}
julia> using PyCall
[ Info: Precompiling PyCall [438e738f-606a-5dbb-bf0a-cddfbfd45ab0]
julia> py_mandrill = PyObject(mandrill);
julia> pytypeof(py_mandrill)
PyObject <class 'list'>
Currently, to get a NumPy array that NumPy and Napari can easily understand, you can do a few manipulations and then make a copy. The copy converts to a NumPy array, while the uncopied PermutedDimsArray still becomes a list:
julia> using Images
julia> mandrill_cv = channelview(mandrill);
julia> typeof(mandrill_cv)
Base.ReinterpretArray{Normed{UInt8,8},3,RGB{Normed{UInt8,8}},Array{RGB{Normed{UInt8,8}},3}}
julia> mandrill_cv_uint8 = reinterpret(UInt8, mandrill_cv);
julia> typeof(mandrill_cv_uint8)
Base.ReinterpretArray{UInt8,3,RGB{Normed{UInt8,8}},Array{RGB{Normed{UInt8,8}},3}}
julia> size(mandrill_cv_uint8)
(3, 512, 512)
julia> mandrill_cv_uint8_permuted = PermutedDimsArray(mandrill_cv_uint8, (2,3,1));
julia> size( mandrill_cv_uint8_permuted )
(512, 512, 3)
julia> mandrill_cv_uint8_permuted_copied = copy(mandrill_cv_uint8_permuted);
julia> typeof(mandrill_cv_uint8_permuted_copied)
Array{UInt8,3}
julia> using PyCall
julia> py_mandrill_cv_uint8_permuted_copied = PyObject(mandrill_cv_uint8_permuted_copied);
julia> pytypeof(py_mandrill_cv_uint8_permuted_copied)
PyObject <class 'numpy.ndarray'>
julia> py_mandrill_cv_uint8_permuted = PyObject(mandrill_cv_uint8_permuted);
julia> pytypeof(py_mandrill_cv_uint8_permuted)
PyObject <class 'list'>
My objective is to get the same result without any copying. After some modifications to PyCall.jl, this is now possible:
julia> using TestImages, Images, PyCall
julia> mandrill = TestImages.testimage("mandrill");
julia> mandrill_cv_uint8_permuted = PermutedDimsArray( reinterpret(UInt8, channelview(mandrill) ), (2,3,1) );
julia> py_mandrill_cv_uint8_permuted = PyObject( mandrill_cv_uint8_permuted );
julia> pytypeof( py_mandrill_cv_uint8_permuted )
PyObject <class 'numpy.ndarray'>
julia> print(mandrill[1])
RGB{N0f8}(0.643,0.588,0.278)
julia> mandrill[1] = 0
0
julia> py_mandrill_cv_uint8_permuted.__getitem__( (0,0,0) )
0
julia> py_mandrill_cv_uint8_permuted.__setitem__( (0,0,0) , 255)
julia> print(mandrill[1])
RGB{N0f8}(1.0,0.0,0.0)
Since no copying is involved, changes made to the array in Julia are visible in Python, and changes made in Python are visible in Julia.
The general strategy for this improvement is to transfer the original underlying data to NumPy and then repeat, on the NumPy side, the transformations that were applied in Julia.
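To make that strategy concrete, here is a minimal sketch for the PermutedDimsArray case only. It is not the actual PyCall.jl modification: the helpers permutation and permuted_to_numpy are hypothetical names invented for this example, and the sketch assumes the parent is a plain Array of a NumPy-compatible element type, which stock PyCall already wraps without copying.

```julia
using PyCall

np = pyimport("numpy")

# Hypothetical helper (not part of PyCall): read the permutation out of the
# type parameters of a PermutedDimsArray.
permutation(::PermutedDimsArray{T,N,perm}) where {T,N,perm} = perm

# Hypothetical helper: wrap the parent array, then repeat the permutation
# on the NumPy side. NumPy axes are 0-based, hence `.- 1`.
function permuted_to_numpy(P::PermutedDimsArray)
    py_parent = PyObject(parent(P))   # assumes parent(P) is a plain Array
    # Ask for a PyObject back so the result is not converted to a Julia array.
    return pycall(np.transpose, PyObject, py_parent, permutation(P) .- 1)
end

A = reshape(collect(UInt8, 1:24), 3, 2, 4)
P = PermutedDimsArray(A, (2, 3, 1))
py_P = permuted_to_numpy(P)

println(py_P.shape)      # same as size(P)

# A write on the Julia side is visible through the NumPy object.
A[1, 1, 1] = 0xff
println(py_P.__getitem__((0, 0, 0)))
```

The same pattern would presumably extend to the other wrappers by peeling off one layer at a time: a view would map to NumPy slicing of the parent, and a reinterpret to a dtype change on the parent's buffer.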
Are there other Julia array types that would benefit from enhanced no-copy transfers using PyCall?