I am trying to return a numpy array to Julia without copying.
It took me some time to understand that pycall(py"myPythonFunctionReturningaNumPyArray", PyArray, filename)
seems to be the only way to prevent the system from automatically copy-converting with index reversal. The description in the README was hard for me to understand at first. Maybe adding an example call like the one above would help here.
However, even then I have a Julia object, which is a wrapped array that “fakes” the numpy indexing. But I want a plain Julia array with plain Julia indexing, meaning that the numpy shape (10,20,30) should consistently become (30,20,10) while simply using the same raw data pointer. Is there a better way than somehow constructing a new array from the accessible pointer?
PythonCall.jl makes the opposite choice of never converting unless you explicitly request it.
I’m not sure what you mean by “plain” here?
Do you want an Array object, as opposed to PyArray (which is another subtype of AbstractArray)? If you specifically need an Array for some reason (why?) your only choices are (1) make a copy or (2) use unsafe_wrap to wrap an Array around the raw pointer (the data field of the PyArray) in which case you need to be very careful.
Using unsafe_wrap does the job. But then img is managed by the Julia garbage collector, which leads to segfaults if you try to package this code up in a function: img and the associated Python memory will be released. Maybe one should have a datatype argument JArray that does the right thing?
To be honest, I do not understand why it is useful to use the Python-based indexing within Julia or vice versa, given that they are different in several aspects anyway.
Why do you need an Array specifically, rather than another array type?
If you need an Array specifically, then you have to keep the Python object alive for as long as you need the array (this is one of the “unsafe” parts of unsafe_wrap). One way to do this would be to wrap the two together into a new subtype:
struct WrappedArray{T,N} <: AbstractArray{T,N}
a::Array{T,N}
o::PyObject
end
that keeps a reference to the Python object, and then delegate all of the array methods (e.g. indexing) to a. But then you still don’t have an Array, so I’m not sure in what circumstance the above would work but not a PyArray.
If it’s just the ordering of the dimensions that you’re objecting to (because yes indeed it is usually natural to have dimensions in the reverse order between Julia and Numpy because of column- versus row-major memory layout) then you can always transpose the numpy array before converting to a PyArray. This will still be non-copying and fast.
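A minimal numpy-only sketch of that claim (the shapes here are made up for illustration): .T reverses the dimension order without touching the buffer, so wrapping the transpose as a PyArray involves no copy.

```python
import numpy as np

# A C-ordered array as it would typically come out of a Python routine.
x = np.zeros((10, 20, 30))

# .T is a view: reversed shape and strides, same underlying buffer.
y = x.T

print(y.shape)                 # (30, 20, 10)
print(np.shares_memory(x, y))  # True: no data was copied
```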
Thanks. I guess this is the best way forward. The question then is how much extra boilerplate code will be needed so that this array supports all kinds of operations.
But maybe I should have a look at the code in PyCall to see how the conversion to PyArray is done. Maybe adding another such type or even just a base Array is less work.
My point is that the original data in Python is already in the correct order (Z,Y,X), and in Julia it should thus be (X,Y,Z), since this is how this language works. First permuting the dimensions and then having the array “interpreted” via getindex in an unusual way (which I guess is what PyArray does) might work, but sounds totally wrong and slow to me.
The best would be to prevent the garbage collector from collecting this memory (GC.@preserve?) and let it be managed by unsafe_wrap().
What kinds of operations do you need? PyArray implements the AbstractArray interface, so most array-based functions should work. (Some of the LinearAlgebra functions may be slower because the LAPACK interface expects Array.)
Not possible, an Array is not as complicated as a NumPy array, which can share buffers with other arrays and traverse in various strides and orders; that’s how NumPy array views are just NumPy arrays, while a view of an Array needs another wrapper type SubArray. You’ll need wrapper types to avoid copying to or from Arrays. On the Julia side, PermutedDimsArray can swap around the axes, OffsetArrays.jl can change the indices per axis. That said, it’s unusual to change the indexing across the language barrier when working on the same array.
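To illustrate the strides point in numpy terms (array sizes chosen arbitrarily): a basic slice is itself an ndarray that shares the parent’s buffer, just traversed with different strides, which is exactly what a plain Array cannot express.

```python
import numpy as np

x = np.zeros((4, 6))      # strides (48, 8) for float64
v = x[::2, 1:4]           # a view: no new buffer is allocated

print(np.shares_memory(x, v))   # True
print(v.strides == x.strides)   # False: same buffer, different traversal
```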
Doesn’t the bulk of it still expect 1-based indexing too?
Yep, so if you transpose the array first (a non-copying operation in numpy that just reinterprets the same underlying data buffer) before wrapping as a PyArray, then you get the dimension ordering you want.
i.e. if the original numpy array x is C-contiguous (which is the norm in numpy) then x.T is F-contiguous (which is the norm in Julia) so PyArray(x.T) is F-contiguous and can in theory support fast linear indexing for example (i.e. is not strided).
I’m not sure if PyCall.PyArray actually supports fast linear indexing but PythonCall.PyArray does.
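The contiguity claim is easy to verify on the numpy side (sizes here are arbitrary):

```python
import numpy as np

x = np.ones((10, 20, 30))              # freshly allocated arrays are C-contiguous
y = x.T                                # non-copying reversal of the axes

print(x.flags['C_CONTIGUOUS'])         # True
print(y.flags['F_CONTIGUOUS'])         # True: same buffer, Fortran order
print(y.ctypes.data == x.ctypes.data)  # True: identical data pointer
```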
Sorry, I must be missing something here. In my understanding the data is loaded in Python into a contiguous piece of memory, and I use unsafe_wrap in Julia with the Python pointer retrieved from the PyArray object, which agrees with the one used in Python (I checked). This is all I want: the original data wrapped in Julia into a standard Array with standard Julia orientation. So I do not understand why you say “not possible”. I understand that this may not be generally possible for any type of odd Python indexing scheme, but that is maybe also not always supported by the PyArray wrap and only by the default copy mechanism.
Of course I can live with copying, but all of this sounds pretty wrong for the use case I am interested in. And loading GBs of data can lead to unacceptable memory overhead in some cases. All of this should be unnecessary here.
Yes, I can try this and I understand that it may work, if PyArray is smart enough. But it would need a permute-dims operation in Python, since the data is not 2D but can be 3D, 4D, etc.
Yes. Finally it works by applying np.transpose(img, range(img.ndim-1,-1,-1)) before returning the array. Then PyArray wraps it up correctly. And I checked, the pointer stays the same throughout the operations.
By the way: simply annotating the pycall with GC.@preserve and using own=true in unsafe_wrap did not work and still led to memory faults. But this is now a useful way forward. So PyArray seems smart enough to support Fortran-style indexing. Maybe one could allow Array as an input type to pycall and directly support this indexing, returning an Array to keep it simple?
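In case it helps later readers, the axis-reversal call from the previous post generalizes to any number of dimensions; a quick numpy check (with a made-up 4-D shape) confirms that the data pointer is untouched:

```python
import numpy as np

img = np.zeros((5, 10, 20, 30))                       # any ndim works
rev = np.transpose(img, range(img.ndim - 1, -1, -1))  # reverse all axes

print(rev.shape)                            # (30, 20, 10, 5)
print(rev.ctypes.data == img.ctypes.data)   # True: same raw data pointer
print(rev.flags['F_CONTIGUOUS'])            # True
```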
Sharing the linear buffer (data attribute) is not a sufficient condition for moving a NumPy array in general to the Julia side, the strides and order are important. For example, basic slicing often makes a non-contiguous NumPy array (check the flags attribute). If you’re satisfied with F-contiguous NumPy arrays (or C-contiguous arrays with dimensions inverted in Julia), go for it.
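A short numpy sketch of that caveat (the shape is chosen arbitrarily): a stepped slice shares the buffer but is neither C- nor F-contiguous, so no plain Array could describe it without a copy.

```python
import numpy as np

x = np.zeros((6, 8))
s = x[:, ::2]          # basic slicing: a view on the same buffer

print(np.shares_memory(x, s))                              # True
print(s.flags['C_CONTIGUOUS'] or s.flags['F_CONTIGUOUS'])  # False
```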
I still don’t understand what your use-case is. In what case is it acceptable to have a PyArray where the underlying data is column-major, but not a PyArray where the data is row-major? What operation fails for the latter but not the former?
I guess it is not about failing operations. It is only about whether and how to expose a Python Array without copying to Julia.
Here is my dream world:
The user should be able to select how this is done in pycall. One could have a named argument dims_order, which by default could be :order_python, but you could also choose :order_julia, which would reverse the Python dimensions, or :order_memory, which would ignore all transpose modifications in Python and just return the Julia order based on the underlying memory layout.
The latter could then be paired with the plain Julia Array type as a wrapper, if the user chooses this instead of PyArray.
My use case was a Python routine that loads a large file into an array, where the order in memory is OK the way it is. I would have liked to simply have this available as a plain Julia Array without needing an extra copy. Why a plain Array? Well, for most cases it does not matter, but some routines are limited to Array rather than AbstractArray; select_region_view() in NDTools.jl is one example. Since CUDA is not (yet) supported, we limited the argument to Array.
PyArray does not copy the data. Wrapping a Numpy array as either an Array or a PyArray just takes a copy of the pointer to the data (but not the actual data). The only major differences between the two approaches are that PyArray properly preserves the dimensions, strides, and the like, and that wrapping a Numpy array as a regular Array cannot be done safely. The only way to wrap as an Array is with unsafe_wrap(Array, ...). This is unsafe because it does not hold a reference to the original Python object, which could therefore be deleted at any point, invalidating the memory that the Array points to.
Then that’s a limitation of that function. Make it more generic to accept any AbstractArray, or at least add a method for PyArray - there’s no reason this shouldn’t be possible.