Row / column major with PythonCall, OpenCV

row / column major with PythonCall and numpy + OpenCV

I am trying to create an image processing program using PythonCall.jl and the Python version of OpenCV.

My current plan is to store the pixels as a 2-dimensional row-major array img[y,x], so that the layout is natural on the OpenCV side. On the Julia side, I will therefore treat it as a transposed image, img[x,y].
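
To illustrate why this plan is self-consistent, here is a pure-Python sketch (not PythonCall code; the helper names are made up for this example). It checks that element [y, x] of an (h, w) row-major array and element [x, y] of a (w, h) column-major array sit at the same flat offset, so both sides can in principle look at one buffer.

```python
def row_major_offset(y, x, h, w):
    """Flat offset of element [y, x] in an (h, w) row-major (C order) array."""
    return y * w + x


def col_major_offset(x, y, w, h):
    """Flat offset of element [x, y] in a (w, h) column-major (Julia order) array."""
    return x + y * w


h, w = 186, 288  # image size used in this thread

# Compare the two addressing schemes for every pixel.
same = all(
    row_major_offset(y, x, h, w) == col_major_offset(x, y, w, h)
    for y in range(h)
    for x in range(w)
)
print(same)
```

In both cases x is the index with stride 1, which is the property I want to keep.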

Python(OpenCV) → Julia

On the Julia side, I receive the ndarray as expected. However, when I call pyconvert() to convert it to a Julia Array, the data is copied and the “varies the fastest” axis becomes the y-axis. I don’t mind if the result looks like a vertically long image; I want the “varies the fastest” axis to remain the x-axis.

using BenchmarkTools
using PythonCall
using Images

np = pyimport("numpy")
cv2 = pyimport("cv2")

img_j = load("logo.png")
h, w = size(img_j)
(186, 288)
from_py = cv2.imread("logo.png", cv2.IMREAD_GRAYSCALE)

from_py.flags, from_py.shape, from_py.strides
(<py   C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
>, <py (186, 288)>, <py (288, 1)>)
from_py_to_jl = @btime pyconvert(Array, from_py)
typeof(from_py_to_jl), size(from_py_to_jl), strides(from_py_to_jl)
  172.619 μs (26 allocations: 53.47 KiB)
(Matrix{UInt8}, (186, 288), (1, 186))

Julia → Python(OpenCV)

It seems that I need to call Py(x).to_numpy() explicitly to pass a Julia array to an OpenCV function. This creates an F_CONTIGUOUS ndarray without copying. That is natural behavior and not a problem in itself, but I want to create a C_CONTIGUOUS ndarray without copying, so that OpenCV correctly recognizes it as a horizontal image.

from_jl = zeros(UInt8,w,h)

typeof(from_jl), size(from_jl), strides(from_jl)
(Matrix{UInt8}, (288, 186), (1, 288))
from_jl_to_py = @btime Py(from_jl).to_numpy()

from_jl_to_py.flags, from_jl_to_py.shape, from_jl_to_py.strides
  7.276 μs (19 allocations: 544 bytes)
(<py   C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
>, <py (288, 186)>, <py (1, 288)>)

To summarize my questions:

  1. How can I convert the ndarray received from the Python side into a Julia Array without copying and without changing the “varies the fastest” axis?

  2. How can I create a C_CONTIGUOUS ndarray with Py(x).to_numpy()?

Please excuse my poor English.

The phrase “varies the fastest” is borrowed from the Glossary of the NumPy v1.13 Manual.

Try PermutedDimsArray

https://docs.julialang.org/en/v1/base/arrays/#Base.PermutedDimsArrays.PermutedDimsArray

julia> M = reshape(1:16, 4, 4) |> collect
4×4 Matrix{Int64}:
 1  5   9  13
 2  6  10  14
 3  7  11  15
 4  8  12  16

julia> pda = PermutedDimsArray(M, (2,1))
4×4 PermutedDimsArray(::Matrix{Int64}, (2, 1)) with eltype Int64:
  1   2   3   4
  5   6   7   8
  9  10  11  12
 13  14  15  16

julia> pda[1] = 17
17

julia> M
4×4 Matrix{Int64}:
 17  5   9  13
  2  6  10  14
  3  7  11  15
  4  8  12  16

Hi mkitti,

Thanks for the advice. I tried it for the Julia-to-Python conversion.

from_jl_p =  PermutedDimsArray(from_jl, (2,1))
typeof(from_jl_p), size(from_jl_p), strides(from_jl_p)
(PermutedDimsArray{UInt8, 2, (2, 1), (2, 1), Matrix{UInt8}}, (186, 288), (288, 1))
from_jl_p_to_py = @btime Py(from_jl_p).to_numpy()
from_jl_p_to_py.flags, from_jl_p_to_py.shape, from_jl_p_to_py.strides
  5.900 μs (20 allocations: 560 bytes)
(<py   C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
>, <py (186, 288)>, <py (288, 1)>)

PermutedDimsArray gave me the expected array! I am not sure if this method is the most efficient, but it seems to be the way to go for now.

Inspired by the PermutedDimsArray approach, I also tried the Python-to-Julia direction by transposing the ndarray first and then calling pyconvert().

from_py_t = from_py.T
from_py_t.flags, from_py_t.shape, from_py_t.strides
(<py   C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
>, <py (288, 186)>, <py (1, 288)>)
from_py_t_to_jl = @btime pyconvert(Array, from_py_t)
typeof(from_py_t_to_jl), size(from_py_t_to_jl), strides(from_py_t_to_jl)
  30.303 μs (26 allocations: 53.47 KiB)
(Matrix{UInt8}, (288, 186), (1, 288))

The resulting Julia Array was as expected, but unfortunately a copy was still made, as before.

Transposing on the Python side is good too.

You can do pyconvert(AbstractArray, ...) or pyconvert(PyArray, ...) to get a non-copied array.

Do you really need to have the array in column-major order on the Julia side? I have the same issue in GMT. When images are read with GDAL, I keep them row-major and add an extra field to the type that describes the memory layout. It gets confusing sometimes, but this way I avoid copies.
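
A rough sketch of that idea, written in plain Python rather than the actual GMT types (the class name `LayoutImage` and the `layout` field are hypothetical, chosen only for this illustration): the pixel buffer is never rearranged, and a layout tag decides how indices map to offsets.

```python
from dataclasses import dataclass


@dataclass
class LayoutImage:
    """Hypothetical image wrapper: raw buffer plus a tag describing its layout."""
    data: bytearray
    height: int
    width: int
    layout: str = "row_major"  # or "col_major"

    def offset(self, y, x):
        # Map a (row, column) pixel coordinate to a flat buffer offset.
        if self.layout == "row_major":
            return y * self.width + x
        if self.layout == "col_major":
            return x * self.height + y
        raise ValueError(f"unknown layout: {self.layout}")

    def pixel(self, y, x):
        return self.data[self.offset(y, x)]


raw = bytearray(range(6))
as_rows = LayoutImage(raw, height=2, width=3, layout="row_major")
as_cols = LayoutImage(raw, height=2, width=3, layout="col_major")
print(as_rows.pixel(1, 0), as_cols.pixel(1, 0))
```

Both objects share the same `raw` buffer; only the interpretation differs, which is where the occasional confusion comes from.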

In my experiments pyconvert() always returns a copy, how can I get a non-copied array?

I tried pyconvert(PyArray, ...) and it returned a copy as well.

joa-quim,

You’re absolutely right. To be precise, column major versus row major is not the essential point. When dealing with images, I want the “varies the fastest” axis to be horizontal in the image. This has been the natural direction for raster scans since the days of analog TV, and some image processing algorithms assume horizontal scans for memory or cache efficiency.

In other words, my problem now is how to pass the raster scan in both directions between Julia and Python without copying the array, while keeping the raster scan order.
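
To make the cache argument concrete, here is a small pure-Python sketch (the function names are made up for this example). It lists the flat offsets visited when a row-major image is scanned with x as the inner loop versus y as the inner loop.

```python
def scan_offsets(h, w, x_inner=True):
    """Flat offsets visited when scanning an (h, w) row-major image."""
    if x_inner:
        # Raster scan: walk along each row, then move to the next row.
        return [y * w + x for y in range(h) for x in range(w)]
    # Column scan: walk down each column, then move to the next column.
    return [y * w + x for x in range(w) for y in range(h)]


def steps(offsets):
    """Differences between consecutive offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


h, w = 3, 4
raster = scan_offsets(h, w, x_inner=True)
column = scan_offsets(h, w, x_inner=False)
print(steps(raster))
print(steps(column))
```

With x as the inner loop every step is 1, so memory is read sequentially; with y as the inner loop most steps jump by a full row width.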

A while ago, I made a package to help with no copy operations between PyCall.jl and Numpy:

Can you show exactly what you did to determine this because converting a numpy array to a PyArray definitely doesn’t copy?

Oh great.

The README says the following. Is there anything in the source code that might be helpful?

Have you heard of PythonCall.jl?
Yes. PythonCall.jl is another implementation of a Julia language interface to the Python language. My understanding is that PythonCall.jl already includes this functionality.

You are right, I was mistaken. The allocated memory was quite large, so I thought it was being copied.


size(from_jl)
(288, 186)
@btime pyconvert(PyArray, from_jl)
  1.323 ms (167 allocations: 8.27 KiB)

So no copy is made, but I cannot accept this allocation size and processing time.

Indeed those timings/allocations are not expected. Can you post code to reproduce this please?

It is exactly the same as what I posted above, but here is the minimal code again.

using BenchmarkTools
using PythonCall
w = 288; h = 186;
@btime pyconvert(PyArray, zeros(UInt8,w,h))
  234.790 μs (169 allocations: 60.69 KiB)

I ran the code several times while tidying it up, and neither the execution time nor the allocation size is stable. The allocation size is always large.

The results are almost the same for both REPL and Jupyter.

julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 16 × Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 6 on 16 virtual cores
Environment:
  JULIA_NUM_THREADS = 6

On the NumPy side, can you use .reshape(order='F') to get Fortran/Julia column-major order?

https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html


Is this what you are referring to?

a = np.zeros((h,w), np.uint8).reshape((w,h),order='F')
@btime pyconvert(Array, a)
  37.695 μs (26 allocations: 53.47 KiB)

It is the same as the method we already tried: transpose the ndarray and then convert.

So far I have not found a way to pyconvert() an ndarray to a Julia array without copying.

If you convert to an Array, Julia must copy because Julia’s Array is always column-major.

What I’m saying is that both NumPy’s and Julia’s reshape() are non-copying, but you pay a run-time cost, so it’s your call whether and when to copy.

I don’t think copying is required to convert an Array between Julia and Python; the conversion between row major and column major is essentially a transpose, i.e., it is an exchange of indices. Therefore, it is possible to exchange row major and column major by sharing a single memory area.
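
As a minimal illustration of this point, here is a pure-Python sketch (the `StridedView` class is hypothetical and is not how PythonCall or NumPy are implemented). A transpose only swaps the shape and the strides; both views keep addressing the same buffer.

```python
class StridedView:
    """Hypothetical 2-D view over a flat buffer, described by shape and strides."""

    def __init__(self, buf, shape, strides):
        self.buf = buf
        self.shape = shape
        self.strides = strides

    def _offset(self, idx):
        i, j = idx
        if not (0 <= i < self.shape[0] and 0 <= j < self.shape[1]):
            raise IndexError(idx)
        return i * self.strides[0] + j * self.strides[1]

    def __getitem__(self, idx):
        return self.buf[self._offset(idx)]

    def __setitem__(self, idx, value):
        self.buf[self._offset(idx)] = value

    def transpose(self):
        # Swap shape and strides; the buffer itself is shared, not copied.
        return StridedView(self.buf, self.shape[::-1], self.strides[::-1])


h, w = 2, 3
buf = bytearray(range(h * w))
img = StridedView(buf, (h, w), (w, 1))  # row major: img[y, x]
img_t = img.transpose()                 # column major: img_t[x, y]

img_t[2, 1] = 99  # write through the transposed view
print(img[1, 2], img_t.buf is buf)
```

The transposed view has shape (3, 2) and strides (1, 3), the same pattern as the (288, 186) / (1, 288) results shown earlier, and the element with stride 1 is still x.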

The remaining problem is not the conversion between row major and column major, but the unintended change of the “varies the fastest” axis, which generally requires a copy. Also, a copy occurs even when the “varies the fastest” axis is preserved by pyconvert().

First, note that this times creating a Julia array, wrapping it as a Python object, and then converting that to a PyArray. It’s also recommended to use $ to interpolate non-const values into @btime; otherwise you are also timing the lookup cost.

Just timing the pyconvert call itself on a numpy array it takes 3.6μs for me:

julia> using BenchmarkTools, PythonCall

julia> w = 288; h = 186;

julia> x = pyimport("numpy").zeros((h, w), dtype="uint8");

julia> @btime pyconvert(PyArray, $x);
  3.638 μs (22 allocations: 1.00 KiB)

Note also that PyArray is parametric, and if you specify a concrete type then it will go a lot faster because it can avoid a lot of dynamic type specification - 0.32μs for me:

julia> typeof(PyArray(x))
PyArray{UInt8, 2, true, false, UInt8}

julia> @btime pyconvert(PyArray{UInt8,2,true,false,UInt8}, $x);
  316.889 ns (3 allocations: 112 bytes)

You can also go a little faster if you avoid pyconvert and construct the PyArray directly - 0.27μs for me:

julia> @btime PyArray{UInt8,2,true,false,UInt8}($x);
  273.852 ns (3 allocations: 112 bytes)

I was so focused on keeping the code simple that I was making stupid mistakes.
The interpolation and PyArray construction techniques were very helpful. Many thanks.