I am trying to use cuDNN.jl for GPU-accelerated convolution, specifically the function cuDNN.cudnnConvolutionForward. The function has a keyword argument format that specifies the order of dimensions (format=cuDNN.CUDNN_TENSOR_NHWC or format=cuDNN.CUDNN_TENSOR_NCHW). However, since Julia arrays are column-major, the Julia dimensions are interpreted in the opposite order, while my data is laid out in real NCHW order (not the reversed one).
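To illustrate what I mean by the opposite order (this is my understanding of the convention, using the sizes from my example below): for format=cuDNN.CUDNN_TENSOR_NCHW, cuDNN.jl expects a Julia array of size (W, H, C, N):

x_for_cudnn = rand(Float32, 64, 64, 16, 32)  # Julia dims (W, H, C, N), read by cuDNN as NCHW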
I used permutedims as a work-around:
using CUDA, cuDNN

function conv_cudnn(x, w, b; stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1)
    # move to the GPU and reverse the dims: (N, C, H, W) -> (W, H, C, N),
    # which cuDNN.jl interprets as NCHW
    x = permutedims(CuArray(x), (4, 3, 2, 1))
    w = permutedims(CuArray(w), (4, 3, 2, 1))
    # the bias has to be a 4-d tensor with the channels in the third Julia dim
    b = reshape(CuArray(b), (1, 1, length(b), 1))
    y = CUDA.@time cuDNN.cudnnConvolutionForward(w, x; bias=b,
        padding=padding, stride=stride, dilation=dilation, group=groups,
        reorderType=cuDNN.CUDNN_DEFAULT_REORDER, mode=cuDNN.CUDNN_CROSS_CORRELATION)
    # reverse the dims back to the original (N, C, H, W) order
    return permutedims(y, (4, 3, 2, 1))
end
My inputs look like this:
# define inputs (real NCHW order)
x = rand(32, 16, 64, 64)  # (N, C, H, W)
w = rand(32, 8, 5, 5)     # (O, I ÷ groups, kH, kW)
b = rand(32)              # one bias per output channel
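For reference, this is how I call it (my own example; note that the shapes above only match with groups=2, since x has 16 channels and w has 8 input channels per group, and the output size is my own calculation):

y = conv_cudnn(x, w, b; groups=2)
size(y)  # (32, 32, 60, 60) in real NCHW order: 64 - 5 + 1 = 60 with no padding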
Is there a better or faster way to use cuDNN with “real” NCHW order (e.g. without using permutedims)?
Best regards and thank you in advance!