Assume that in Julia I have the following, which should be OK because arrays in Julia are column-major:

```julia
x = Array{Float32, 3}(undef, 500, 20, 200)
# Initialize x
for c = 1:200
    for b = 1:20
        for a = 1:500
            x[a, b, c] = x[a, b, c] + 1
        end
    end
end
```

Now, if I want to…
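As an aside on the column-major point, here is a minimal CPU-only sketch (not from the original post) of why the first index belongs in the innermost loop. The function name `increment_colmajor!` is made up for illustration, and `zeros` stands in for whatever initialization the original poster used.

```julia
# Column-major layout: the first index varies fastest in memory.
x = zeros(Float32, 500, 20, 200)

# Element strides per dimension: stepping `a` moves 1 element,
# stepping `b` moves 500, stepping `c` moves 500 * 20 = 10000.
strides(x)                 # (1, 500, 10000)
LinearIndices(x)[1, 2, 1]  # 501

# Hypothetical helper: the innermost loop runs over the first index,
# so consecutive iterations touch adjacent memory.
function increment_colmajor!(x)
    for c in axes(x, 3)
        for b in axes(x, 2)
            for a in axes(x, 1)
                @inbounds x[a, b, c] += 1
            end
        end
    end
    return x
end

increment_colmajor!(x)
```

Swapping the loop nesting so that `c` is innermost would visit the same elements with a stride of 10000, which is usually slower on the CPU; benchmarking both orders is the only way to know how much it matters on a given machine.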

> Is the above code optimal

Well, you can just benchmark it, right? Given this slightly modified example which allows me to quickly shift indexes around (with the 1024 thread limit of my GPU):

```julia
function kernel(x)
    c = blockIdx().x
    b = blockIdx().y
    a = threadIdx().x
    x[a, b, c] = x[a,…
```

Optimizing the use of Blocks, Threads vs. Array Indexing


Raf September 21, 2018, 1:28pm 12

Sorry to necro-bump this thread, but did anyone get anywhere with a GPU-friendly sorting algorithm?

GPU Sort Function
