SIMD.jl - vload() continuous blocks from higher-dimensional arrays?

Hi,

I am currently doing some practice projects in Julia, which involves using SIMD.jl for writing/generating microkernels. My issue is trying to determine what is the “best” way of loading Vec types from a higher-dimensional (currently 2D) array, where the loads are continuous (no need for gather ops).

In 1D, the following works:

using SIMD

arr = Vector{Float64}(undef, 100)

xs = vload(Vec{4,Float64}, arr, i)

In 2D however, linear indexing fails (lets say I want to load the elements of x[1:4, 1], i.e. those of linear index 1:4:

x = zeros(100, 100)

xs = vload(Vec{4,Float64}, x, 1)

(MethodError)

A workaround is using raw pointers:

x = zeros(100, 100)
xs = vload(Vec{4,Float64}, pointer(x, 1))

Which does work, but relies on raw pointers, and thus makes debugging/boundscheck a pain (and the code starts to look an awfully C-like…).

My question is, is there a better/safer/more idiomatic way of achieving this, or should I instead rewrite the kernels to operate on explicitly 1-D arrays (thus losing some abstraction)?

If yes, are there any examples of code where SIMD.jl is used in conjunction with explicitly higher-dimensional, but continuous loads?

Thank you for your help in advance.

I cannot test now, but can’t you just work on vec(x), that turns x into a vector while reusing the underlying data?

BTW, I have only ever used the indexing interface in SIMD.jl, and find that so much nicer-looking. But almost everyone seems to be using vload/vstore. What is the advantage? It just looks needlessly more verbose to me.

1 Like

Just checked, vec() works with loading from 2D arrays, however, it fails if its a view of an underlying 2D array (which I will likely also need down the line, however it makes sense that it fails since the view is not guaranteed to be continuous…).

using SIMD

x = zeros(128, 128)
u = @views x[:, 1:10] #continious-in-memory view


xu = vload(Vec{4,Float64}, pointer(x, 1)) #works
xy = vload(Vec{4,Float64}, vec(x), 1) #works

xp = vload(Vec{4,Float64}, pointer(u, 1)) #works
xv = vload(Vec{4,Float64}, vec(u), 1) #fails!

As of why everyone is using vload/vestore, no idea, I am only just beginning to properly dig into low-level SIMD control.

I don’t know about all this stuff with vload, but I just use the indexing interface, avoiding pointers, like this:

julia> x = collect(reshape(1:24, 6, :));

julia> u = @views x[:, 1:3]
6×3 view(::Matrix{Int64}, :, 1:3) with eltype Int64:
 1   7  13
 2   8  14
 3   9  15
 4  10  16
 5  11  17
 6  12  18

julia> lane = VecRange{4}(1);

julia> u[lane]
<4 x Int64>[1, 2, 3, 4]

julia> u[lane+1]
<4 x Int64>[2, 3, 4, 5]

julia> u[lane+3]
<4 x Int64>[4, 5, 6, 7]

julia> u[lane+1, 2]  # also use it with 2D indexing
<4 x Int64>[8, 9, 10, 11]

Edit: made a more instructive example.

1 Like

I believe this works for my current use case! (generated assembly looks good for my current test cases), thank you!