Hello,
I was motivated by the recent blog post about the new threading infrastructure in Julia 1.3-alpha to try writing a Julia stencil code that I would normally end up writing in C.
However, before getting a chance to work on threading, I ran into a loop vectorization issue. In particular, whether or not the loop vectorizes seems to depend on the size of the stencil.
To illustrate, I made two methods: 1) does_vectorize!(w,u)
and 2) does_not_vectorize!(w,u)
. In both methods, we apply a stencil to u
, saving the result in w
. The only difference is that in the does_vectorize!
case there are 18 points in the stencil while in the does_not_vectorize!
case there are 19 points in the stencil.
Here is the code run with julia -O3
on both Julia 1.1.1 and Julia 1.3.0-alpha:
function does_vectorize!(w::Array{Float32,3},u::Array{Float32,3})
nz,ny,nx = size(u)
s = ntuple(i->rand(Float32),9)
@inbounds begin
for ix = 5:(nx-5), iy = 5:(ny-5)
@simd for iz = 5:(nz-5)
w[iz,iy,ix] =
s[1]*u[iz-4,iy, ix ] + s[2]*u[iz-3,iy, ix ] + s[3]*u[iz-2,iy, ix ] + s[4]*u[iz-1,iy, ix ] + s[5]*u[iz,iy,ix] + s[6]*u[iz+1,iy, ix ] + s[7]*u[iz+2,iy, ix ] + s[8]*u[iz+3,iy,ix ] + s[9]*u[iz+4,iy, ix ] +
s[1]*u[iz, iy-4,ix ] + s[2]*u[iz, iy-3,ix ] + s[3]*u[iz, iy-2,ix ] + s[4]*u[iz, iy-1,ix ] + s[5]*u[iz,iy,ix] + s[6]*u[iz, iy+1,ix ] + s[7]*u[iz, iy+2,ix ] + s[8]*u[iz, iy+3,ix ]
end
end
end
nothing
end
io = open("does_vectorize.txt", "w")
code_llvm(io, does_vectorize!, (Array{Float32,3}, Array{Float32,3}))
close(io)
function does_not_vectorize!(w::Array{Float32,3},u::Array{Float32,3})
nz,ny,nx = size(u)
s = ntuple(i->rand(Float32),9)
@inbounds begin
for ix = 5:(nx-5), iy = 5:(ny-5)
@simd for iz = 5:(nz-5)
w[iz,iy,ix] =
s[1]*u[iz-4,iy, ix ] + s[2]*u[iz-3,iy, ix ] + s[3]*u[iz-2,iy, ix ] + s[4]*u[iz-1,iy, ix ] + s[5]*u[iz,iy,ix] + s[6]*u[iz+1,iy, ix ] + s[7]*u[iz+2,iy, ix ] + s[8]*u[iz+3,iy,ix ] + s[9]*u[iz+4,iy, ix ] +
s[1]*u[iz, iy-4,ix ] + s[2]*u[iz, iy-3,ix ] + s[3]*u[iz, iy-2,ix ] + s[4]*u[iz, iy-1,ix ] + s[5]*u[iz,iy,ix] + s[6]*u[iz, iy+1,ix ] + s[7]*u[iz, iy+2,ix ] + s[8]*u[iz, iy+3,ix ] + s[9]*u[iz, iy+4,ix ]
end
end
end
nothing
end
io = open("does_not_vectorize.txt", "w")
code_llvm(io, does_not_vectorize!, (Array{Float32,3}, Array{Float32,3}))
close(io)
nz,ny,nx = 512,512,512
u = rand(Float32,nz,ny,nx)
w = zeros(Float32,nz,ny,nx)
using BenchmarkTools
@btime does_vectorize!(w,u)
@btime does_not_vectorize!(w,u)
and the output (similar results are obtained on Julia 1.1.1 and 1.3.0-alpha) from the @btime
macro:
217.150 ms (1 allocation: 48 bytes)
1.304 s (1 allocation: 48 bytes)
I put the output from code_llvm
into a gist[1]. The point is that does_vectorize.txt
contains the expected vector.body:
section whereas does_not_vectorize.txt
does not. Can anyone reproduce this? Should I file a GitHub issue?
Thanks!
Sam
[1] https://gist.github.com/samtkaplan/0565ef0bc6d5007ed88635270950aaee