Hello all, I am new to Julia and have been making progress with its performance tools. Usually I can get results, or at least understand what is going on, but not in this case.
The following code:
@btime circshift!(rickshift, rick0, shift[iz,iy,ix,it])
@btime mul_alt2!(pd, m[iz,iy,ix, :], rickshift)
@btime Threads.@threads for i in 1:nT
    Dd = view(dd, :, i)
    Pd = view(pd, :, i)
    circshift!(Dd, Pd, Tgrid[i])
end
returns (I am using 4 threads throughout)
200.706 ns (2 allocations: 48 bytes)
8.112 μs (2 allocations: 928 bytes)
29.826 μs (520 allocations: 22.14 KiB)
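Because @btime is applied here to expressions that use global variables, part of the measured time and allocations may come from that rather than from the work itself. A minimal sketch of one way to check this, assuming the threaded loop is moved into a function that receives its arrays as arguments (shift_columns! is a hypothetical name, and the sizes are small illustrative values):

```julia
# Hypothetical helper: the threaded loop from above, with all data passed in
# as arguments instead of being read from global variables.
function shift_columns!(dd::AbstractMatrix, pd::AbstractMatrix, shifts)
    Threads.@threads for i in axes(pd, 2)
        # circularly shift column i of pd into column i of dd
        circshift!(view(dd, :, i), view(pd, :, i), shifts[i])
    end
    return dd
end

nt, nT = 8, 4            # illustrative sizes, not the ones from the post
pd = rand(nt, nT)
dd = zeros(nt, nT)
shifts = -1:2            # one integer shift per column

shift_columns!(dd, pd, shifts)
```

With BenchmarkTools loaded, the call could then be timed as @btime shift_columns!($dd, $pd, $shifts), where $ interpolates the global variables into the benchmark.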
EDIT: I defined rick0 as a sparse vector earlier in the code, and rickshift = similar(rick0). shift is a 4-dimensional array of size nz,ny,nx,nT containing Float32 values. dd is defined as dd = view(d, :, ir, :), as it is in the next snippet. pd = zeros(nt, nT) and Tshift = -49:50.
The timings above imply that the following function
# size(d) = nt, nr, nT
# size(m) = nz, ny, nx, nT
function Gnew!(d, m)
    for ir in 1:Nr
        dd = view(d, :, ir, :)   # slice of d for index ir, as described above
        for ix in 1:nx, iy in 1:ny, iz in 1:nz
            circshift!(rickshift, rick0, shift[iz,iy,ix,ir])
            mul_alt2!(pd, m[iz,iy,ix, :], rickshift)
            Threads.@threads for i in 1:nT
                Dd = view(dd, :, i)
                Pd = view(pd, :, i)
                circshift!(Dd, Pd, Tshift[i])
            end
        end
    end
end
would run in about 20 minutes for nx=ny=nz=50, nT= 100, Nr= 250 (the three timings sum to about 38.14 μs per iteration, and 38.14e-6 x50x50x50x250/60 ≈ 19.9),
but when I run it using the following:
D= zeros(nt, nr, nT)
m_init= randn(nx,ny,nz,nT)
t1= time()
Gnew!(D, m_init)
time()- t1
the output is 4375 seconds (about 73 minutes), more than 3.5 times the estimate.
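To make the gap concrete, the estimate can be worked out directly from the three @btime figures and compared with the measured run time. This is only a small arithmetic sketch; the variable names are for illustration.

```julia
# Per-call timings reported by @btime above, converted to seconds
t_circshift = 200.706e-9
t_mul       = 8.112e-6
t_threads   = 29.826e-6
t_iter = t_circshift + t_mul + t_threads   # one pass through the inner loop body

nx = ny = nz = 50
Nr = 250
n_iter = nx * ny * nz * Nr                 # inner-loop iterations in Gnew!

estimated = t_iter * n_iter                # seconds
measured  = 4375.0                         # seconds, from the time() difference

println("estimated: ", round(estimated / 60, digits = 1), " min")
println("measured:  ", round(measured / 60, digits = 1), " min")
println("ratio:     ", round(measured / estimated, digits = 2))
```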
mul_alt2!(...) is a function that produces a matrix from two vectors, one of which is a sparse vector. Neither is stored as a matrix. It is defined as
function mul_alt2!(C::Matrix, X::Vector, A::SparseVector)
    # For each stored index i of A, add A[i] * X to row i of C
    @inbounds for i in A.nzind
        cc = view(C, i, :)
        BLAS.axpy!(A[i], X, cc)
    end
end
Since A and X in the above definition are vectors and not matrices, I could not find a suitable BLAS function, or any other efficient function, that does not require me to reshape the vectors into arrays first, and that reshaping step is itself not efficient.
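For reference, mul_alt2! adds A[i] * X to row i of C for every stored index i of A, which is the outer product of A and X accumulated into C. A minimal sketch of the same update written as a plain loop over the stored entries, so that no reshaping is needed. It assumes C has size (length(A), length(X)); the name mul_outer! is hypothetical, and it has not been benchmarked against the axpy! version.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical variant of mul_alt2!: C .+= A * X', touching only the rows
# where the sparse vector A has stored entries.
function mul_outer!(C::Matrix, X::AbstractVector, A::SparseVector)
    size(C) == (length(A), length(X)) ||
        throw(DimensionMismatch("C must have size (length(A), length(X))"))
    nzind, nzval = A.nzind, A.nzval        # stored indices and values of A
    @inbounds for j in eachindex(X)        # column by column (column-major order)
        xj = X[j]
        for k in eachindex(nzind)
            C[nzind[k], j] += nzval[k] * xj
        end
    end
    return C
end

# Small usage example with illustrative values
A = sparsevec([2, 5], [1.5, -2.0], 6)
X = [1.0, 2.0, 3.0]
C = zeros(6, 3)
mul_outer!(C, X, A)
```

Accepting an AbstractVector for X means a view such as view(m, iz, iy, ix, :) could be passed instead of the slice m[iz,iy,ix, :], which allocates a copy on every call.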
I hope I have posted enough information. In case I haven’t, please let me know. Also if you have some suggestions, do share them. Thanks for your time!