Hey all,
I’m struggling to understand when code will benefit from parallel processing and when it won’t, and likewise when it’s worth moving to the GPU and when it isn’t.
I have code that I believe already runs about as fast as it can serially (I could be wrong about that), since it’s mostly in-place operations, and I thought it would benefit from parallelization. However, `Threads.@threads`
doesn’t speed it up, and the GPU also doesn’t look like a good option in this case.
I attach the snippet below. I’d appreciate your suggestions, and in particular some pointers on how to learn to make these decisions would be very helpful.
function pseudo_G!(G_per_rec, rickshift, pa, ir)
    # note: the range is 1:pa.nT; writing `iT in pa.nT` would only visit the last index
    for ix in 1:pa.nx, iy in 1:pa.ny, iz in 1:pa.nz, iT in 1:pa.nT
        Gg = view(G_per_rec, :, 1, iz, iy, ix, iT)
        circshift!(Gg, pa.rick0, pa.Tshift[iT] + pa.shift[iz, iy, ix, ir])
    end
end

function data_per_rec!(dd, m, G_per_rec, rickshift, pa, ir)
    pseudo_G!(reshape(G_per_rec, (pa.nt, 1, pa.nz, pa.ny, pa.nx, pa.nT)), rickshift, pa, ir)
    mul!(dd, G_per_rec, m)
end

function get_data!(d, m, G_per_rec, rickshift, pa)
    for ir in 1:pa.nr
        data_per_rec!(view(d, :, ir:ir), m, G_per_rec, rickshift, pa, ir)
    end
end
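The heavy inner kernel is just `circshift!` into column views of `G_per_rec`. As a minimal self-contained illustration of the in-place shift (toy values here, not my actual wavelet):

```julia
# circshift!(dest, src, n) writes src shifted forward by n into dest,
# without allocating; this is what pseudo_G! does on each column view.
src  = [1.0, 2.0, 3.0, 4.0]
dest = similar(src)
circshift!(dest, src, 1)
dest  # == [4.0, 1.0, 2.0, 3.0]
```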
dtr = zeros(nt, nr)
mtr = zeros(nz*ny*nx*nT, 1)
get_data!(dtr, mtr, G_per_rec, rickshift, pa)
Using `@time get_data!(...)`, I get `48.761478 seconds (37.38 k allocations: 2.049 MiB, 0.03% compilation time)`. I can do away with the memory allocations as well, which happen because of the `reshape(G_per_rec, ...)` while passing it to `pseudo_G!(...)` inside `data_per_rec!`, but that doesn’t really speed things up.
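To make concrete what I mean by doing away with the allocations: hoist the `reshape` out of the receiver loop so it happens once. Below is a self-contained sketch with toy sizes; `P`, `pseudo_G2!`, and `get_data2!` are stand-in names I’m introducing here, not my real code:

```julia
using LinearAlgebra  # for mul!

struct P  # minimal stand-in for my Params struct, toy sizes only
    nt::Int; nr::Int; nx::Int; ny::Int; nz::Int; nT::Int
    rick0::Vector{Float64}
    shift::Array{Int,4}
    Tshift::Vector{Int}
end

# same kernel as pseudo_G! above, minus the unused rickshift argument
function pseudo_G2!(G6, pa, ir)
    for ix in 1:pa.nx, iy in 1:pa.ny, iz in 1:pa.nz, iT in 1:pa.nT
        Gg = view(G6, :, 1, iz, iy, ix, iT)
        circshift!(Gg, pa.rick0, pa.Tshift[iT] + pa.shift[iz, iy, ix, ir])
    end
end

function get_data2!(d, m, G_per_rec, pa)
    # reshape once, outside the loop: no per-receiver allocation
    G6 = reshape(G_per_rec, (pa.nt, 1, pa.nz, pa.ny, pa.nx, pa.nT))
    for ir in 1:pa.nr
        pseudo_G2!(G6, pa, ir)
        mul!(view(d, :, ir:ir), G_per_rec, m)
    end
end

pa = P(8, 2, 2, 2, 2, 2, randn(8), rand(0:3, 2, 2, 2, 2), rand(0:3, 2))
G  = zeros(pa.nt, pa.nz * pa.ny * pa.nx * pa.nT)
d  = zeros(pa.nt, pa.nr)
m  = randn(pa.nz * pa.ny * pa.nx * pa.nT, 1)
get_data2!(d, m, G, pa)
```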
Here, `pa` is the following struct (passed explicitly to avoid the bottlenecks that come with global variables):
mutable struct Params
    nt::Int64
    nr::Int64
    nx::Int64
    ny::Int64
    nz::Int64
    nT::Int64
    rick0::Vector{Float16}
    shift::Array{Int64,4}
    Tshift::Vector{Int64}
end
pa = Params(nt, nr, nx, ny, nz, nT, rick0, shift, Tshift)
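For completeness, the threading pattern I tried is essentially the toy below; `fill_cols_threaded!` is a name I’m making up for illustration. Note that in this toy `G` is only read inside the loop, whereas in my real code `G_per_rec` is overwritten for every `ir`, so the threads there end up sharing mutable state:

```julia
using LinearAlgebra

# Receivers (columns of d) are written independently, so the loop is safe
# to thread as long as nothing mutable is shared between iterations.
function fill_cols_threaded!(d, G, m)
    Threads.@threads for ir in 1:size(d, 2)
        mul!(view(d, :, ir:ir), G, m)  # in-place product per receiver
    end
end

d = zeros(4, 3)
G = ones(4, 2)
m = reshape([1.0, 2.0], 2, 1)
fill_cols_threaded!(d, G, m)
d  # every entry is 1*1 + 1*2 = 3
```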
Thanks in advance!