I am trying to improve the performance of a script that I am converting from Python. To my knowledge, I have written a kernel function, preallocated my output, and accessed memory in order.
While I have improved the script's performance, overall it is still noticeably slower (by about 2x) than the Python script.
Here is the MWE,
using ImageFiltering, OffsetArrays, StatsBase

function PA13!(A, Aout, winarray, ndays, latsize, lonsize)
    for d in 1:ndays
        for lat in 1:latsize, lon in 1:lonsize
            t = A[lon,lat,winarray]
            if all(.!ismissing.(t))
                Aout[lon,lat,d] = percentile(skipmissing(t), 90)
            else
                Aout[lon,lat,d] = missing
            end
        end
        # imfilter is used to roll the window, similar to np.roll
        winarray = imfilter(winarray, OffsetArray([1], -1 - d), "circular")
        winarray = convert(BitArray, winarray)
    end
    Aout
end

function PA13(A, window=15)
    ndays = 365
    latsize = size(A,2)
    lonsize = size(A,1)
    Aout = Array{eltype(A)}(undef, lonsize, latsize, ndays)
    nyrs = 1
    #
    winarray = BitArray(undef, ndays)
    winarray .= 0
    winarray[[1:window ÷ 2 + 1; (ndays + 1 - window ÷ 2):end]] .= true
    winarray = repeat(winarray, nyrs)
    PA13!(A, Aout, winarray, ndays, latsize, lonsize)
    Aout
end

const x = rand(223,152,365);
PA13(x);
with the corresponding benchmark results:
BenchmarkTools.Trial:
  memory estimate:  8.39 GiB
  allocs estimate:  111350557
  --------------
  minimum time:     12.593 s (7.96% GC)
  median time:      12.593 s (7.96% GC)
  mean time:        12.593 s (7.96% GC)
  maximum time:     12.593 s (7.96% GC)
  --------------
  samples:          1
  evals/sample:     1
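For scale, 111350557 allocations over 365 × 152 × 223 ≈ 12.4 million output cells is about 9 allocations per cell, so temporaries created inside the inner loop look like a plausible contributor. The following is only a minimal sketch for checking that idea in isolation, assuming the per-cell slice `A[lon,lat,winarray]` is one such temporary; `slice_copy`, `slice_into!`, and `compare_allocations` are hypothetical names that are not part of the MWE, and the array is a small stand-in.

```julia
# Hypothetical helpers for illustration; none of these names come from the MWE.

# Indexing with a logical mask copies the selected elements into a new Vector.
slice_copy(A, i, j, mask) = A[i, j, mask]

# Fill a caller-supplied buffer instead of creating a new Vector.
function slice_into!(buf, A, i, j, idx)
    for (k, t) in enumerate(idx)
        buf[k] = A[i, j, t]
    end
    return buf
end

# Measure inside a function so that global-variable overhead is not counted.
function compare_allocations(A, mask)
    idx = findall(mask)                        # integer positions, computed once
    buf = Vector{eltype(A)}(undef, length(idx))
    slice_copy(A, 1, 1, mask)                  # warm-up calls, so compilation
    slice_into!(buf, A, 1, 1, idx)             # is excluded from the measurement
    bytes_copy = @allocated slice_copy(A, 1, 1, mask)
    bytes_buf = @allocated slice_into!(buf, A, 1, 1, idx)
    return bytes_copy, bytes_buf
end

A = rand(8, 6, 365)           # small stand-in for the 223×152×365 array
mask = falses(365)
mask[1:15] .= true            # a 15-day window, as in the MWE
bytes_copy, bytes_buf = compare_allocations(A, mask)
println("mask slice: ", bytes_copy, " bytes; reused buffer: ", bytes_buf, " bytes")
```

If the mask slice reports a nonzero byte count while the reused buffer does not, that would be consistent with the slice being one of the per-cell allocations; it does not by itself account for the others.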
and the @code_warntype results:
Variables
  #self#::Core.Compiler.Const(PA13, false)
  A::Array{Float64,3}

Body::Array{Float64,3}
1 ─ %1 = (#self#)(A, 15)::Array{Float64,3}
└──      return %1
Are there any gains to be made?
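
For reference, here is a minimal sketch of one direction that might reduce the allocations. It assumes that the window is meant to be the `window` days centred on day `d`, wrapping around the year, and that the input contains no missing values (as with the `rand` test array). The function name `rolling_p90!` is hypothetical, and it swaps the imfilter-based mask for direct index arithmetic with `mod1` plus a reused buffer passed to `quantile!` from the Statistics standard library. It is not a drop-in replacement for `PA13!` and has not been benchmarked against it.

```julia
using Statistics  # standard library; provides quantile!

# Hypothetical sketch: 90th percentile over a circular window centred on day d.
# Assumes A is lon × lat × day and contains no missing values.
function rolling_p90!(Aout, A, window)
    nlon, nlat, ndays = size(A)
    half = window ÷ 2
    buf = Vector{eltype(A)}(undef, 2half + 1)   # reused for every cell
    for d in 1:ndays, lat in 1:nlat, lon in 1:nlon
        # Gather the window for this cell, wrapping around the year with mod1.
        for (k, off) in enumerate(-half:half)
            buf[k] = A[lon, lat, mod1(d + off, ndays)]
        end
        # quantile! reorders buf in place rather than working on a copy.
        Aout[lon, lat, d] = quantile!(buf, 0.9)
    end
    return Aout
end

# Usage on a small stand-in array.
A_small = rand(4, 3, 30)
Aout_small = similar(A_small)
rolling_p90!(Aout_small, A_small, 15)
```

The reads along the third dimension are still strided in this layout, so whether this helps in practice, and whether storing the data with days as the first dimension would help further, would need to be measured.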