Another weird operation to be optimized:
function rev_cumsum_exp!(A,B,source)
# This is equivalent to:
# A .= exp.(source)
# B .= 1 ./ reverse(cumsum(reverse(A)))
s = zero(eltype(A))
for j in length(source):-1:1
A[j] = exp(source[j])
s += A[j]
B[j] = inv(s)
end
return nothing
end
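A quick sanity check (small input, my addition) that the fused loop really matches the vectorized form from the comments:

```julia
# Same function as above, repeated so this snippet is self-contained.
function rev_cumsum_exp!(A, B, source)
    s = zero(eltype(A))
    for j in length(source):-1:1
        A[j] = exp(source[j])
        s += A[j]
        B[j] = inv(s)
    end
    return nothing
end

source = randn(100)
A, B = zeros(100), zeros(100)
rev_cumsum_exp!(A, B, source)

# Reference computation: exp, then reversed cumulative sum, then reciprocal.
A_ref = exp.(source)
B_ref = 1 ./ reverse(cumsum(reverse(A_ref)))

@assert A ≈ A_ref
@assert B ≈ B_ref
```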
# Sample data:
using BenchmarkTools
N=10_000
source, A, B = randn(N), zeros(N), zeros(N)
@benchmark rev_cumsum_exp!(A,B,source)
This yields:
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):   85.600 μs … 737.400 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      96.600 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   107.616 μs ±  32.025 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  [histogram bars omitted: garbled in transcription]
  85.6 μs          Histogram: log(frequency) by time          211 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
But if I try to profile it on a much larger input:
N=100_000_000
source, A, B = randn(N), zeros(N), zeros(N)
@profview rev_cumsum_exp!(A,B,source)
I get: [flame graph screenshot]
Is there still something to be done, or am I doomed? I need both the A and B outputs to be produced.
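For reference, I suspect most of the time goes into exp itself rather than the loop structure; a rough way to check that assumption (my sketch, not something I have measured above) is to benchmark just the exponential broadcast on its own:

```julia
using BenchmarkTools

N = 10_000
source, A = randn(N), zeros(N)

# Time only the exp part. If this alone accounts for most of the ~95 μs
# median above, the loop is already close to the floor set by exp, and
# the remaining lever would be a vectorized exp (e.g. LoopVectorization's
# @turbo). The $ interpolation avoids benchmarking global-variable access.
@benchmark $A .= exp.($source)
```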