Hi! I was wondering if someone could explain to me the huge gap I noticed in time and memory cost between these two functions.
function fwddiff1(γ::Array{T, 2}) where T<:Real
    res = zero(γ)
    for i = 1:size(γ, 1) - 1
        res[i, :] = γ[i+1, :] .- γ[i, :]
    end
    res[end, :] = γ[1, :] .- γ[end, :]
    res
end
function fwddiff2(γ::Vector{Complex{T}}) where T<:Real
    [[γ[i+1] - γ[i] for i = 1:length(γ)-1]; γ[1] - γ[end]]
end
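As a sanity check, both functions compute the same cyclic forward difference if each row of the matrix is paired with a complex number (definitions repeated from above so the snippet runs on its own):

```julia
function fwddiff1(γ::Array{T, 2}) where T<:Real
    res = zero(γ)
    for i = 1:size(γ, 1) - 1
        res[i, :] = γ[i+1, :] .- γ[i, :]
    end
    res[end, :] = γ[1, :] .- γ[end, :]
    res
end

fwddiff2(γ::Vector{Complex{T}}) where T<:Real =
    [[γ[i+1] - γ[i] for i = 1:length(γ)-1]; γ[1] - γ[end]]

z = [1.0 + 2.0im, 3.0 - 1.0im, 0.5 + 0.0im]  # three points in the plane
A = [real.(z) imag.(z)]                      # the same points as a 3×2 matrix
d1 = fwddiff1(A)
d2 = fwddiff2(z)
# d1 ≈ [real.(d2) imag.(d2)] holds: same differences, different representation
```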
It seems the second one is an order of magnitude faster than the first, and the same holds for the memory footprint. I thought vectorizing things was not a good idea in Julia, which is why I’m confused! What puzzles me even more is that in the following, simpler example there is almost no difference between the two functions…
function f1(x::Vector{T}) where T<:Real
    res = zero(x)
    for i = 2:length(x)
        res[i] = x[i-1] + x[i]
    end
    res
end
function f2(x::Vector{T}) where T<:Real
    [x[i-1] + x[i] for i = 2:length(x)]
end
You’re right, sorry about not giving more details about how I’m using them. Actually, I wanted to figure out whether representing a list of N points in the plane as an N×2 array of floats was more efficient than using a length-N vector of complex numbers. So I was really only interested in the behaviour of fwddiff when the input is of size N×2!
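Incidentally, the two representations can even share memory: a vector of Complex{Float64} reinterprets to the underlying floats, though the resulting layout corresponds to a 2×N (column-major) matrix rather than N×2. A sketch:

```julia
z = [1.0 + 2.0im, 3.0 - 1.0im]   # length-2 complex vector
flat = reinterpret(Float64, z)   # [re1, im1, re2, im2], no copy
A2 = reshape(flat, 2, :)         # 2×N: row 1 = real parts, row 2 = imaginary parts
# A2[1, 2] == 3.0 and A2[2, 1] == 2.0, still backed by z's memory
```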
Even if you do that, the slice expressions on the right-hand side allocate copies unless you wrap the expression (or just the whole function) in @views. See Performance Tips · The Julia Language.
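A minimal sketch of what that looks like (the function name fwddiff1_nv is mine):

```julia
# @views turns every slice in the function body into a view, so the
# right-hand sides below no longer allocate temporary row copies.
@views function fwddiff1_nv(γ::AbstractMatrix{T}) where T<:Real
    res = zero(γ)
    for i = 1:size(γ, 1) - 1
        res[i, :] .= γ[i+1, :] .- γ[i, :]
    end
    res[end, :] .= γ[1, :] .- γ[end, :]
    res
end
```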
Right, I was actually trying this but could not get @view to work: it applies only to a single indexing expression. @views is what I was looking for, thanks. Still, I have some allocations left in the code below.
julia> function f(res, γ)
           for i = 1:size(γ, 1) - 1
               @views res[i, :] .= γ[i+1, :] .- γ[i, :]
           end
       end
f (generic function with 1 method)
julia> u=Array{Float64,2}(undef,50,50);
julia> v=zero(u);
julia> @benchmark f($v,$u)
BenchmarkTools.Trial:
  memory estimate:  6.89 KiB
  allocs estimate:  147
  --------------
  minimum time:     4.838 μs (0.00% GC)
  median time:      4.949 μs (0.00% GC)
  mean time:        6.342 μs (13.27% GC)
  maximum time:     5.061 ms (99.82% GC)
  --------------
  samples:          10000
  evals/sample:     7
Edit: probably the views allocating rather than the loop iterator (replacing the for loop with a while loop does not change the allocation count).
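For the record, the remaining allocations disappear if the row views are replaced by a plain scalar loop down each column (a sketch; fwddiff_scalar! is my name, and it also fills the wrap-around row that f above skips):

```julia
function fwddiff_scalar!(res, γ)
    m = size(γ, 1)
    for j in axes(γ, 2)        # outer loop over columns: column-major friendly
        for i = 1:m-1
            res[i, j] = γ[i+1, j] - γ[i, j]
        end
        res[m, j] = γ[1, j] - γ[m, j]   # wrap-around row
    end
    res
end
```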
But you have this backwards! It is fwddiff1 which is ‘vectorized’ and fwddiff2 which is not.
At every iteration, fwddiff1 extracts two rows from γ, allocating each as a temporary vector before subtracting them; adding a dot in .= doesn’t really help here. Using views helps with the allocations, but you are then iterating along rows, which is inefficient since Julia arrays are column-major.
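To see the copy-vs-view distinction concretely (a small sketch):

```julia
A = [float(10i + j) for i in 1:4, j in 1:4]  # A[i, j] == 10i + j
row_copy = A[2, :]        # allocates a fresh Vector (strided gather across columns)
row_view = @view A[2, :]  # no allocation of data, but still non-contiguous access
row_view[1] = 99.0        # a view writes through to the parent array
# now A[2, 1] == 99.0, while row_copy[1] is still 21.0
```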
Meanwhile, fwddiff2 iterates over γ sequentially in memory, the optimal access pattern, doing scalar operations. This is fast as well as light on allocations.
BTW: ‘vectorized’ operations are fine in Julia, but you want to do them in a way that avoids unnecessary allocations and ensures that the operations access memory sequentially along the fastest dimension.
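For instance, the whole cyclic difference can be written as a single whole-array expression that traverses memory in column order (a sketch; it allocates two full arrays, but no per-row slices):

```julia
# circshift(γ, (-1, 0)) moves row i+1 into row i (and row 1 to the end),
# so subtracting γ gives the cyclic forward difference in one expression.
fwddiff_bcast(γ::AbstractMatrix) = circshift(γ, (-1, 0)) .- γ
```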