Hi! I was wondering if someone could explain to me the huge gap I noticed in time and memory cost between these two functions.

```julia
function fwddiff1(γ::Array{T, 2}) where T<:Real
    res = zero(γ)

    for i = 1:size(γ, 1)-1
        res[i, :] = γ[i+1, :] .- γ[i, :]
    end

    res[end, :] = γ[1, :] .- γ[end, :]
    res
end

function fwddiff2(γ::Vector{Complex{T}}) where T<:Real
    [[γ[i+1] - γ[i] for i = 1:length(γ)-1]; γ[1] - γ[end]]
end
```

It seems the second one is an order of magnitude faster than the first one, and the same holds for the memory footprint. I thought vectorizing things was not a good idea in Julia, which is why I’m confused about this! What puzzles me even more is that in the simpler example below, there is almost no difference between the two functions…

```julia
function f1(x::Vector{T}) where T<:Real
    res = zero(x)

    for i = 2:length(x)
        res[i] = x[i-1] + x[i]
    end

    res
end

function f2(x::Vector{T}) where T<:Real
    [x[i-1] + x[i] for i = 2:length(x)]
end
```

But `fwddiff1` and `fwddiff2` do different things, to different inputs. Are we to infer that the real and imaginary parts of `fwddiff2`’s complex vector are the two columns of `fwddiff1`’s matrix?

`f1` and `f2` are, as you say, more or less identical, up to a zero at the start.

The line `res[i, :] = γ[i+1, :] .- γ[i, :]` is creating new vectors at each iteration of the loop.
You want to write:

```julia
res[i, :] .= γ[i+1, :] .- γ[i, :]
```

Notice the dot in front of the equals sign: `.=` broadcasts the assignment in place instead of allocating a new array for the result.
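As a minimal sketch of the difference (a small throwaway matrix, just for illustration):

```julia
γ = [1.0 2.0; 3.0 4.0; 5.0 6.0]
res = zero(γ)

res[1, :] = γ[2, :] .- γ[1, :]           # RHS slices copy, and the result allocates too
res[1, :] .= γ[2, :] .- γ[1, :]          # .= writes into res in place, but the slices still copy
@views res[1, :] .= γ[2, :] .- γ[1, :]   # views on top of .= avoid the copies entirely
```

All three lines compute the same values; they differ only in how many temporaries they allocate along the way.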

You’re right, sorry for not giving more details about how I’m using them. Actually, I wanted to figure out whether representing a list of N points in the plane by an N×2 array of floats was more efficient than using a length-N vector of complex numbers. So I was really only interested in the behaviour of `fwddiff` where the input is of size N×2!
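For reference, a round trip between the two representations could look like this (the helper names are made up, not from the thread):

```julia
# Hypothetical helpers converting between an N×2 real matrix and a
# length-N complex vector representing the same planar points.
to_complex(γ::Matrix{<:Real}) = complex.(γ[:, 1], γ[:, 2])
to_matrix(γ::Vector{<:Complex}) = [real.(γ) imag.(γ)]

pts = [1.0 2.0; 3.0 4.0]
roundtrip = to_matrix(to_complex(pts))  # converting there and back preserves the points
```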

Thank you very much, I was not aware of that!

OK! Well, here’s a runnable comparison… I feel like something must be wrong with `fwddiff2` here, though.

```julia
julia> using BenchmarkTools

julia> function fwddiff3(γ::Array{T, 2}) where T<:Real
           res = zero(γ)

           @views for i = 1:size(γ, 1)-1
               @inbounds res[i, :] .= γ[i+1, :] .- γ[i, :]
           end

           @views res[end, :] .= γ[1, :] .- γ[end, :]
           res
       end
fwddiff3 (generic function with 1 method)

julia> function fwddiff4(γ::Array{T, 2}) where T<:Real
           res = zero(γ)
           @assert size(γ, 2) == 2

           for i = 1:size(γ, 1)-1, c = 1:2
               @inbounds res[i, c] = γ[i+1, c] - γ[i, c]
           end

           for c = 1:2
               res[end, c] = γ[1, c] - γ[end, c]
           end
           res
       end
fwddiff4 (generic function with 1 method)

julia> rr2 = rand(100, 2); cc = rand(ComplexF64, 100);

julia> @btime fwddiff1($rr2);
  8.840 μs (301 allocations: 29.89 KiB)

julia> @btime fwddiff2($cc);
  1.231 μs (22 allocations: 4.28 KiB)

julia> @btime fwddiff3($rr2); # with .=, view, @inbounds
  3.134 μs (301 allocations: 15.83 KiB)

julia> @btime fwddiff4($rr2); # completely scalar
  147.757 ns (1 allocation: 1.77 KiB)
```

Thanks, that makes it perfectly clear!

Even if you do that, the slice expressions on the right-hand side allocate copies unless you wrap the expression (or just the whole function) in `@views`. See https://docs.julialang.org/en/v1/manual/performance-tips/#Consider-using-views-for-slices-1
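A small sketch of the effect (the function names are mine, for illustration only):

```julia
x = rand(100, 2)

col_sum_copy(x) = sum(x[:, 1])          # the slice x[:, 1] allocates a fresh vector
col_sum_view(x) = @views sum(x[:, 1])   # @views rewrites the slice into a non-allocating view

a = col_sum_copy(x)
b = col_sum_view(x)  # same result, without the intermediate copy
```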


Right, I was actually trying this but I could not get `@view` to work; it applies only to a single indexing expression. `@views` is what I was looking for, thanks. Still, I have some allocations left in the code below.

```julia
julia> function f(res, γ)
           for i = 1:size(γ, 1)-1
               @views res[i, :] .= γ[i+1, :] .- γ[i, :]
           end
       end
f (generic function with 1 method)

julia> u = Array{Float64, 2}(undef, 50, 50);

julia> v = zero(u);

julia> @benchmark f($v, $u)
BenchmarkTools.Trial:
  memory estimate:  6.89 KiB
  allocs estimate:  147
  --------------
  minimum time:     4.838 μs (0.00% GC)
  median time:      4.949 μs (0.00% GC)
  mean time:        6.342 μs (13.27% GC)
  maximum time:     5.061 ms (99.82% GC)
  --------------
  samples:          10000
  evals/sample:     7
```

Edit: it’s probably the views or the iterator that are allocating (changing the iterator to a while loop does not change the allocations).

But you have this backwards! It is `fwddiff1` which is ‘vectorized’ and `fwddiff2` which is not.

At every iteration, `fwddiff1` extracts two rows from `γ`, allocating them as temporary vectors, which are then subtracted. Adding a dot in `.=` doesn’t really help here. Using `view`s helps with the allocations, but you are then iterating along rows, which is inefficient, since Julia arrays are column-major.

Meanwhile, `fwddiff2` iterates over `γ` sequentially in memory, in the optimal access pattern, doing scalar operations. This is fast as well as low on allocations.

BTW: ‘vectorized’ operations are fine in Julia, but you want to do them in a way that avoids unnecessary allocations, and ensures that operations do sequential access along the fastest-varying dimension.
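To make the access-pattern point concrete (made-up function names; both compute the same sum, only the loop order differs):

```julia
# Julia arrays are column-major: A[i, j] and A[i+1, j] are adjacent in
# memory, so the inner loop should run over the row index i.
function sum_fast(A)
    s = zero(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)   # inner loop walks down a column
        s += A[i, j]
    end
    s
end

function sum_slow(A)
    s = zero(eltype(A))
    for i in axes(A, 1), j in axes(A, 2)   # inner loop jumps across columns
        s += A[i, j]
    end
    s
end

A = rand(500, 500)
```

Both return the same value; on large matrices `sum_fast` tends to be noticeably quicker because it touches memory sequentially.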
