Element-wise additions behaviour

Hi ! I was wondering if someone could explain me the huge gap I noticed in time and memory cost between these two functions.

function fwddiff1(γ::Array{T, 2}) where T<:Real
    res = zero(γ)

    for i=1:size(γ)[1]-1
        res[i, :] = γ[i+1, :] .- γ[i, :]
    end

    res[end, :] = γ[1, :] .- γ[end, :]
    res
end

function fwddiff2(γ::Vector{Complex{T}}) where T<:Real
    [[γ[i+1] - γ[i] for i=1:length(γ)-1]; γ[1] - γ[end]]
end

It seems the second one is an order of magnitude faster than the first one, and the same situation holds about memory footprint. I thought vectorizing things was not a good idea in Julia, which is why I’m confused about this ! What puzzles me even more is that in the simpler following example, there is almost no difference between the two functions …

function f1(x::Vector{T}) where T<:Real
    res = zero(x)

    for i=2:length(x)
        res[i] = x[i-1] + x[i]
    end

    res
end

function f2(x::Vector{T}) where T<:Real
    [x[i-1] + x[i] for i=2:length(x)]
end

Thanks in advance !

But fwddiff1 and fwddiff2 do different things, to different inputs. Are we to infer that the real and complex parts of γ2 are the two columns of γ1?

f1 and f2 are, as you say, more or less identical, up to a zero at the start.

This is creating a new vector at each iteration of the loop.
You want to write:

res[i, :] .= γ[i+1, :] .- γ[i, :]

Notice the dot in front of equal.

You’re right, sorry about not giving more details about how I’m using them. Actually, I wanted to figure out if representing a list of N points in the plane by a Nx2 array of floats was more efficient than using a length N vector of complex numbers. So I was actually only interested in the behaviour of fwddiff where the input is of size Nx2 !

Thank you very much, I was not aware of that !!

OK! Well here’s a runable comparison… I feel like something must be wrong with fwddiff2 here though.

julia> function fwddiff3(γ::Array{T, 2}) where T<:Real
           res = zero(γ)
       
           @views for i=1:size(γ)[1]-1
               @inbounds res[i, :] .= γ[i+1, :] .- γ[i, :]
           end
       
           @views res[end, :] .= γ[1, :] .- γ[end, :]
           res
       end
fwddiff3 (generic function with 1 method)

julia> function fwddiff4(γ::Array{T, 2}) where T<:Real
           res = zero(γ)
           @assert size(γ,2) == 2
       
           for i=1:size(γ,1)-1, c=1:2
               @inbounds res[i, c] = γ[i+1, c] - γ[i, c]
           end
       
           for c=1:2
               res[end, c] = γ[1, c] - γ[end, c]
           end
           res
       end
fwddiff4 (generic function with 1 method)

julia> rr2 = rand(100,2); cc = rand(ComplexF64,100);

julia> @btime fwddiff1($rr2);
  8.840 μs (301 allocations: 29.89 KiB)

julia> @btime fwddiff2($cc);
  1.231 μs (22 allocations: 4.28 KiB)

julia> @btime fwddiff3($rr2); # with .=, view, @inbounds
  3.134 μs (301 allocations: 15.83 KiB)

julia> @btime fwddiff4($rr2); # completely scalar
  147.757 ns (1 allocation: 1.77 KiB)

Thanks, that makes it perfectly clear !

Even if you do that, the slice expressions on the right-hand side allocate copies unless you wrap the expression (or just the whole function) in @views. See Performance Tips · The Julia Language

1 Like

Right I was actually trying this but I could not get @view to work. It is actually working with a single array. @views is what I was looking for, thanks. Still I have some allocations left in the code below.

julia> function f(res,γ)
          
                  for i=1:size(γ,1)-1
                       @views res[i, :] .= ( γ[i+1, :] .- γ[i, :])
                  end
              end
f (generic function with 1 method)
julia> u=Array{Float64,2}(undef,50,50);
julia> v=zero(u);
julia> @benchmark f($v,$u)
BenchmarkTools.Trial: 
  memory estimate:  6.89 KiB
  allocs estimate:  147
  --------------
  minimum time:     4.838 μs (0.00% GC)
  median time:      4.949 μs (0.00% GC)
  mean time:        6.342 μs (13.27% GC)
  maximum time:     5.061 ms (99.82% GC)
  --------------
  samples:          10000
  evals/sample:     7

Edit: probably the view or the iterator allocating (changing the iterator with a while loop does not change the allocation)

But you have this backwards! It is fwddiff1 which is ‘vectorized’ and fwddiff2 which is not.

At every iteration, fwddiff1 extracts two rows from γ, and allocates them into temporary vectors, which are then subtracted. Adding a dot in .= doesn’t really help here. Using views help with allocations, but you are then iterating along rows which is inefficient, since Julia arrays are column-major.

Meanwhile, fwddiff2 efficiently iterates over γ down along the columns, in the optimal access pattern, doing scalar operations. This is fast as well as low on allocations.

BTW: ‘vectorized’ operations are fine in Julia, but you want to do them in a way that avoids unnecessary allocations, and ensure that operations does sequential access along the fastest dimension.

1 Like