Element-wise additions behaviour

rpetit · January 30, 2020, 3:30pm

Hi ! I was wondering if someone could explain me the huge gap I noticed in time and memory cost between these two functions.

function fwddiff1(γ::Array{T, 2}) where T<:Real
    res = zero(γ)

    for i=1:size(γ)[1]-1
        res[i, :] = γ[i+1, :] .- γ[i, :]
    end

    res[end, :] = γ[1, :] .- γ[end, :]
    res
end

function fwddiff2(γ::Vector{Complex{T}}) where T<:Real
    [[γ[i+1] - γ[i] for i=1:length(γ)-1]; γ[1] - γ[end]]
end

It seems the second one is an order of magnitude faster than the first one, and the same situation holds about memory footprint. I thought vectorizing things was not a good idea in Julia, which is why I’m confused about this ! What puzzles me even more is that in the simpler following example, there is almost no difference between the two functions …

function f1(x::Vector{T}) where T<:Real
    res = zero(x)

    for i=2:length(x)
        res[i] = x[i-1] + x[i]
    end

    res
end

function f2(x::Vector{T}) where T<:Real
    [x[i-1] + x[i] for i=2:length(x)]
end

Thanks in advance !

improbable22 · January 30, 2020, 3:40pm

But fwddiff1 and fwddiff2 do different things, to different inputs. Are we to infer that the real and complex parts of γ2 are the two columns of γ1?

f1 and f2 are, as you say, more or less identical, up to a zero at the start.

Bernard_GODARD · January 30, 2020, 3:42pm

This is creating a new vector at each iteration of the loop.
You want to write:

res[i, :] .= γ[i+1, :] .- γ[i, :]

Notice the dot in front of equal.

rpetit · January 30, 2020, 3:46pm

You’re right, sorry about not giving more details about how I’m using them. Actually, I wanted to figure out if representing a list of N points in the plane by a Nx2 array of floats was more efficient than using a length N vector of complex numbers. So I was actually only interested in the behaviour of fwddiff where the input is of size Nx2 !

rpetit · January 30, 2020, 3:46pm

Thank you very much, I was not aware of that !!

improbable22 · January 30, 2020, 3:51pm

OK! Well here’s a runable comparison… I feel like something must be wrong with fwddiff2 here though.

julia> function fwddiff3(γ::Array{T, 2}) where T<:Real
           res = zero(γ)
       
           @views for i=1:size(γ)[1]-1
               @inbounds res[i, :] .= γ[i+1, :] .- γ[i, :]
           end
       
           @views res[end, :] .= γ[1, :] .- γ[end, :]
           res
       end
fwddiff3 (generic function with 1 method)

julia> function fwddiff4(γ::Array{T, 2}) where T<:Real
           res = zero(γ)
           @assert size(γ,2) == 2
       
           for i=1:size(γ,1)-1, c=1:2
               @inbounds res[i, c] = γ[i+1, c] - γ[i, c]
           end
       
           for c=1:2
               res[end, c] = γ[1, c] - γ[end, c]
           end
           res
       end
fwddiff4 (generic function with 1 method)

julia> rr2 = rand(100,2); cc = rand(ComplexF64,100);

julia> @btime fwddiff1($rr2);
  8.840 μs (301 allocations: 29.89 KiB)

julia> @btime fwddiff2($cc);
  1.231 μs (22 allocations: 4.28 KiB)

julia> @btime fwddiff3($rr2); # with .=, view, @inbounds
  3.134 μs (301 allocations: 15.83 KiB)

julia> @btime fwddiff4($rr2); # completely scalar
  147.757 ns (1 allocation: 1.77 KiB)

rpetit · January 30, 2020, 3:53pm

Thanks, that makes it perfectly clear !

stevengj · January 30, 2020, 3:57pm

Even if you do that, the slice expressions on the right-hand side allocate copies unless you wrap the expression (or just the whole function) in @views. See Performance Tips · The Julia Language

Bernard_GODARD · January 30, 2020, 4:08pm

Right I was actually trying this but I could not get @view to work. It is actually working with a single array. @views is what I was looking for, thanks. Still I have some allocations left in the code below.

julia> function f(res,γ)
          
                  for i=1:size(γ,1)-1
                       @views res[i, :] .= ( γ[i+1, :] .- γ[i, :])
                  end
              end
f (generic function with 1 method)
julia> u=Array{Float64,2}(undef,50,50);
julia> v=zero(u);
julia> @benchmark f($v,$u)
BenchmarkTools.Trial: 
  memory estimate:  6.89 KiB
  allocs estimate:  147
  --------------
  minimum time:     4.838 μs (0.00% GC)
  median time:      4.949 μs (0.00% GC)
  mean time:        6.342 μs (13.27% GC)
  maximum time:     5.061 ms (99.82% GC)
  --------------
  samples:          10000
  evals/sample:     7

Edit: probably the view or the iterator allocating (changing the iterator with a while loop does not change the allocation)

DNF · January 30, 2020, 5:04pm

But you have this backwards! It is fwddiff1 which is ‘vectorized’ and fwddiff2 which is not.

At every iteration, fwddiff1 extracts two rows from γ, and allocates them into temporary vectors, which are then subtracted. Adding a dot in .= doesn’t really help here. Using views help with allocations, but you are then iterating along rows which is inefficient, since Julia arrays are column-major.

Meanwhile, fwddiff2 efficiently iterates over γ down along the columns, in the optimal access pattern, doing scalar operations. This is fast as well as low on allocations.

BTW: ‘vectorized’ operations are fine in Julia, but you want to do them in a way that avoids unnecessary allocations, and ensure that operations does sequential access along the fastest dimension.

Topic		Replies	Views
Creating the same matrix in Julia as in Matlab takes longer time New to Julia	28	2188	June 28, 2017
Trying to understand how to write good code (example) New to Julia	34	3008	February 16, 2017
Migrating code from Matlab: Accessing arrays without allocating Performance	20	1947	November 19, 2018
Common allocation mistakes Performance memory-allocation	47	7175	August 21, 2023
Counterintuitive array update allocations Performance	11	572	July 17, 2021

Element-wise additions behaviour

Related topics